LightRAG is a "Simple and Fast Retrieval-Augmented Generation" (RAG) framework developed by the Department of Data Science at the University of Hong Kong (HKUDS). It aims to give developers a complete RAG solution covering document indexing, knowledge graph construction, and intelligent question answering.
LightRAG supports five retrieval modes for different scenarios: naive, local, global, hybrid, and mix.
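The mode is passed as a plain string to `QueryParam(mode=...)`. As a minimal sketch, assuming the five modes are the ones that appear in this document's examples (`naive`, `local`, `global`, `hybrid`, `mix`), a small guard can catch typos before a query is issued. The `check_mode` helper and the `RETRIEVAL_MODES` table are hypothetical and not part of LightRAG; the one-line descriptions are a rough paraphrase and should be treated as a guide only.

```python
# Hypothetical helper: not part of the LightRAG API.
# Descriptions are a rough paraphrase; treat them as a guide, not a spec.
RETRIEVAL_MODES = {
    "naive": "plain vector search over text chunks, no knowledge graph",
    "local": "entity-centred retrieval from the knowledge graph",
    "global": "relationship-centred retrieval from the knowledge graph",
    "hybrid": "local and global retrieval combined",
    "mix": "knowledge-graph retrieval combined with vector search",
}


def check_mode(mode: str) -> str:
    """Return the normalised mode name, or raise ValueError for an unknown one."""
    normalised = mode.strip().lower()
    if normalised not in RETRIEVAL_MODES:
        valid = ", ".join(sorted(RETRIEVAL_MODES))
        raise ValueError(f"unknown retrieval mode {mode!r}; expected one of: {valid}")
    return normalised
```

The checked value can then be handed to `QueryParam(mode=check_mode(user_input))`, so a misspelled mode fails early with a readable message.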
# Install from PyPI
pip install "lightrag-hku[api]"

# Or install from source
# Create a Python virtual environment (if necessary)
# Install in editable mode, including API support
pip install -e ".[api]"

import os
import asyncio
from lightrag import LightRAG, QueryParam
from lightrag.llm.openai import gpt_4o_mini_complete, openai_embed
from lightrag.kg.shared_storage import initialize_pipeline_status
from lightrag.utils import setup_logger

setup_logger("lightrag", level="INFO")

async def initialize_rag():
    rag = LightRAG(
        working_dir="your/path",
        embedding_func=openai_embed,
        llm_model_func=gpt_4o_mini_complete
    )
    await rag.initialize_storages()
    await initialize_pipeline_status()
    return rag

def main():
    rag = asyncio.run(initialize_rag())
    rag.insert("Your text")
    result = rag.query(
        "What are the top themes in this story?",
        param=QueryParam(mode="mix")
    )
    print(result)

if __name__ == "__main__":
    main()

# Create conversation history
conversation_history = [
    {"role": "user", "content": "What is the main character's attitude towards Christmas?"},
    {"role": "assistant", "content": "At the beginning of the story, Ebenezer Scrooge has a very negative attitude towards Christmas..."},
    {"role": "user", "content": "How does his attitude change?"}
]

# Create query parameters with conversation history
query_param = QueryParam(
    mode="mix",  # or any other mode: "local", "global", "hybrid"
    conversation_history=conversation_history,  # Add the conversation history
    history_turns=3  # Number of recent conversation turns to consider
)

# Make a query that takes into account the conversation history
response = rag.query(
    "What causes this change in his character?",
    param=query_param
)
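`history_turns` limits how much of `conversation_history` is taken into account. The sketch below shows one plausible reading, assuming a turn is one user message plus whatever replies follow it. The `last_turns` helper is hypothetical, and LightRAG's own trimming logic may differ.

```python
def last_turns(history: list[dict], turns: int) -> list[dict]:
    """Keep the messages belonging to the most recent `turns` user-led exchanges."""
    if turns <= 0:
        return []
    kept: list[dict] = []
    seen_user = 0
    # Walk backwards so the newest messages are collected first.
    for message in reversed(history):
        kept.append(message)
        if message["role"] == "user":
            seen_user += 1
            if seen_user == turns:
                break
    kept.reverse()  # restore chronological order
    return kept


history = [
    {"role": "user", "content": "Q1"},
    {"role": "assistant", "content": "A1"},
    {"role": "user", "content": "Q2"},
    {"role": "assistant", "content": "A2"},
    {"role": "user", "content": "Q3"},
]
recent = last_turns(history, turns=2)
print([m["content"] for m in recent])
```

Trimming by turns rather than by raw message count keeps each question together with its answer, which is usually what a follow-up query needs.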

# Create new entity
entity = rag.create_entity("Google", {
    "description": "Google is a multinational technology company specializing in internet-related services and products.",
    "entity_type": "company"
})

# Create another entity
product = rag.create_entity("Gmail", {
    "description": "Gmail is an email service developed by Google.",
    "entity_type": "product"
})

# Create relation between entities
relation = rag.create_relation("Google", "Gmail", {
    "description": "Google develops and operates Gmail.",
    "keywords": "develops operates service",
    "weight": 2.0
})

LightRAG Server provides a complete web interface, including:
The main LightRAG initialization parameters are:
working_dir: Working directory path
embedding_func: Embedding function
llm_model_func: Large language model function
vector_storage: Vector storage type
graph_storage: Graph storage type
embedding_batch_size: Embedding batch size (default 32)
embedding_func_max_async: Maximum concurrent embedding processes (default 16)
llm_model_max_async: Maximum concurrent LLM processes (default 4)
enable_llm_cache: Whether to enable LLM caching (default True)

LightRAG supports data export in several formats:
# Export data in CSV format
rag.export_data("graph_data.csv", file_format="csv")

# Export data as an Excel sheet
rag.export_data("graph_data.xlsx", file_format="excel")

# Export data in Markdown format
rag.export_data("graph_data.md", file_format="md")

# Export data as plain text
rag.export_data("graph_data.txt", file_format="txt")

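The four calls above each pair a file extension with a `file_format` value. A small lookup, sketched here with a hypothetical `export_format_for` helper, keeps the two in sync. The mapping is taken only from the examples above; no other formats are assumed.

```python
from pathlib import Path

# Extension -> file_format value, exactly as used in the export examples.
EXPORT_FORMATS = {".csv": "csv", ".xlsx": "excel", ".md": "md", ".txt": "txt"}


def export_format_for(path: str) -> str:
    """Return the file_format string that matches the path's extension."""
    suffix = Path(path).suffix.lower()
    try:
        return EXPORT_FORMATS[suffix]
    except KeyError:
        raise ValueError(f"no known export format for {path!r}") from None


# Usage with an existing LightRAG instance `rag` (not created here):
# rag.export_data("graph_data.xlsx", file_format=export_format_for("graph_data.xlsx"))
```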
Built-in token consumption monitoring tool:
from lightrag.utils import TokenTracker

# Create TokenTracker instance
token_tracker = TokenTracker()

# Method 1: Using context manager (Recommended)
# Suitable for scenarios requiring automatic token usage tracking
# Note: the awaited calls below must run inside an async function
with token_tracker:
    result1 = await llm_model_func("your question 1")
    result2 = await llm_model_func("your question 2")

# Method 2: Manually adding token usage records
# Suitable for scenarios requiring more granular control over token statistics
token_tracker.reset()

rag.insert("Your text")  # insert() needs the text to index

rag.query("your question 1", param=QueryParam(mode="naive"))
rag.query("your question 2", param=QueryParam(mode="mix"))

# Display total token usage (including insert and query operations)
print("Token usage:", token_tracker.get_usage())

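To make the context-manager pattern concrete, here is a toy tracker that accumulates token counts. It is a sketch for illustration only: `ToyTokenTracker`, its `add_usage` signature, and the shape of the returned dictionary are assumptions and do not claim to mirror the internals of `lightrag.utils.TokenTracker`.

```python
class ToyTokenTracker:
    """Illustrative only: accumulates prompt and completion token counts."""

    def __init__(self) -> None:
        self.reset()

    def reset(self) -> None:
        self.prompt_tokens = 0
        self.completion_tokens = 0
        self.call_count = 0

    def add_usage(self, prompt_tokens: int, completion_tokens: int) -> None:
        self.prompt_tokens += prompt_tokens
        self.completion_tokens += completion_tokens
        self.call_count += 1

    def get_usage(self) -> dict:
        return {
            "prompt_tokens": self.prompt_tokens,
            "completion_tokens": self.completion_tokens,
            "total_tokens": self.prompt_tokens + self.completion_tokens,
            "call_count": self.call_count,
        }

    # Entering the block starts from zero; totals stay readable after exit.
    def __enter__(self) -> "ToyTokenTracker":
        self.reset()
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        return False  # do not swallow exceptions raised inside the block


with ToyTokenTracker() as tracker:
    # In real use these numbers would come from the LLM provider's response.
    tracker.add_usage(prompt_tokens=120, completion_tokens=30)
    tracker.add_usage(prompt_tokens=80, completion_tokens=20)

usage = tracker.get_usage()
print("Token usage:", usage)
```

Resetting on entry is a design choice made for this sketch: it scopes the totals to one `with` block, while calling `reset()` by hand (as in Method 2 above) lets a single tracker span several operations.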
LightRAG is a comprehensive, easy-to-use RAG framework, well suited to building intelligent question answering systems and knowledge management platforms. Its flexible architecture and its feature set (multiple retrieval modes, knowledge graph editing, data export, and token usage tracking) make it a strong open-source option in the RAG field.