LightRAG - Simple and Fast Retrieval-Augmented Generation Framework
Project Overview
LightRAG is a "Simple and Fast Retrieval-Augmented Generation" framework developed by the Department of Data Science at the University of Hong Kong (HKUDS). This project aims to provide developers with a complete RAG (Retrieval-Augmented Generation) solution, supporting document indexing, knowledge graph construction, and intelligent question answering.
Core Features
🔍 Multiple Retrieval Modes
LightRAG supports five retrieval modes to suit different scenarios (a usage sketch follows the list):
- naive mode: Performs a basic search without advanced techniques
- local mode: Focuses on context-dependent, entity-level information
- global mode: Utilizes global, relationship-level knowledge
- hybrid mode: Combines the local and global retrieval methods
- mix mode: Integrates knowledge graph and vector retrieval, providing the most comprehensive answers
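As a usage sketch (assuming an initialized LightRAG instance named rag, set up as in the Basic Usage Example below), the mode is selected through QueryParam:
from lightrag import QueryParam

# Ask the same question in every retrieval mode and compare the answers
question = "What are the top themes in this story?"
for mode in ["naive", "local", "global", "hybrid", "mix"]:
    print(f"--- {mode} ---")
    print(rag.query(question, param=QueryParam(mode=mode)))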
🎯 Knowledge Graph Construction
- Automatically extracts entities and relationships from documents
- Supports visualization of the knowledge graph
- Provides CRUD (Create, Read, Update, Delete) operations for entities and relationships
- Supports entity merging and deduplication (see the sketch after this list)
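For example, duplicate entities can be merged into one canonical entity (a sketch based on the project's merge API; exact parameters may differ across versions):
# Merge duplicate entities into a single target entity;
# relationships pointing at the sources are redirected to the target
rag.merge_entities(
    source_entities=["Google Inc.", "Google LLC"],
    target_entity="Google",
)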
🚀 Flexible Model Support
- OpenAI models: Supports OpenAI series models such as GPT-4
- Hugging Face models: Supports locally deployed open-source models
- Ollama models: Supports locally run quantized models (configuration sketch after this list)
- LlamaIndex integration: Supports more model providers through LlamaIndex
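A minimal sketch of wiring in a local Ollama model (the model names are placeholder assumptions; it relies on the Ollama bindings and the EmbeddingFunc helper shipped with the package):
from lightrag import LightRAG
from lightrag.llm.ollama import ollama_model_complete, ollama_embed
from lightrag.utils import EmbeddingFunc

rag = LightRAG(
    working_dir="./rag_storage",
    llm_model_func=ollama_model_complete,
    llm_model_name="qwen2.5:7b",  # any model pulled into the local Ollama instance
    embedding_func=EmbeddingFunc(
        embedding_dim=768,        # must match the chosen embedding model
        max_token_size=8192,
        func=lambda texts: ollama_embed(texts, embed_model="nomic-embed-text"),
    ),
)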
📊 Diverse Storage Backends
- Vector databases: Supports Faiss, PGVector, etc.
- Graph databases: Supports Neo4j, PostgreSQL+Apache AGE
- Default storage: Built-in NetworkX graph storage; backends are selected by name, as sketched below
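A sketch of selecting non-default backends in the constructor (the backend names here are assumptions based on the project's storage registry; connection details are typically supplied via environment variables):
from lightrag import LightRAG
from lightrag.llm.openai import gpt_4o_mini_complete, openai_embed

rag = LightRAG(
    working_dir="./rag_storage",
    embedding_func=openai_embed,
    llm_model_func=gpt_4o_mini_complete,
    graph_storage="Neo4JStorage",           # instead of the default NetworkX storage
    vector_storage="FaissVectorDBStorage",  # instead of the default vector store
)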
Installation
Install from PyPI
pip install "lightrag-hku[api]"
Install from Source
# Clone the repository
git clone https://github.com/HKUDS/LightRAG.git
cd LightRAG
# Create a Python virtual environment (if necessary)
# Install in editable mode, including API support
pip install -e ".[api]"
Basic Usage Example
Initialization and Query
import asyncio
from lightrag import LightRAG, QueryParam
from lightrag.llm.openai import gpt_4o_mini_complete, openai_embed
from lightrag.kg.shared_storage import initialize_pipeline_status
from lightrag.utils import setup_logger

setup_logger("lightrag", level="INFO")

async def initialize_rag():
    rag = LightRAG(
        working_dir="your/path",
        embedding_func=openai_embed,
        llm_model_func=gpt_4o_mini_complete,
    )
    # Storages and the ingestion pipeline must be initialized before use
    await rag.initialize_storages()
    await initialize_pipeline_status()
    return rag

def main():
    rag = asyncio.run(initialize_rag())
    rag.insert("Your text")
    result = rag.query(
        "What are the top themes in this story?",
        param=QueryParam(mode="mix"),
    )
    print(result)

if __name__ == "__main__":
    main()
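When an instance is no longer needed, its storage connections should be released (a sketch; finalize_storages is assumed from the project's lifecycle API):
# Release storage connections when done (run inside an async context)
await rag.finalize_storages()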
Advanced Features
Conversation History Support
# Create conversation history
conversation_history = [
    {"role": "user", "content": "What is the main character's attitude towards Christmas?"},
    {"role": "assistant", "content": "At the beginning of the story, Ebenezer Scrooge has a very negative attitude towards Christmas..."},
    {"role": "user", "content": "How does his attitude change?"},
]

# Create query parameters with conversation history
query_param = QueryParam(
    mode="mix",  # or any other mode: "local", "global", "hybrid"
    conversation_history=conversation_history,  # Add the conversation history
    history_turns=3,  # Number of recent conversation turns to consider
)

# Make a query that takes the conversation history into account
response = rag.query(
    "What causes this change in his character?",
    param=query_param,
)
Knowledge Graph Management
# Create a new entity
entity = rag.create_entity("Google", {
    "description": "Google is a multinational technology company specializing in internet-related services and products.",
    "entity_type": "company",
})

# Create another entity
product = rag.create_entity("Gmail", {
    "description": "Gmail is an email service developed by Google.",
    "entity_type": "product",
})

# Create a relation between the entities
relation = rag.create_relation("Google", "Gmail", {
    "description": "Google develops and operates Gmail.",
    "keywords": "develops operates service",
    "weight": 2.0,
})
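Existing entities and relations can also be updated in place (a sketch; the method names follow the project's editing API, and the field values are illustrative):
# Update an existing entity's attributes
rag.edit_entity("Google", {
    "description": "Google is a subsidiary of Alphabet Inc., specializing in internet services.",
    "entity_type": "company",
})

# Update the relation between two entities
rag.edit_relation("Google", "Gmail", {
    "description": "Google develops, operates, and maintains Gmail.",
    "keywords": "develops operates maintains",
    "weight": 3.0,
})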
LightRAG Server
Web UI Features
LightRAG Server provides a complete web interface, including:
- Document index management
- Knowledge graph visualization
- Simple RAG query interface
- Supports gravity layout, node query, subgraph filtering, and other functions
API Interface
- Provides a RESTful API interface (launch sketch below)
- Compatible with Ollama API format
- Supports AI chatbot integration (e.g., Open WebUI)
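With the [api] extra installed, the server is available as a console command (a minimal sketch; host, port, and model settings are typically read from an .env file, which is an assumption about the packaged defaults):
# Start the LightRAG server with its Web UI and REST API
lightrag-server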
Configuration Parameters
Core Parameters
- working_dir: Working directory path
- embedding_func: Embedding function
- llm_model_func: Large language model function
- vector_storage: Vector storage type
- graph_storage: Graph storage type
Performance Tuning Parameters
- embedding_batch_size: Embedding batch size (default 32)
- embedding_func_max_async: Maximum concurrent embedding processes (default 16)
- llm_model_max_async: Maximum concurrent LLM processes (default 4)
- enable_llm_cache: Whether to enable LLM caching (default True)
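These parameters can be passed directly to the constructor (a sketch with illustrative values):
from lightrag import LightRAG
from lightrag.llm.openai import gpt_4o_mini_complete, openai_embed

rag = LightRAG(
    working_dir="./rag_storage",
    embedding_func=openai_embed,
    llm_model_func=gpt_4o_mini_complete,
    embedding_batch_size=64,      # send larger batches to the embedding model
    embedding_func_max_async=8,   # cap concurrent embedding calls
    llm_model_max_async=2,        # cap concurrent LLM calls
    enable_llm_cache=True,        # reuse cached LLM responses
)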
Data Export and Backup
Supports data export in various formats:
# Export data in CSV format
rag.export_data("graph_data.csv", file_format="csv")

# Export data in Excel format
rag.export_data("graph_data.xlsx", file_format="excel")

# Export data in Markdown format
rag.export_data("graph_data.md", file_format="md")

# Export data in plain-text format
rag.export_data("graph_data.txt", file_format="txt")
Token Usage Tracking
Built-in token consumption monitoring tool:
from lightrag.utils import TokenTracker

# Create a TokenTracker instance
token_tracker = TokenTracker()

# Method 1: Use the context manager (recommended)
# Suitable for scenarios requiring automatic token usage tracking;
# the awaited calls below must run inside an async function
with token_tracker:
    result1 = await llm_model_func("your question 1")
    result2 = await llm_model_func("your question 2")

# Method 2: Manually add token usage records
# Suitable for scenarios requiring more granular control over token statistics
token_tracker.reset()

rag.insert("Your text")
rag.query("your question 1", param=QueryParam(mode="naive"))
rag.query("your question 2", param=QueryParam(mode="mix"))

# Display total token usage (including insert and query operations)
print("Token usage:", token_tracker.get_usage())
Applicable Scenarios
Enterprise Knowledge Management
- Internal document retrieval and question answering
- Knowledge base construction and maintenance
- Technical documentation intelligent assistant
Academic Research
- Literature retrieval and analysis
- Knowledge graph construction research
- RAG system performance evaluation
Content Creation
- Writing assistance and material retrieval
- Multi-document content integration
- Intelligent content recommendation
Project Advantages
- Easy to Integrate: Provides a simple Python API and a REST API
- Highly Customizable: Supports multiple models and storage backends
- Performance-Oriented: Supports batch and asynchronous processing
- Visualization: Built-in knowledge graph visualization
- Enterprise-Grade: Supports enterprise databases such as PostgreSQL
Summary
LightRAG is a comprehensive and easy-to-use RAG framework, especially suitable for scenarios that require building intelligent question answering systems and knowledge management platforms. Its flexible architecture design and rich feature set make it an excellent open-source solution in the RAG field.