
LightRAG is a simple and fast Retrieval-Augmented Generation framework that supports multiple query modes and knowledge graph construction.

License: MIT · Language: Python · Stars: 17.7k · Repository: HKUDS/LightRAG · Last Updated: 2025-06-23

LightRAG - Simple and Fast Retrieval-Augmented Generation Framework

Project Overview

LightRAG is a "Simple and Fast Retrieval-Augmented Generation" framework developed by the HKUDS lab at the University of Hong Kong. The project aims to provide developers with a complete RAG solution covering document indexing, knowledge graph construction, and intelligent question answering.

Core Features

🔍 Multiple Retrieval Modes

LightRAG supports five different retrieval modes to meet various scenario requirements:

  • naive mode: Basic search without advanced techniques
  • local mode: Focuses on retrieving context-related information
  • global mode: Utilizes global knowledge for retrieval
  • hybrid mode: Combines local and global retrieval methods
  • mix mode: Integrates knowledge graph and vector retrieval, providing the most comprehensive answers
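
The mode is chosen per query via QueryParam, so a single initialized instance can serve all five. A minimal sketch, assuming a rag instance set up as in the Basic Usage Example below:

from lightrag import QueryParam

# Compare the five modes on the same question; rag is an initialized LightRAG instance
for mode in ["naive", "local", "global", "hybrid", "mix"]:
    answer = rag.query(
        "What are the top themes in this story?",
        param=QueryParam(mode=mode)
    )
    print(f"[{mode}] {answer}")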

🎯 Knowledge Graph Construction

  • Automatically extracts entities and relationships from documents
  • Supports visualization of the knowledge graph
  • Provides CRUD (Create, Read, Update, Delete) operations for entities and relationships
  • Supports entity merging and deduplication

🚀 Flexible Model Support

  • OpenAI models: Supports OpenAI series models such as GPT-4
  • Hugging Face models: Supports locally deployed open-source models
  • Ollama models: Supports locally run quantized models
  • LlamaIndex integration: Supports more model providers through LlamaIndex
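
For example, switching from OpenAI to a locally served Ollama model only changes the functions passed to the constructor. A minimal sketch, assuming a local Ollama instance with a chat model and the nomic-embed-text embedding model pulled (model names are illustrative):

from lightrag import LightRAG
from lightrag.llm.ollama import ollama_model_complete, ollama_embed
from lightrag.utils import EmbeddingFunc

rag = LightRAG(
    working_dir="./rag_storage",
    llm_model_func=ollama_model_complete,
    llm_model_name="qwen2.5:7b",  # illustrative: any chat model served by Ollama
    embedding_func=EmbeddingFunc(
        embedding_dim=768,        # must match the embedding model's output dimension
        max_token_size=8192,
        func=lambda texts: ollama_embed(
            texts, embed_model="nomic-embed-text", host="http://localhost:11434"
        ),
    ),
)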

📊 Diverse Storage Backends

  • Vector databases: Supports Faiss, PGVector, etc.
  • Graph databases: Supports Neo4j, PostgreSQL+Apache AGE
  • Default storage: Built-in NetworkX graph storage
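
Backends are selected by name when constructing LightRAG, with connection details read from environment variables. A sketch for swapping in Neo4j as the graph store, assuming a reachable instance (credentials are placeholders):

import os
from lightrag import LightRAG
from lightrag.llm.openai import gpt_4o_mini_complete

# Connection settings for the Neo4j backend (placeholders; adjust to your deployment)
os.environ["NEO4J_URI"] = "neo4j://localhost:7687"
os.environ["NEO4J_USERNAME"] = "neo4j"
os.environ["NEO4J_PASSWORD"] = "your_password"

rag = LightRAG(
    working_dir="./rag_storage",
    llm_model_func=gpt_4o_mini_complete,
    graph_storage="Neo4JStorage",  # replaces the default NetworkX graph storage
)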

Installation

Install from PyPI

pip install "lightrag-hku[api]"

Install from Source

# Clone the repository and enter it
git clone https://github.com/HKUDS/LightRAG.git
cd LightRAG
# Create a Python virtual environment (if necessary)
# Install in editable mode, including API support
pip install -e ".[api]"

Basic Usage Example

Initialization and Query

import os
import asyncio
from lightrag import LightRAG, QueryParam
from lightrag.llm.openai import gpt_4o_mini_complete, openai_embed
from lightrag.kg.shared_storage import initialize_pipeline_status
from lightrag.utils import setup_logger

setup_logger("lightrag", level="INFO")

async def initialize_rag():
    rag = LightRAG(
        working_dir="your/path",
        embedding_func=openai_embed,
        llm_model_func=gpt_4o_mini_complete
    )
    await rag.initialize_storages()
    await initialize_pipeline_status()
    return rag

def main():
    # Initialize the RAG instance (storages and pipeline status must be ready first)
    rag = asyncio.run(initialize_rag())

    # Index a document
    rag.insert("Your text")

    # Run a query in mix mode
    result = rag.query(
        "What are the top themes in this story?",
        param=QueryParam(mode="mix")
    )
    print(result)

if __name__ == "__main__":
    main()

Advanced Features

Conversation History Support

# Create conversation history
conversation_history = [
    {"role": "user", "content": "What is the main character's attitude towards Christmas?"},
    {"role": "assistant", "content": "At the beginning of the story, Ebenezer Scrooge has a very negative attitude towards Christmas..."},
    {"role": "user", "content": "How does his attitude change?"}
]

# Create query parameters with conversation history
query_param = QueryParam(
    mode="mix",  # or any other mode: "local", "global", "hybrid"
    conversation_history=conversation_history,  # Add the conversation history
    history_turns=3  # Number of recent conversation turns to consider
)

# Make a query that takes into account the conversation history
response = rag.query(
    "What causes this change in his character?",
    param=query_param
)

Knowledge Graph Management

# Create new entity
entity = rag.create_entity("Google", {
    "description": "Google is a multinational technology company specializing in internet-related services and products.",
    "entity_type": "company"
})

# Create another entity
product = rag.create_entity("Gmail", {
    "description": "Gmail is an email service developed by Google.",
    "entity_type": "product"
})

# Create relation between entities
relation = rag.create_relation("Google", "Gmail", {
    "description": "Google develops and operates Gmail.",
    "keywords": "develops operates service",
    "weight": 2.0
})
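
Beyond creation, the CRUD surface also covers editing, merging, and deletion. A sketch continuing from the entities above; the duplicate "Google Inc." entity is hypothetical:

# Update attributes of an existing entity
rag.edit_entity("Google", {"entity_type": "tech_company"})

# Merge duplicate entities into one, redirecting their relationships
rag.merge_entities(
    source_entities=["Google", "Google Inc."],
    target_entity="Google"
)

# Delete an entity together with its relationships
rag.delete_by_entity("Gmail")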

LightRAG Server

Web UI Features

LightRAG Server provides a complete web interface, including:

  • Document index management
  • Knowledge graph visualization
  • Simple RAG query interface
  • Graph interactions such as gravity layout, node queries, and subgraph filtering

API Interface

  • Provides a RESTful API interface
  • Compatible with Ollama API format
  • Supports AI chatbot integration (e.g., Open WebUI)
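
Once the server is running (via the lightrag-server command installed with the [api] extra), queries can be issued from any HTTP client. A sketch using the requests library, assuming the default /query endpoint and port 9621:

import requests

# Query the LightRAG Server over its REST API (default port 9621 assumed)
resp = requests.post(
    "http://localhost:9621/query",
    json={"query": "What are the top themes in this story?", "mode": "hybrid"},
)
print(resp.json())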

Configuration Parameters

Core Parameters

  • working_dir: Working directory path
  • embedding_func: Embedding function
  • llm_model_func: Large language model function
  • vector_storage: Vector storage type
  • graph_storage: Graph storage type

Performance Tuning Parameters

  • embedding_batch_num: Embedding batch size (default 32)
  • embedding_func_max_async: Maximum number of concurrent embedding calls (default 16)
  • llm_model_max_async: Maximum number of concurrent LLM calls (default 4)
  • enable_llm_cache: Whether to enable LLM response caching (default True)
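
All of these are constructor arguments. A sketch combining the core and tuning parameters above with their documented defaults (the storage names shown are assumptions based on the built-in backends):

from lightrag import LightRAG
from lightrag.llm.openai import gpt_4o_mini_complete, openai_embed

rag = LightRAG(
    working_dir="./rag_storage",
    embedding_func=openai_embed,
    llm_model_func=gpt_4o_mini_complete,
    vector_storage="NanoVectorDBStorage",  # assumed default vector backend
    graph_storage="NetworkXStorage",       # built-in NetworkX graph storage
    embedding_batch_num=32,
    embedding_func_max_async=16,
    llm_model_max_async=4,
    enable_llm_cache=True,
)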

Data Export and Backup

Supports data export in various formats:

# Export data in CSV format (default)
rag.export_data("graph_data.csv", file_format="csv")

# Export data in Excel format
rag.export_data("graph_data.xlsx", file_format="excel")

# Export data in Markdown format
rag.export_data("graph_data.md", file_format="md")

# Export data in plain-text format
rag.export_data("graph_data.txt", file_format="txt")

Token Usage Tracking

Built-in token consumption monitoring tool:

from lightrag.utils import TokenTracker

# Create TokenTracker instance
token_tracker = TokenTracker()

# Method 1: Using the context manager (recommended)
# Suitable for scenarios requiring automatic token usage tracking.
# Note: the awaits below must run inside an async function.
with token_tracker:
    result1 = await llm_model_func("your question 1")
    result2 = await llm_model_func("your question 2")

# Method 2: Manually adding token usage records
# Suitable for scenarios requiring more granular control over token statistics
token_tracker.reset()

rag.insert("Your text")

rag.query("your question 1", param=QueryParam(mode="naive"))
rag.query("your question 2", param=QueryParam(mode="mix"))

# Display total token usage (including insert and query operations)
print("Token usage:", token_tracker.get_usage())

Applicable Scenarios

Enterprise Knowledge Management

  • Internal document retrieval and question answering
  • Knowledge base construction and maintenance
  • Technical documentation intelligent assistant

Academic Research

  • Literature retrieval and analysis
  • Knowledge graph construction research
  • RAG system performance evaluation

Content Creation

  • Writing assistance and material retrieval
  • Multi-document content integration
  • Intelligent content recommendation

Project Advantages

  1. Easy to Integrate: Provides simple Python API and REST API
  2. Highly Customizable: Supports multiple models and storage backends
  3. Optimized Performance: Supports batch and asynchronous processing
  4. Visualization: Built-in knowledge graph visualization
  5. Enterprise-Grade: Supports enterprise-grade databases such as PostgreSQL

Summary

LightRAG is a comprehensive, easy-to-use RAG framework, especially well suited to building intelligent question answering systems and knowledge management platforms. Its flexible architecture and rich feature set make it an excellent open-source solution in the RAG field.

Star History Chart