HKUDS/LightRAG

LightRAG is a simple and fast retrieval-augmented generation framework that supports multiple query modes and knowledge graph construction.

MIT | Python | 17.7k | HKUDS | Last Updated: 2025-06-19

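LightRAG - A Simple and Fast Retrieval-Augmented Generation Framework

Project Overview

LightRAG is a "simple and fast retrieval-augmented generation" framework developed by the School of Data Science at the University of Hong Kong (HKUDS). The project aims to give developers a complete RAG (Retrieval-Augmented Generation) solution covering document indexing, knowledge graph construction, and intelligent question answering.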

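Core Features

🔍 Multiple Retrieval Modes

LightRAG supports five retrieval modes to cover different scenarios.

naive mode: basic search without advanced techniques
local mode: focuses on retrieving context-dependent information
global mode: retrieval that draws on global knowledge
hybrid mode: combines the local and global retrieval methods
mix mode: integrates knowledge graph and vector retrieval to give the most comprehensive answers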
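
As a rough illustration of how the five modes relate to one another, the sketch below maps each mode name to the kinds of context named in the list above. The labels (`vector_chunks`, `local_context`, `global_knowledge`, `knowledge_graph`) and the helper `sources_for_mode` are hypothetical names chosen for this example; they are not taken from LightRAG's source code.

```python
from typing import Dict, Tuple

# Hypothetical labels for the kinds of context each mode draws on,
# derived from the mode descriptions above (not from LightRAG internals).
MODE_SOURCES: Dict[str, Tuple[str, ...]] = {
    "naive": ("vector_chunks",),
    "local": ("local_context",),
    "global": ("global_knowledge",),
    "hybrid": ("local_context", "global_knowledge"),
    "mix": ("knowledge_graph", "vector_chunks"),
}


def sources_for_mode(mode: str) -> Tuple[str, ...]:
    """Return the context sources for a mode, rejecting unknown names."""
    try:
        return MODE_SOURCES[mode]
    except KeyError:
        valid = ", ".join(sorted(MODE_SOURCES))
        raise ValueError(f"unknown mode {mode!r}; expected one of: {valid}") from None


if __name__ == "__main__":
    for name in MODE_SOURCES:
        print(f"{name:>6}: {', '.join(sources_for_mode(name))}")
```

Validating the mode string up front like this can catch typos before a query is sent; the mode names themselves are the ones passed to `QueryParam(mode=...)` in the usage example further down.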

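🎯 Knowledge Graph Construction

Automatic extraction of entities and relations from documents
Knowledge graph visualization
Create, delete, update, and search operations for entities and relations
Entity merging and deduplication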
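
To make "entity merging and deduplication" concrete, here is a minimal in-memory sketch of the idea: duplicate entity names are folded into one target entity and their relations are rewired to it. The function `merge_entities` and the dictionary layout are assumptions made for this example and do not reflect how LightRAG stores or merges its graph.

```python
from typing import Dict, List, Tuple

Entities = Dict[str, str]               # entity name -> description
Relations = Dict[Tuple[str, str], str]  # (source, target) -> description


def merge_entities(
    entities: Entities,
    relations: Relations,
    duplicates: List[str],
    target: str,
) -> Tuple[Entities, Relations]:
    """Fold `duplicates` into `target` and rewire their relations."""
    dup_set = set(duplicates) - {target}

    # Collect descriptions: the target's own first, then each duplicate's.
    descriptions = [entities[target]] if target in entities else []
    merged_entities: Entities = {}
    for name, desc in entities.items():
        if name in dup_set:
            if desc not in descriptions:
                descriptions.append(desc)
        elif name != target:
            merged_entities[name] = desc
    merged_entities[target] = " | ".join(descriptions)

    # Point every relation that touched a duplicate at the target instead.
    merged_relations: Relations = {}
    for (src, dst), desc in relations.items():
        src = target if src in dup_set else src
        dst = target if dst in dup_set else dst
        if src == dst:
            continue  # drop self-loops created by the merge
        merged_relations.setdefault((src, dst), desc)  # keep the first of any duplicate edges
    return merged_entities, merged_relations
```

For example, merging "Google Inc." into "Google" leaves a single "Google" entity whose description joins both originals, and any relation from "Google Inc." to "Gmail" becomes a relation from "Google" to "Gmail".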

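🚀 Flexible Model Support

OpenAI models: supports GPT-4 and other OpenAI-series models
Hugging Face models: supports locally deployed open-source models
Ollama models: supports quantized models running locally
LlamaIndex integration: supports additional model providers through LlamaIndex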

📊 Multiple Storage Backends

Vector databases: Faiss, PGVector, and others
Graph databases: Neo4j, PostgreSQL + Apache AGE
Default storage: built-in NetworkX graph storage

Installation

Install from PyPI

pip install "lightrag-hku[api]"

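Install from Source

# Run from the root of a cloned LightRAG repository
# Create a Python virtual environment (if needed)
# Install in editable mode with API support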
pip install -e ".[api]"

Basic Usage Example

Initialization and Query

import os
import asyncio
from lightrag import LightRAG, QueryParam
from lightrag.llm.openai import gpt_4o_mini_complete, openai_embed
from lightrag.kg.shared_storage import initialize_pipeline_status
from lightrag.utils import setup_logger

setup_logger("lightrag", level="INFO")

async def initialize_rag():
    rag = LightRAG(
        working_dir="your/path",
        embedding_func=openai_embed,
        llm_model_func=gpt_4o_mini_complete
    )
    await rag.initialize_storages()
    await initialize_pipeline_status()
    return rag

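def main():
    # Initialize the RAG instance
    rag = asyncio.run(initialize_rag())

    # Insert text into the index
    rag.insert("Your text")

    # Query in mix mode (knowledge graph + vector retrieval)
    result = rag.query(
        "What are the top themes in this story?",
        param=QueryParam(mode="mix")
    )
    print(result)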

if __name__ == "__main__":
    main()

Advanced Features

Conversation History Support

# Create conversation history
conversation_history = [
    {"role": "user", "content": "What is the main character's attitude towards Christmas?"},
    {"role": "assistant", "content": "At the beginning of the story, Ebenezer Scrooge has a very negative attitude towards Christmas..."},
    {"role": "user", "content": "How does his attitude change?"}
]

# Create query parameters with conversation history
query_param = QueryParam(
    mode="mix",  # or any other mode: "local", "global", "hybrid"
    conversation_history=conversation_history,  # Add the conversation history
    history_turns=3  # Number of recent conversation turns to consider
)

# Make a query that takes into account the conversation history
response = rag.query(
    "What causes this change in his character?",
    param=query_param
)
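
The `history_turns` parameter above limits how much of the conversation is considered. The sketch below shows one plausible way such a limit could be applied. It assumes that one turn is a user message plus the assistant reply, so at most `2 * history_turns` messages are kept; the helper `recent_history` is hypothetical, and LightRAG's actual truncation logic may differ.

```python
from typing import Dict, List

Message = Dict[str, str]


def recent_history(history: List[Message], history_turns: int) -> List[Message]:
    """Keep only the messages from the last `history_turns` turns.

    Assumption: one turn = one user message + one assistant message,
    so at most 2 * history_turns messages are returned.
    """
    if history_turns <= 0:
        return []  # guard: history[-0:] would return the whole list
    return history[-2 * history_turns:]


if __name__ == "__main__":
    history = [
        {"role": "user", "content": "What is the main character's attitude towards Christmas?"},
        {"role": "assistant", "content": "At the beginning of the story, it is very negative."},
        {"role": "user", "content": "How does his attitude change?"},
    ]
    # With history_turns=1 only the two most recent messages are kept.
    for message in recent_history(history, history_turns=1):
        print(message["role"], "->", message["content"])
```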

Knowledge Graph Management

# Create new entity
entity = rag.create_entity("Google", {
    "description": "Google is a multinational technology company specializing in internet-related services and products.",
    "entity_type": "company"
})

# Create another entity
product = rag.create_entity("Gmail", {
    "description": "Gmail is an email service developed by Google.",
    "entity_type": "product"
})

# Create relation between entities
relation = rag.create_relation("Google", "Gmail", {
    "description": "Google develops and operates Gmail.",
    "keywords": "develops operates service",
    "weight": 2.0
})

LightRAG Server

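Web UI Features

LightRAG Server provides a complete web interface that includes the following features.

Document indexing management
Knowledge graph visualization
A simple RAG query interface
Gravity layout, node queries, subgraph filtering, and more

API Interface

Provides a RESTful API
Compatible with the Ollama API format
Supports integration with AI chatbots (e.g., Open WebUI)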
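
Because the server is described as compatible with the Ollama API format, a client can in principle talk to it the same way it would talk to Ollama's chat endpoint. The sketch below only builds such a request using the standard library; it does not send it. The base URL `http://localhost:9621` and the model name `lightrag:latest` are assumptions for illustration and should be replaced with the values your server actually uses.

```python
import json
import urllib.request


def build_chat_request(base_url: str, question: str, model: str) -> urllib.request.Request:
    """Build a POST request in the Ollama /api/chat format (not sent here)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": question}],
        "stream": False,  # ask for a single JSON response instead of a stream
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Assumed address and model name; adjust both for your deployment.
    request = build_chat_request(
        "http://localhost:9621", "What are the top themes in this story?", "lightrag:latest"
    )
    print(request.get_method(), request.full_url)
    # To send it against a running server:
    # with urllib.request.urlopen(request) as response:
    #     print(json.load(response)["message"]["content"])
```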

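Configuration Parameters

Core Parameters

working_dir: working directory path
embedding_func: embedding function
llm_model_func: large language model function
vector_storage: vector storage type
graph_storage: graph storage type

Performance Tuning Parameters

embedding_batch_size: embedding batch size (default 32)
embedding_func_max_async: maximum number of concurrent embedding processes (default 16)
llm_model_max_async: maximum number of concurrent LLM processes (default 4)
enable_llm_cache: whether to enable the LLM cache (default True)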
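
The batch-size and max-async settings describe two common throttling techniques: splitting work into fixed-size batches and capping how many batches run concurrently. The sketch below demonstrates both with `asyncio.Semaphore`, using a fake embedding step. The function names `make_batches` and `embed_all` are hypothetical, and this is not LightRAG's implementation; it only shows the general pattern the parameters suggest.

```python
import asyncio
from typing import Dict, List


def make_batches(items: List[str], batch_size: int) -> List[List[str]]:
    """Split items into consecutive batches of at most batch_size."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


async def embed_all(texts: List[str], batch_size: int = 32, max_async: int = 16) -> Dict[str, int]:
    """Process texts in batches, with at most max_async batches in flight."""
    semaphore = asyncio.Semaphore(max_async)
    stats = {"running": 0, "peak": 0, "batches": 0}

    async def embed_batch(batch: List[str]) -> List[int]:
        async with semaphore:  # blocks while max_async batches are already running
            stats["running"] += 1
            stats["peak"] = max(stats["peak"], stats["running"])
            await asyncio.sleep(0.01)  # stand-in for a real embedding call
            stats["running"] -= 1
            stats["batches"] += 1
            return [len(text) for text in batch]  # fake "embeddings"

    results = await asyncio.gather(*(embed_batch(b) for b in make_batches(texts, batch_size)))
    stats["vectors"] = sum(len(r) for r in results)
    return stats


if __name__ == "__main__":
    texts = [f"chunk {i}" for i in range(100)]
    print(asyncio.run(embed_all(texts, batch_size=10, max_async=3)))
```

Larger batches reduce the number of calls, while a lower concurrency cap protects a rate-limited or resource-constrained model endpoint; the right balance depends on the backend.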

Data Export and Backup

Supports exporting data in multiple formats:

# Export data in CSV format
rag.export_data("graph_data.csv", file_format="csv")

# Export data in Excel sheet
rag.export_data("graph_data.xlsx", file_format="excel")

# Export data in markdown format
rag.export_data("graph_data.md", file_format="md")

# Export data in Text
rag.export_data("graph_data.txt", file_format="txt")

Token Usage Tracking

A built-in tool for monitoring token consumption:

from lightrag.utils import TokenTracker

# Create TokenTracker instance
token_tracker = TokenTracker()

# Method 1: Using context manager (Recommended)
# Suitable for scenarios requiring automatic token usage tracking
# Note: `await` must run inside an async function
with token_tracker:
    result1 = await llm_model_func("your question 1")
    result2 = await llm_model_func("your question 2")

# Method 2: Manually adding token usage records
# Suitable for scenarios requiring more granular control over token statistics
token_tracker.reset()

rag.insert("Your text")

rag.query("your question 1", param=QueryParam(mode="naive"))
rag.query("your question 2", param=QueryParam(mode="mix"))

# Display total token usage (including insert and query operations)
print("Token usage:", token_tracker.get_usage())
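
To show what a tracker of this kind does, here is a minimal stand-alone sketch of a token counter that works both as a context manager and through manual calls. The class `SimpleTokenTracker` and its `add_usage` method are hypothetical and written only for this example; the real `TokenTracker` in `lightrag.utils` may have a different interface and behavior.

```python
from typing import Dict


class SimpleTokenTracker:
    """Accumulates prompt and completion token counts across calls."""

    def __init__(self) -> None:
        self.reset()

    def reset(self) -> None:
        """Clear all counters."""
        self.prompt_tokens = 0
        self.completion_tokens = 0
        self.call_count = 0

    def add_usage(self, prompt_tokens: int, completion_tokens: int) -> None:
        """Record the token counts reported for one model call."""
        self.prompt_tokens += prompt_tokens
        self.completion_tokens += completion_tokens
        self.call_count += 1

    def get_usage(self) -> Dict[str, int]:
        """Return a snapshot of the totals so far."""
        return {
            "prompt_tokens": self.prompt_tokens,
            "completion_tokens": self.completion_tokens,
            "total_tokens": self.prompt_tokens + self.completion_tokens,
            "call_count": self.call_count,
        }

    def __enter__(self) -> "SimpleTokenTracker":
        self.reset()  # start each tracked block from zero
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        print("Token usage:", self.get_usage())


if __name__ == "__main__":
    with SimpleTokenTracker() as tracker:
        tracker.add_usage(prompt_tokens=120, completion_tokens=30)
        tracker.add_usage(prompt_tokens=80, completion_tokens=20)
```

The context-manager form resets on entry and reports on exit, which suits one-off measurements; calling `reset()` and `get_usage()` by hand gives finer control over which operations are counted.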

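Use Cases

Enterprise Knowledge Management

Internal document search and question answering
Knowledge base construction and maintenance
Intelligent assistant for technical documentation

Academic Research

Literature search and analysis
Research on knowledge graph construction
RAG system performance evaluation

Content Creation

Writing assistance and source lookup
Multi-document content integration
Intelligent content recommendation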

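Project Advantages

Easy integration: provides a simple Python API and a REST API
Highly customizable: supports a range of models and storage backends
Performance optimization: supports batch and asynchronous processing
Visualization: built-in knowledge graph visualization
Enterprise-grade: supports enterprise-grade databases such as PostgreSQL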

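Summary

LightRAG is a full-featured, easy-to-use RAG framework that is particularly well suited to building intelligent question-answering systems and knowledge management platforms. Its flexible architecture and rich feature set make it a strong open-source option in the RAG space.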