gabrielchua/RAGxplorer
An open-source RAG visualization tool that helps users intuitively understand and debug Retrieval Augmented Generation systems.
License: MIT | Primary language: Jupyter Notebook | Stars: 1.1k | Last updated: 2025-01-03
RAGxplorer - Open-source RAG Visualization Tool 🔮
Project Overview
RAGxplorer is an open-source tool specifically designed for visualizing Retrieval Augmented Generation (RAG) systems. Developed by Gabriel Chua, this project aims to help developers and researchers better understand and debug the document retrieval and semantic similarity matching processes within RAG applications.
Key Features
1. Document Processing and Loading
- PDF Document Support: Directly load PDF files for processing
- Document Chunking: Automatically split documents into text chunks suitable for vectorization
- Multiple Document Format Support: Support for document formats beyond PDF
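The chunking step can be sketched as follows. This is a minimal illustration, not RAGxplorer's own splitter: it assumes fixed-size windows measured in words with a fixed overlap, and the function name `chunk_words` and its `chunk_size`/`chunk_overlap` parameters are hypothetical.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 100, chunk_overlap: int = 20) -> List[str]:
    """Split text into windows of `chunk_size` words.

    Consecutive windows share `chunk_overlap` words. Hypothetical helper,
    not RAGxplorer's implementation.
    """
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    words = text.split()
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reaches the last word
    return chunks
```

Overlap trades a little extra storage for robustness: a short sentence that straddles a chunk boundary is more likely to appear intact in one of the two neighboring chunks.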
2. Vector Embedding Visualization
- Embedding Space Visualization: Visualize the representation of document chunks in vector space
- Semantic Similarity Exploration: Intuitively display semantic relationships between document chunks
- Multiple Embedding Model Support: Support for various pre-trained embedding models
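Embedding vectors typically have hundreds of dimensions, so they must be reduced to two before they can be drawn as a scatter plot. The sketch below uses PCA, computed with NumPy's SVD, as a simple stand-in; the projection method RAGxplorer actually uses is not described here and may differ (UMAP is a common choice for this kind of plot). `project_to_2d` is a hypothetical helper.

```python
import numpy as np


def project_to_2d(embeddings: np.ndarray) -> np.ndarray:
    """Project (n_chunks, dim) embeddings onto their top-2 principal components.

    PCA is an illustrative stand-in, not necessarily the tool's method.
    Requires at least 2 chunks and 2 embedding dimensions.
    """
    X = np.asarray(embeddings, dtype=float)
    X_centered = X - X.mean(axis=0)  # PCA operates on mean-centered data
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ vt[:2].T  # coordinates along the first two components
```

The resulting `(n_chunks, 2)` array can be handed to any scatter-plot library (for example Matplotlib or Plotly), one point per chunk; chunks that land close together have similar embeddings along the two directions of greatest variance.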
3. Query Visualization
- Query Matching Visualization: Display the matching process between queries and document chunks
- Similarity Score Display: Intuitively show the relevance scores of retrieval results
- Interactive Query: Support real-time querying and result visualization
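Relevance scores like those shown in query visualizations are commonly computed as the cosine similarity between the query embedding and each chunk embedding. A minimal sketch, assuming cosine similarity and plain Python lists as vectors; `cosine_similarity` and `top_k` are hypothetical names, not part of RAGxplorer's API.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float], chunk_vecs: Sequence[Sequence[float]], k: int = 3
) -> List[Tuple[int, float]]:
    """Return (chunk_index, score) pairs for the k chunks most similar to the query."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # highest score first
    return scored[:k]
```

In a visualization, the top-k indices identify which chunk points to highlight, and the scores can drive color or marker size.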
Technical Features
Core Technology Stack
- Python: Primary development language
- Streamlit: Web interface framework
- Vector Embeddings: Supports various embedding models
- Visualization Libraries: Used for data visualization and interaction
Supported Embedding Models
- thenlper/gte-large: Default recommended model
- Other Hugging Face Models: Supports custom embedding models
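Supporting custom embedding models largely comes down to coding against a small interface so that the model can be swapped out. The sketch below assumes such an interface; the `Embedder` protocol, `HashingEmbedder`, and `embed_chunks` are all hypothetical, and the toy hashing embedder only stands in for a real model: it captures word overlap, not meaning.

```python
import hashlib
import math
from typing import List, Protocol


class Embedder(Protocol):
    """Anything that turns a list of texts into a list of vectors."""

    def embed(self, texts: List[str]) -> List[List[float]]: ...


class HashingEmbedder:
    """Toy embedder: hashes each lowercase token into one of `dim` buckets.

    Hypothetical stand-in for a real model such as thenlper/gte-large.
    """

    def __init__(self, dim: int = 64) -> None:
        self.dim = dim

    def embed(self, texts: List[str]) -> List[List[float]]:
        vectors = []
        for text in texts:
            vec = [0.0] * self.dim
            for token in text.lower().split():
                # sha256 keeps bucket assignment stable across runs,
                # unlike Python's per-process randomized str hash.
                digest = hashlib.sha256(token.encode("utf-8")).hexdigest()
                vec[int(digest, 16) % self.dim] += 1.0
            norm = math.sqrt(sum(x * x for x in vec)) or 1.0
            vectors.append([x / norm for x in vec])  # L2-normalize
        return vectors


def embed_chunks(embedder: Embedder, chunks: List[str]) -> List[List[float]]:
    # Downstream code depends only on the Embedder interface, so models are swappable.
    return embedder.embed(chunks)
```

A class that wraps a Hugging Face model, for example by calling `encode` on a `sentence_transformers.SentenceTransformer` and converting the resulting array to lists, could satisfy the same interface without changes to downstream code.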
Installation and Usage
Installation Method
```bash
pip install ragxplorer
```
Basic Usage Example
```python
from ragxplorer import RAGxplorer

# Initialize the client
client = RAGxplorer(embedding_model="thenlper/gte-large")

# Load PDF document
client.load_pdf("presentation.pdf", verbose=True)

# Visualize query results
client.visualize_query("What are the top revenue drivers for Microsoft?")
```
Quick Start
The project provides Jupyter notebook tutorials and an online demo.
Application Scenarios
1. RAG System Debugging
- Retrieval Quality Assessment: Evaluate the accuracy and relevance of document retrieval
- Parameter Tuning: Adjust RAG system parameters, such as chunk size or the embedding model, based on what the visualizations reveal
- Performance Analysis: Analyze system performance under different configurations
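Retrieval quality can also be quantified alongside visual inspection. A minimal sketch of hit rate at k, a standard retrieval metric, assuming you have hand-labeled which chunks are relevant to each test query; the function `hit_rate_at_k` and its input format are hypothetical and not a RAGxplorer feature.

```python
from typing import Dict, List, Set


def hit_rate_at_k(
    retrieved: Dict[str, List[int]], relevant: Dict[str, Set[int]], k: int
) -> float:
    """Fraction of labeled queries whose top-k retrieved chunks include a relevant one.

    `retrieved` maps each query to chunk ids ranked best-first;
    `relevant` maps each query to its hand-labeled relevant chunk ids.
    """
    if not relevant:
        return 0.0
    hits = 0
    for query, relevant_ids in relevant.items():
        top = retrieved.get(query, [])[:k]  # queries with no results count as misses
        if any(chunk_id in relevant_ids for chunk_id in top):
            hits += 1
    return hits / len(relevant)
```

Computing this for several configurations (different chunk sizes or embedding models) gives a number to compare, while the visualizations help explain why a particular query misses.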
2. Education and Research
- RAG Concept Teaching: Help learners understand RAG's working principles
- Academic Research: Provide visualization tools for RAG-related research
- Prototype Development: Quickly validate RAG system designs
3. Enterprise Applications
- Document Search Optimization: Optimize internal enterprise document search systems
- Knowledge Management: Visualize the organizational structure of enterprise knowledge bases
- Customer Service: Optimize RAG-based customer service systems
Project Advantages
1. Open Source and Community
- MIT License: Fully open-source, free to use and modify
- Community Support: Active developer community and contributors
- Continuous Updates: Regular updates and feature improvements
2. Ease of Use
- Simple API: Intuitive Python API design
- Web Interface: User-friendly interface based on Streamlit
- Detailed Documentation: Comprehensive usage tutorials and examples
3. Scalability
- Modular Design: Easy to extend and customize
- Multi-model Support: Supports various embedding models
- Plugin Mechanism: Can integrate other tools and libraries
Technical Architecture
Core Components
- Document Processor: Responsible for document loading and preprocessing
- Vectorization Engine: Handles text vectorization and embedding
- Visualization Engine: Generates interactive visualization interfaces
- Query Processor: Processes user queries and similarity calculations
Data Flow
- Document Input → Text Chunking → Vectorization → Storage
- Query Input → Vectorization → Similarity Calculation → Result Visualization
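The two data flows above can be sketched end to end as a toy in-memory pipeline. Every name here is hypothetical, and each stage is deliberately simplified: character-window chunking (which may cut words in long texts), bag-of-words term counts in place of a neural embedding model, and a Python list in place of a vector database. Comments map each piece to the component it stands in for.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Vector = Dict[str, float]  # sparse bag-of-words vector: token -> weight


def chunk(text: str, size: int = 200, overlap: int = 50) -> List[str]:
    """Document Processor (sketch): fixed-size character windows with overlap."""
    if not text:
        return []
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def embed(text: str) -> Vector:
    """Vectorization Engine (sketch): L2-normalized term counts, a toy stand-in for a real model."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {tok: c / norm for tok, c in counts.items()}


def cosine(a: Vector, b: Vector) -> float:
    # embed() returns unit-length vectors (or an empty dict), so the dot product is the cosine.
    return sum(w * b.get(tok, 0.0) for tok, w in a.items())


class InMemoryIndex:
    """Storage (sketch): chunk texts kept next to their vectors."""

    def __init__(self) -> None:
        self.entries: List[Tuple[str, Vector]] = []

    def add_document(self, text: str) -> None:
        # Document Input -> Text Chunking -> Vectorization -> Storage
        for piece in chunk(text):
            self.entries.append((piece, embed(piece)))

    def query(self, question: str, k: int = 3) -> List[Tuple[float, str]]:
        """Query Processor (sketch)."""
        # Query Input -> Vectorization -> Similarity Calculation
        q = embed(question)
        scored = [(cosine(q, vec), piece) for piece, vec in self.entries]
        scored.sort(key=lambda pair: pair[0], reverse=True)  # best match first
        return scored[:k]
```

Replacing `embed` with a real embedding model and the list with a vector store keeps the same overall shape; the `(score, chunk)` pairs returned by `query` are the kind of data a visualization layer would then plot.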
Summary
RAGxplorer is an easy-to-use RAG visualization tool that helps developers understand and optimize RAG systems. Its interactive visualizations make it easier to debug and improve applications built on Retrieval Augmented Generation.