gabrielchua/RAGxplorer
Please refer to the latest official releases for information.

An open-source RAG visualization tool that helps users intuitively understand and debug Retrieval Augmented Generation systems.

MIT · Jupyter Notebook · 1.1k · gabrielchua/RAGxplorer · Last Updated: 2025-01-03

RAGxplorer - Open-source RAG Visualization Tool 🔮

Project Overview

RAGxplorer is an open-source tool specifically designed for visualizing Retrieval Augmented Generation (RAG) systems. Developed by Gabriel Chua, this project aims to help developers and researchers better understand and debug the document retrieval and semantic similarity matching processes within RAG applications.

Key Features

1. Document Processing and Loading

PDF Document Support: Directly load PDF files for processing
Document Chunking: Automatically split documents into text chunks suitable for vectorization
Multiple Document Format Support: Extended support for various document formats
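
The chunking step described above can be sketched as follows. This is a minimal illustration, not RAGxplorer's own splitter: the function name chunk_text and the chunk_size and overlap parameters are hypothetical, and fixed-size character windows with overlap are an assumed strategy, since the source does not say how the tool splits text.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Hypothetical helper for illustration; not part of the ragxplorer API.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    for piece in chunk_text("abcdefghij", chunk_size=4, overlap=1):
        print(piece)
```

The overlap keeps a sentence that straddles a boundary present in both neighbouring chunks, at the cost of storing some text twice.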

2. Vector Embedding Visualization

Embedding Space Visualization: Visualize the representation of document chunks in vector space
Semantic Similarity Exploration: Intuitively display semantic relationships between document chunks
Multiple Embedding Model Support: Support for various pre-trained embedding models
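
To make the idea of plotting chunks in vector space concrete, here is a rough sketch of reducing embeddings to 2D coordinates. The source does not state which projection method the tool uses, so principal component analysis (computed by power iteration) is used purely as a dependency-free stand-in. The names project_2d and _top_component are hypothetical.

```python
import math
import random


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _top_component(rows, iterations=200, seed=0):
    """Approximate the leading principal direction of centered rows."""
    rng = random.Random(seed)
    dim = len(rows[0])
    v = [rng.uniform(-1, 1) for _ in range(dim)]
    for _ in range(iterations):
        # Apply X^T X to v without forming the covariance matrix.
        scores = [_dot(r, v) for r in rows]
        w = [sum(s * r[j] for s, r in zip(scores, rows)) for j in range(dim)]
        norm = math.sqrt(_dot(w, w))
        if norm == 0:
            break  # no variance left along any direction
        v = [x / norm for x in w]
    return v


def project_2d(vectors):
    """Project vectors onto their first two principal directions (assumed method)."""
    n, dim = len(vectors), len(vectors[0])
    mean = [sum(v[j] for v in vectors) / n for j in range(dim)]
    rows = [[v[j] - mean[j] for j in range(dim)] for v in vectors]
    coords = [[0.0, 0.0] for _ in range(n)]
    for axis in range(2):
        comp = _top_component(rows, seed=axis)
        scores = [_dot(r, comp) for r in rows]
        for i, s in enumerate(scores):
            coords[i][axis] = s
        # Deflate: remove the direction just found before searching for the next.
        rows = [[x - s * c for x, c in zip(r, comp)] for r, s in zip(rows, scores)]
    return [tuple(c) for c in coords]


if __name__ == "__main__":
    for point in project_2d([[0, 0, 0], [1, 1, 0], [2, 2, 0], [3, 3, 0]]):
        print(point)
```

Chunks with similar embeddings land close together in the resulting scatter plot. A nonlinear method would preserve local neighbourhoods better, but the plotting idea is the same.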

3. Query Visualization

Query Matching Visualization: Display the matching process between queries and document chunks
Similarity Score Display: Intuitively show the relevance scores of retrieval results
Interactive Query: Support real-time querying and result visualization
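
The relevance scores mentioned above can be illustrated with a short ranking sketch. The source does not name the similarity metric, so cosine similarity is an assumption here, and cosine_similarity and rank_chunks are hypothetical helper names rather than ragxplorer functions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_chunks(query_vec, chunk_vecs, top_k=3):
    """Return (chunk_index, score) pairs, highest score first."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    query = [1.0, 0.0]
    chunks = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
    for index, score in rank_chunks(query, chunks, top_k=2):
        print(f"chunk {index}: {score:.3f}")
```

A visualization would typically highlight the top-ranked chunks, which makes it easy to see whether the retrieved chunks actually sit near the query in embedding space.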

Technical Features

Core Technology Stack

Python: Primary development language
Streamlit: Web interface framework
Vector Embeddings: Supports various embedding models
Visualization Libraries: Used for data visualization and interaction

Supported Embedding Models

thenlper/gte-large: Default recommended model
Other Hugging Face Models: Supports custom embedding models

Installation and Usage

Installation Method

pip install ragxplorer

Basic Usage Example

from ragxplorer import RAGxplorer

# Initialize the client
client = RAGxplorer(embedding_model="thenlper/gte-large")

# Load PDF document
client.load_pdf("presentation.pdf", verbose=True)

# Visualize query results
client.visualize_query("What are the top revenue drivers for Microsoft?")

Quick Start

The project provides Jupyter notebook tutorials and an online demo.

Application Scenarios

1. RAG System Debugging

Retrieval Quality Assessment: Evaluate the accuracy and relevance of document retrieval
Parameter Tuning: Adjust RAG system parameters through visualization results
Performance Analysis: Analyze system performance under different configurations

2. Education and Research

RAG Concept Teaching: Help learners understand RAG's working principles
Academic Research: Provide visualization tools for RAG-related research
Prototype Development: Quickly validate RAG system designs

3. Enterprise Applications

Document Search Optimization: Optimize internal enterprise document search systems
Knowledge Management: Visualize the organizational structure of enterprise knowledge bases
Customer Service: Optimize RAG-based customer service systems

Project Advantages

1. Open Source and Community

MIT License: Fully open-source, free to use and modify
Community Support: Active developer community and contributors
Continuous Updates: Regular updates and feature improvements

2. Ease of Use

Simple API: Intuitive Python API design
Web Interface: User-friendly interface based on Streamlit
Detailed Documentation: Comprehensive usage tutorials and examples

3. Scalability

Modular Design: Easy to extend and customize
Multi-model Support: Supports various embedding models
Plugin Mechanism: Can integrate other tools and libraries

Technical Architecture

Core Components

Document Processor: Responsible for document loading and preprocessing
Vectorization Engine: Handles text vectorization and embedding
Visualization Engine: Generates interactive visualization interfaces
Query Processor: Processes user queries and similarity calculations

Data Flow

Document Input → Text Chunking → Vectorization → Storage
Query Input → Vectorization → Similarity Calculation → Result Visualization
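
The two flows above can be sketched end to end with a toy in-memory index. Everything here is illustrative: ToyIndex, embed, and cosine are hypothetical names, word counts stand in for a real embedding model, a plain list stands in for a vector store, and the final visualization step is reduced to returning scored chunks.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyIndex:
    """Document Input -> Text Chunking -> Vectorization -> Storage, then querying."""

    def __init__(self, words_per_chunk=8):
        self.words_per_chunk = words_per_chunk
        self.store = []  # list of (chunk_text, vector) pairs

    def add_document(self, text):
        words = text.split()
        for i in range(0, len(words), self.words_per_chunk):
            chunk = " ".join(words[i:i + self.words_per_chunk])
            self.store.append((chunk, embed(chunk)))

    def query(self, question, top_k=2):
        # Query Input -> Vectorization -> Similarity Calculation
        q = embed(question)
        scored = [(cosine(q, vec), chunk) for chunk, vec in self.store]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:top_k]


if __name__ == "__main__":
    index = ToyIndex(words_per_chunk=4)
    index.add_document(
        "cloud revenue grew strongly last quarter while hardware sales fell"
    )
    for score, chunk in index.query("cloud revenue"):
        print(f"{score:.3f}  {chunk}")
```

Because the document and the query pass through the same embed function, their vectors are directly comparable.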

Summary

RAGxplorer is an easy-to-use RAG visualization tool that gives developers a way to inspect and tune RAG systems. Its visualization interface helps users debug and improve AI applications built on Retrieval Augmented Generation.