infiniflow/ragflow

An open-source Retrieval-Augmented Generation engine based on deep document understanding, providing accurate and reliable question answering capabilities for businesses of all sizes.

Apache-2.0 · TypeScript · ragflow · infiniflow · 61.9k · Last Updated: August 07, 2025

RAGFlow - Open-Source RAG Engine Based on Deep Document Understanding

Project Overview

RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding. It provides a streamlined RAG workflow for businesses of all sizes and pairs it with large language models (LLMs) to deliver factual, reliable answers backed by verifiable citations drawn from data in a variety of complex formats.

Core Features

🧠 Deep Document Understanding

Knowledge extraction from unstructured data in complex formats
Finds the "needle in a data haystack", with no fixed limit on the number of tokens searched
Intelligent and explainable processing results

📄 Multi-Format Document Support

Supported Formats: Word documents, PPT presentations, Excel spreadsheets, text files, images, scans, structured data, web pages, etc.
Processing Capabilities: Multi-modal models interpret images embedded in PDF or DOCX files
Visual Chunking: Text chunking visualization, allowing for manual intervention and optimization
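
To make the idea of reviewable chunks concrete, here is a minimal sketch of fixed-size chunking with overlap. It is a generic illustration, not RAGFlow's DeepDoc chunking logic: the `Chunk` class, the `chunk_text` function, and the `size` and `overlap` parameters are all hypothetical names chosen for this example. Each chunk keeps its character offsets, which is the kind of information a visualization needs in order to highlight a chunk in the source text.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    start: int  # offset of the first character in the source text
    end: int    # offset one past the last character
    text: str


def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[Chunk]:
    """Split text into fixed-size windows that overlap by `overlap` characters.

    Hypothetical helper for illustration; not part of RAGFlow.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        end = min(start + size, len(text))
        chunks.append(Chunk(start, end, text[start:end]))
        if end == len(text):
            break  # the last window already reaches the end of the text
    return chunks


chunks = chunk_text("abcdefghij", size=4, overlap=1)
for chunk in chunks:
    print(chunk.start, chunk.end, chunk.text)
```

Because every chunk records where it came from, a reviewer can adjust a boundary by hand and the downstream index can still point back to the exact span in the original document.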

🎯 Precise Retrieval and Citation

Provides quick access to key references
Traceable source citations support fact-based answers
Multiple recall strategies combined with fusion re-ranking
Keyword extraction and related question generation to improve retrieval accuracy
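
One common way to combine several recall strategies is reciprocal rank fusion (RRF). The sketch below shows that general technique under stated assumptions; the source does not say which fusion method RAGFlow uses, so treat this as an illustration of the idea rather than the project's implementation. The function name, the document IDs, and the constant `k = 60` (a conventional default for RRF) are all assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse ranked lists: each document scores sum(1 / (k + rank)) over the lists.

    Hypothetical helper for illustration; not RAGFlow's API.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties are broken by document ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Illustrative results from two recall strategies (made-up document IDs).
keyword_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.4f}")
```

Documents that appear near the top of more than one list rise to the top of the fused ranking, which is why `doc_b` and `doc_a` end up ahead of documents found by only one strategy.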

🔧 Flexible Configuration

Configurable LLMs and embedding models
Rich template options
Multiple configuration options for knowledge graph extraction and application
Supports text-to-SQL conversion

🌐 Enterprise-Grade Applications

Streamlined RAG orchestration for individuals and large enterprises
Intuitive API for seamless integration with business systems
Combines with internet search (Tavily) to support deep research reasoning for any LLM

System Architecture

RAGFlow adopts a modular design, mainly including the following components:

Frontend Interface: React-based user interface
Backend Service: Python-built core processing engine
Document Processing Engine: DeepDoc deep document understanding module
Vector Storage: Supports Elasticsearch and Infinity
Data Storage: MySQL, Redis, MinIO, etc.
Model Service: Supports various LLMs and embedding models

Technical Requirements

Minimum System Configuration

CPU: ≥ 4 cores
Memory: ≥ 16 GB
Disk Space: ≥ 50 GB
Docker: ≥ 24.0.0
Docker Compose: ≥ v2.26.1
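
The thresholds above can be checked before installation with a short script. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, the sample version strings merely imitate the usual output of `docker --version` and `docker compose version`, and interpreting "Disk Space" as free space on the target path is an assumption. A memory check is omitted because the standard library has no portable way to read total RAM.

```python
import os
import re
import shutil


def parse_version(text: str) -> tuple[int, ...]:
    """Extract the first dotted version number, e.g. 'v2.26.1' -> (2, 26, 1)."""
    match = re.search(r"(\d+(?:\.\d+)+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    parts = [int(part) for part in match.group(1).split(".")]
    parts += [0] * (3 - len(parts))  # pad '24.0' to (24, 0, 0)
    return tuple(parts)


def meets_minimum(found: str, required: str) -> bool:
    """Return True when the version in `found` is at least `required`."""
    return parse_version(found) >= parse_version(required)


def check_host(path: str = "/") -> dict[str, bool]:
    """Check CPU cores and free disk space against the listed minimums."""
    free_gib = shutil.disk_usage(path).free / 1024**3  # bytes -> GiB
    return {
        "cpu_cores": (os.cpu_count() or 0) >= 4,
        "disk_space": free_gib >= 50,  # assumes the 50 GB refers to free space
    }


# Sample strings for illustration; pass real command output in practice.
print(meets_minimum("Docker version 24.0.7, build afdd53b", "24.0.0"))
print(meets_minimum("Docker Compose version v2.26.1", "2.26.1"))
print(check_host())
```

In practice the version strings would come from running the two Docker commands, for example through `subprocess.run`, rather than from hard-coded samples.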

Supported Platforms

x86 is the primary supported platform
On ARM64, you must build the Docker image yourself

Installation and Deployment

Docker Quick Deployment

# Clone the repository
git clone https://github.com/infiniflow/ragflow.git

# Enter the docker directory
cd ragflow/docker

# Start the service (CPU version)
docker compose -f docker-compose.yml up -d

# Or, instead of the command above, start the GPU accelerated version
docker compose -f docker-compose-gpu.yml up -d

Image Version Description

Image Tag	Size	Includes Embedding Model	Stability
v0.18.0	~9 GB	✔️	Stable Version
v0.18.0-slim	~2 GB	❌	Stable Version
nightly	~9 GB	✔️	Development Version
nightly-slim	~2 GB	❌	Development Version

Source Code Development Deployment

RAGFlow can also be run from source for development. The full process covers configuring the Python environment, starting the dependent services, and launching the backend and frontend services.

Configuration Management

The system is managed through the following configuration files:

.env: Basic system configuration (HTTP port, database password, etc.)
service_conf.yaml.template: Backend service configuration
docker-compose.yml: Docker container orchestration configuration
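
Files such as .env are plain `KEY=VALUE` text, so they are easy to inspect from a script before starting the containers. The reader below is a simplified sketch: the function name is hypothetical, the sample keys are made up for illustration and are not taken from RAGFlow's actual .env, and it does not handle the variable interpolation that Docker Compose itself supports.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments.

    Simplified illustration; not a full implementation of the .env syntax.
    """
    settings: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        settings[key.strip()] = value.strip().strip("\"'")
    return settings


# Hypothetical sample; these key names are illustrative only.
sample = """
# HTTP port exposed on the host
EXAMPLE_HTTP_PORT=8080
EXAMPLE_DB_PASSWORD="change-me"
"""

config = parse_env(sample)
print(config)
```

A script like this can be used to confirm that a port or password has been changed from its default before the stack is brought up.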

Application Scenarios

Enterprise Knowledge Management: Building internal knowledge base question answering systems
Intelligent Document Analysis: Intelligent parsing and querying of documents in complex formats
Customer Service: Intelligent customer service systems based on enterprise documents
Research Assistance: Intelligent retrieval of academic literature and research materials
Data Analysis: Unified querying of structured and unstructured data

RAGFlow, with its powerful document understanding capabilities and flexible configuration options, provides a reliable RAG solution for various industries, making it an ideal choice for building intelligent question answering systems.