An open-source Retrieval-Augmented Generation engine based on deep document understanding, providing accurate and reliable question answering capabilities for businesses of all sizes.

Apache-2.0 · Python · 57.0k · infiniflow · Last Updated: 2025-06-19

RAGFlow - Open-Source RAG Engine Based on Deep Document Understanding

Project Overview

RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding. It offers a streamlined RAG workflow for businesses of all sizes: it pairs retrieval over your own documents with large language models (LLMs) to produce factual, reliable answers, and it backs each answer with verifiable citations drawn from data in a variety of complex formats.

Core Features

🧠 Deep Document Understanding

  • Knowledge extraction from unstructured data in complex formats
  • Finds the "needle in a data haystack" even when the corpus holds an effectively unlimited number of tokens
  • Intelligent and explainable processing results

📄 Multi-Format Document Support

  • Supported Formats: Word documents, PPT presentations, Excel spreadsheets, text files, images, scans, structured data, web pages, etc.
  • Processing Capabilities: Multimodal models interpret images embedded in PDF and DOCX files
  • Visual Chunking: Chunking results are visualized so they can be reviewed and adjusted manually
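
To make the idea of reviewable chunks concrete, here is a minimal sketch of fixed-size chunking with overlap. It is a generic illustration, not RAGFlow's own chunking logic; the function name `chunk_text` and its parameters are assumptions made for this example. Each chunk keeps its start offset, which is the kind of information a visual review tool would need to highlight a chunk in the source text.

```python
def chunk_text(text, chunk_size=200, overlap=40):
    """Split text into overlapping chunks, returning (start_offset, chunk) pairs.

    Hypothetical helper for illustration; not part of RAGFlow.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append((start, text[start:start + chunk_size]))
        # Stop once a chunk has reached the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


demo_chunks = chunk_text("abcdefghij", chunk_size=4, overlap=1)
```

A reviewer who finds that a chunk boundary cuts through a sentence could rerun the split with a different `chunk_size` or `overlap`, which is the manual adjustment the list above refers to.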

🎯 Precise Retrieval and Citation

  • Provides quick access to key references
  • Traceable source citations support fact-based answers
  • Multiple recall strategies combined with fusion re-ranking
  • Keyword extraction and related question generation to improve retrieval accuracy
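
As an illustration of what "multiple recall strategies combined with fusion re-ranking" can look like, the sketch below merges two ranked result lists with reciprocal rank fusion (RRF). RRF is a common fusion technique chosen here as an assumption; the source does not say which fusion method RAGFlow uses. The chunk IDs and the function name `reciprocal_rank_fusion` are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of chunk IDs into a single ranking.

    Each list contributes 1 / (k + rank) to a chunk's score, so chunks that
    rank well in several lists rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by chunk ID for determinism
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


keyword_hits = ["c3", "c1", "c7"]  # hypothetical full-text recall results
vector_hits = ["c1", "c4", "c3"]   # hypothetical embedding recall results
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
```

Chunks `c1` and `c3` appear in both lists, so they outrank `c4` and `c7`, which each appear in only one.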

🔧 Flexible Configuration

  • Configurable LLMs and embedding models
  • Rich template options
  • Multiple configuration options for knowledge graph extraction and application
  • Supports text-to-SQL conversion

🌐 Enterprise-Grade Applications

  • Streamlined RAG orchestration for individuals and large enterprises
  • Intuitive API for seamless integration with business systems
  • Integrates internet search (Tavily) to support deep-research-style reasoning with any LLM

System Architecture

RAGFlow has a modular design built from the following main components:

  • Frontend Interface: React-based user interface
  • Backend Service: Core processing engine written in Python
  • Document Processing Engine: DeepDoc deep document understanding module
  • Vector Storage: Supports Elasticsearch and Infinity
  • Data Storage: MySQL, Redis, MinIO, etc.
  • Model Service: Supports various LLMs and embedding models

Technical Requirements

Minimum System Configuration

  • CPU: ≥ 4 cores
  • Memory: ≥ 16 GB
  • Disk Space: ≥ 50 GB
  • Docker: ≥ 24.0.0
  • Docker Compose: ≥ v2.26.1
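
The thresholds above can be checked before deployment with a small script. The following is a sketch under stated assumptions: the helper names are hypothetical, the memory check is left out because the Python standard library has no portable way to read total RAM, and the version strings would come from running `docker --version` and `docker compose version` yourself.

```python
import os
import re
import shutil

# Minimums taken from the list above
MIN_CPU_CORES = 4
MIN_DISK_GB = 50
MIN_DOCKER = (24, 0, 0)
MIN_COMPOSE = (2, 26, 1)


def parse_version(text):
    """Extract the first 'X.Y.Z' from a string and return it as a tuple of ints."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def meets_minimum(version_text, minimum):
    """Compare a version string against a (major, minor, patch) minimum."""
    return parse_version(version_text) >= minimum


def check_host(path="."):
    """Report whether CPU count and free disk space at `path` meet the minimums."""
    cores = os.cpu_count() or 0
    free_gb = shutil.disk_usage(path).free / 1024**3
    return {"cpu_ok": cores >= MIN_CPU_CORES, "disk_ok": free_gb >= MIN_DISK_GB}
```

For example, `meets_minimum("v2.24.6", MIN_COMPOSE)` is false, which signals that Docker Compose should be upgraded before starting the services.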

Supported Platforms

  • x86 platforms are the primary target
  • On ARM64 platforms, you must build the Docker images yourself
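
A deployment script can detect which of these two cases applies. The sketch below uses Python's `platform.machine()`; the name sets and the helper functions are assumptions made for illustration, and architectures other than x86 and ARM64 are reported as "other" because the source says nothing about them.

```python
import platform

# Common spellings reported by different operating systems
X86_NAMES = {"x86_64", "amd64"}
ARM64_NAMES = {"arm64", "aarch64"}


def classify_arch(machine):
    """Map a machine string to 'x86', 'arm64', or 'other'."""
    name = machine.lower()
    if name in X86_NAMES:
        return "x86"
    if name in ARM64_NAMES:
        return "arm64"
    return "other"


def needs_local_build(machine=None):
    """Return True on ARM64, where the Docker images must be built locally."""
    return classify_arch(machine or platform.machine()) == "arm64"
```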

Installation and Deployment

Docker Quick Deployment

# Clone the repository
git clone https://github.com/infiniflow/ragflow.git

# Enter the docker directory
cd ragflow/docker

# Start the service (CPU version)
docker compose -f docker-compose.yml up -d

# Start the service (GPU accelerated version)
docker compose -f docker-compose-gpu.yml up -d

Image Version Description

Image Tag       Size    Includes Embedding Model    Stability
v0.18.0         ~9 GB   Yes                         Stable
v0.18.0-slim    ~2 GB   No                          Stable
nightly         ~9 GB   Yes                         Development
nightly-slim    ~2 GB   No                          Development
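
The choice between these tags comes down to two questions: do you need the bundled embedding model, and do you want a stable release? The sketch below encodes the table as data and picks a tag accordingly. It assumes that rows without a check mark (the slim images) do not include the embedding model; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageTag:
    tag: str
    approx_size_gb: int
    has_embedding_model: bool
    stable: bool


# Values transcribed from the table above
TAGS = [
    ImageTag("v0.18.0", 9, True, True),
    ImageTag("v0.18.0-slim", 2, False, True),
    ImageTag("nightly", 9, True, False),
    ImageTag("nightly-slim", 2, False, False),
]


def pick_tag(need_embedding_model, stable=True):
    """Return the first tag matching the requested properties."""
    for image in TAGS:
        if image.has_embedding_model == need_embedding_model and image.stable == stable:
            return image.tag
    raise LookupError("no image tag matches the requested properties")
```

If embeddings are served by an external model provider, the slim image saves roughly 7 GB of download; otherwise the full image is the simpler choice.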

Source Code Development Deployment

RAGFlow can also be run from source for development. The process covers configuring the Python environment, starting the dependent services, and launching the backend and frontend services.

Configuration Management

The system is configured through the following files:

  • .env: Basic system configuration (HTTP port, database password, etc.)
  • service_conf.yaml.template: Backend service configuration
  • docker-compose.yml: Docker container orchestration configuration
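
Since `.env` holds plain `KEY=VALUE` pairs, it is easy to inspect programmatically before starting the containers. The parser below is a minimal sketch that handles comments, blank lines, and simple quoting; it does not cover every `.env` feature Docker Compose supports. The key names in the sample (`HTTP_PORT`, `DB_PASSWORD`) are hypothetical placeholders, not the actual variable names in RAGFlow's `.env`.

```python
def parse_env(text):
    """Parse KEY=VALUE lines into a dict, skipping comments and blank lines."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        settings[key.strip()] = value
    return settings


# Hypothetical keys for illustration only
SAMPLE_ENV = """
# Basic system configuration
HTTP_PORT=80
DB_PASSWORD="change-me"
"""

sample_settings = parse_env(SAMPLE_ENV)
```

A check such as `sample_settings["DB_PASSWORD"] == "change-me"` could then flag a default password that was never replaced.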

Application Scenarios

  • Enterprise Knowledge Management: Building internal knowledge base question answering systems
  • Intelligent Document Analysis: Parsing and querying documents in complex formats
  • Customer Service: Intelligent customer service systems based on enterprise documents
  • Research Assistance: Intelligent retrieval of academic literature and research materials
  • Data Analysis: Unified querying of structured and unstructured data

With its deep document understanding and flexible configuration options, RAGFlow offers a dependable RAG foundation for building question answering systems across a range of industries.