MindSearch - An Open-Source AI Multi-Agent Search Engine Framework
A multi-agent web search engine framework based on large language models, simulating human thinking processes to achieve deep AI search.
Project Overview
MindSearch is an open-source, Large Language Model (LLM)-based multi-agent web search engine framework designed to simulate human cognitive processes in web information search and integration. Developed jointly by Shanghai AI Lab and the University of Science and Technology of China, this project offers a search experience comparable to Perplexity.ai Pro and SearchGPT.
Core Features
🤔 Answer Any Question
MindSearch answers both everyday and complex questions through web search, and can handle multi-step query requirements.
📚 Deep Knowledge Exploration
MindSearch provides broader and deeper answers by browsing hundreds of web pages. The system can process information from 300+ web pages in parallel within 3 minutes, equivalent to 3 hours of human expert work.
🔍 Transparent Solution Path
MindSearch exposes its complete solution path, including intermediate thought processes and the search keywords it used, enhancing the credibility and usability of its responses.
💻 Multiple User Interfaces
Provides multiple interfaces for users, including React, Gradio, Streamlit, and a terminal interface for local debugging.
🧠 Dynamic Graph Construction Process
MindSearch decomposes user queries into sub-problem nodes in a graph and gradually expands the graph based on the search results from WebSearcher.
Technical Architecture
Core Components
MindSearch adopts a multi-agent architecture, consisting of two main components: WebPlanner and WebSearcher.
WebPlanner
- Acts as a high-level planner, coordinating reasoning steps and multiple WebSearchers.
- Decomposes complex user queries into atomic sub-problems as nodes in a graph.
- Gradually expands the graph based on WebSearcher's search results.
- Focuses on query decomposition and analysis, undisturbed by lengthy search results.
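The graph the planner maintains can be pictured as a minimal sketch like the following. This is an illustrative stand-in, not MindSearch's actual API: the class name, node layout, and method names here are assumptions.

```python
# Hypothetical sketch of WebPlanner-style query decomposition: the planner keeps
# a graph whose nodes are atomic sub-questions, adds edges for dependencies, and
# expands the graph as answers come back. Names are illustrative only.
class QueryGraph:
    def __init__(self, root_query):
        self.nodes = {"root": {"query": root_query, "answer": None}}
        self.edges = []  # (parent, child) dependency pairs

    def add_subquestion(self, name, query, parent="root"):
        self.nodes[name] = {"query": query, "answer": None}
        self.edges.append((parent, name))

    def pending(self):
        # Sub-questions whose answers have not been filled in yet.
        return [n for n, d in self.nodes.items()
                if d["answer"] is None and n != "root"]

graph = QueryGraph("Compare framework A and framework B")
graph.add_subquestion("a_features", "What are the key features of framework A?")
graph.add_subquestion("b_features", "What are the key features of framework B?")
print(graph.pending())  # unanswered sub-question nodes, ready for WebSearchers
```

Because the planner only tracks queries and short answers in this graph, it stays insulated from the lengthy raw search results, as the bullet above describes.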
WebSearcher
- Executes fine-grained web searches and summarizes valuable information back to the planner.
- Performs hierarchical information retrieval, processing each sub-problem.
- Includes 4 main steps: query rewriting, search content aggregation, detailed page selection, and final summarization.
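The four steps can be sketched as a small pipeline. The search and summarization calls below are stubs, and all function names are assumptions rather than MindSearch's real interfaces:

```python
# Illustrative sketch of the four WebSearcher steps: query rewriting, search
# content aggregation, detailed page selection, and final summarization.
def rewrite_query(sub_question):
    # Step 1: turn the sub-question into one or more search-engine queries.
    return [sub_question, sub_question + " overview"]

def aggregate_search(queries, search_fn):
    # Step 2: run every rewritten query and pool the hits.
    hits = []
    for q in queries:
        hits.extend(search_fn(q))
    return hits

def select_pages(hits, top_k=2):
    # Step 3: keep only the most promising pages (naive score sort here).
    return sorted(hits, key=lambda h: h["score"], reverse=True)[:top_k]

def summarize(pages):
    # Step 4: condense the selected pages into an answer for the planner.
    return " ".join(p["snippet"] for p in pages)

# Stubbed search engine returning fake scored hits.
fake_search = lambda q: [
    {"url": f"https://example.com/{i}", "score": i, "snippet": f"fact {i}"}
    for i in range(3)
]
answer = summarize(select_pages(aggregate_search(rewrite_query("What is X?"), fake_search)))
```

Only the short summary string returns to the planner; the bulky page content stays inside the searcher.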
Workflow
- Query Decomposition: WebPlanner decomposes complex queries into multiple sub-queries.
- Parallel Search: Multiple WebSearchers process different sub-queries in parallel.
- Information Integration: WebPlanner collects and integrates results from various WebSearchers.
- Dynamic Expansion: Dynamically adjusts and expands the search graph based on search results.
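The parallel-search step of this workflow can be sketched with the standard library alone. The worker body below is a stub standing in for a real WebSearcher, so the function and variable names are assumptions:

```python
# Minimal sketch of parallel search: several WebSearcher-style workers handle
# different sub-queries concurrently, and the planner collects the results.
from concurrent.futures import ThreadPoolExecutor

def web_searcher(sub_query):
    # Stand-in for the real search + summarize pipeline of one WebSearcher.
    return f"summary for: {sub_query}"

sub_queries = [
    "population of city A",
    "population of city B",
    "growth rate comparison",
]

with ThreadPoolExecutor(max_workers=4) as pool:
    # pool.map preserves input order, so results pair up with sub_queries.
    results = dict(zip(sub_queries, pool.map(web_searcher, sub_queries)))
```

The planner would then integrate the `results` mapping and decide whether to expand the graph with new sub-queries.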
Technical Implementation
Supported Models
- Open-source Models: InternLM2.5-7b-chat (specially optimized)
- Closed-source Models: GPT-4, Claude, etc.
- Deployment Methods: Supports various deployment methods including local server, client, and HuggingFace.
Supported Search Engines
- DuckDuckGo Search (no API key required)
- Bing Search
- Brave Search
- Google Serper
- Tencent Search
Frontend Interfaces
- React: Modern web interface
- Gradio: Easy-to-use Python interface
- Streamlit: Data science-friendly interface
- Terminal: Command-line interface
Installation and Usage
Basic Installation
git clone https://github.com/InternLM/MindSearch
cd MindSearch
pip install -r requirements.txt
Environment Configuration
mv .env.example .env
# Edit the .env file to add API keys and model configurations
Start Service
python -m mindsearch.app --lang cn --model_format internlm_server --search_engine DuckDuckGoSearch --asy
Parameter Description
- --lang: Model language; cn for Chinese, en for English.
- --model_format: Model format, e.g. internlm_server, gpt4.
- --search_engine: Type of search engine.
- --asy: Deploy asynchronous agents.
React Frontend Startup
# Configure backend URL
HOST="127.0.0.1"
PORT=8002
sed -i -r "s/target:\s*\"\"/target: \"${HOST}:${PORT}\"/" frontend/React/vite.config.ts
# Install dependencies
cd frontend/React
npm install
npm start
Docker Deployment
The project provides the MSDL (MindSearch Docker Launcher) tool to simplify the Docker deployment process:
cd MindSearch/docker
# Run the interactive configuration tool
MSDL supports both local and cloud model deployment, with GPU acceleration.
Performance Benchmarks
Benchmark Results
The performance of ChatGPT-Web, Perplexity.ai (Pro), and MindSearch was compared across three dimensions: depth, breadth, and accuracy of generated responses. The evaluation was based on 100 real-world problems meticulously designed by human experts and scored by 5 experts.
Key Advantages
- Efficiency Improvement: Processes 300+ web pages in 3 minutes, equivalent to 3 hours of human expert work.
- Quality Improvement: Significantly improves response quality in terms of depth and breadth.
- Competitiveness: MindSearch, based on InternLM2.5-7B, outperforms ChatGPT-Web and Perplexity.ai in response quality.
Project Highlights
Open-Source Advantages
- Fully Open-Source: All code is open-sourced under the Apache 2.0 license.
- Community-Driven: Active GitHub community and continuous updates.
- Customizability: Flexible configuration supporting various models and search engines.
Technical Innovations
- Multi-Agent Collaboration: Innovative WebPlanner + WebSearcher architecture.
- Dynamic Graph Construction: Graph construction method simulating human cognitive processes.
- Parallel Processing: Efficient parallel information retrieval and integration.
- Context Management: Intelligent long-context management mechanism.
Related Projects
MindSearch is an important component of the InternLM ecosystem, working synergistically with the following projects:
- Lagent: Lightweight LLM agent framework
- AgentLego: Multi-functional tool API library
- InternLM2.5: Optimized Large Language Model
- LMDeploy: Model deployment toolkit
Conclusion
MindSearch represents a significant breakthrough in the field of AI search engines. By simulating human cognitive processes, it achieves efficient and accurate web information search and integration. Its open-source nature and excellent performance make it an ideal choice for building custom AI search engines.