A document understanding and semantic retrieval framework based on large language models, designed for enterprise knowledge bases and RAG applications.
WeKnora - Tencent Open-Source Enterprise-Grade Document Q&A Framework
Project Overview
WeKnora is an open-source document understanding and semantic retrieval framework developed by Tencent, based on Large Language Models (LLMs). It is specifically designed for document scenarios with complex structures and heterogeneous content. The framework adopts a modular architecture, integrating multimodal preprocessing, semantic vector indexing, intelligent retrieval, and large model generative inference to build an efficient and controllable document Q&A process.
Official Website: https://weknora.weixin.qq.com GitHub Address: https://github.com/Tencent/WeKnora Open Source License: MIT License
Core Features
🔍 Accurate Understanding
- Supports structured content extraction from various document formats such as PDF, Word, and images.
- Uniformly constructs semantic views, supporting mixed text and image layouts and OCR text recognition.
- Intelligent document parsing to handle complex structures and heterogeneous content.
🧠 Intelligent Reasoning
- Based on RAG (Retrieval-Augmented Generation) technology.
- Leverages Large Language Models to understand document context and user intent.
- Supports precise Q&A and multi-turn conversations.
🔧 Flexible Extensibility
- Decoupling of the entire process from parsing, embedding, retrieval to generation.
- Modular design, allowing each component to be flexibly configured and extended.
- Easy integration and custom development.
⚡ Efficient Retrieval
- Combines multiple retrieval strategies: keyword, vector, knowledge graph.
- Supports retrieval mechanisms such as BM25, Dense Retrieve, and GraphRAG.
- Allows free combination of retrieval-reranking-generation pipelines.
🎯 Easy to Use
- Intuitive Web interface and standard RESTful API.
- Quick start with zero technical barrier.
- Drag-and-drop document upload, one-click service deployment.
🔒 Secure and Controllable
- Supports localized and private cloud deployment.
- Complete autonomy and control over data.
- Meets enterprise-grade security requirements.
Application Scenarios
Application Scenario | Specific Application | Core Value |
---|---|---|
Enterprise Knowledge Management | Internal document retrieval, Q&A on rules and regulations, operation manual queries | Improves knowledge lookup efficiency, reduces training costs |
Scientific Literature Analysis | Paper retrieval, research report analysis, academic material organization | Accelerates literature review, assists research decisions |
Product Technical Support | Product manual Q&A, technical document retrieval, troubleshooting | Enhances customer service quality, reduces technical support burden |
Legal Compliance Review | Contract clause retrieval, regulatory policy queries, case analysis | Improves compliance efficiency, reduces legal risks |
Medical Knowledge Assistance | Medical literature retrieval, treatment guideline queries, case analysis | Aids clinical decision-making, enhances diagnosis and treatment quality |
Functional Modules Explained
Document Processing Capabilities
- Supported Formats: PDF, Word, Txt, Markdown, Images (including OCR and Caption)
- Intelligent Parsing: Automatically identifies document structure and extracts core content.
- Multimodal Processing: Unified understanding of mixed text and image content.
Vectorization and Retrieval
- Embedding Models: Supports local models, BGE, GTE API, etc.
- Vector Databases: PostgreSQL (pgvector), Elasticsearch.
- Retrieval Strategies: BM25 sparse retrieval, Dense Retrieve, GraphRAG knowledge graph retrieval.
Large Model Integration
- Model Support: Mainstream large models such as Qwen (Tongyi Qianwen), DeepSeek.
- Deployment Methods: Local deployment (Ollama) or external API calls.
- Inference Modes: Supports switching between thinking/non-thinking modes.
Knowledge Graph Functionality
WeKnora supports converting documents into knowledge graphs, illustrating the relationships between different paragraphs within a document. When the knowledge graph function is enabled, the system analyzes and constructs an internal semantic association network of the document, which not only helps users understand the document content but also provides structured support for indexing and retrieval.
Technical Architecture
Project Structure
WeKnora/
├── cmd/ # Application entry point
├── internal/ # Core business logic
├── config/ # Configuration files
├── migrations/ # Database migration scripts
├── scripts/ # Startup and utility scripts
├── services/ # Implementations of various sub-services
├── frontend/ # Frontend project
└── docs/ # Project documentation
Core Modules
- Document Parsing Module: Extracts and structures content from various document formats.
- Vectorization Processing Module: Converts document content into semantic vectors.
- Retrieval Engine Module: Implements multi-strategy retrieval and recall.
- Large Model Inference Module: Generates intelligent answers based on context.
Quick Start
Environment Requirements
- Docker
- Docker Compose
- Git
Installation Steps
Clone the repository
git clone https://github.com/Tencent/WeKnora.git cd WeKnora
Configure environment variables
cp .env.example .env # Edit the .env file and fill in relevant configurations according to the comments
Start services
# One-command to start all services ./scripts/start_all.sh # Or use the make command make start-all
Access the service After successful startup, you can access the following addresses:
- Web UI: http://localhost
- Backend API: http://localhost:8080
- Distributed Tracing (Jaeger): http://localhost:16686
Stop services
./scripts/start_all.sh --stop
# Or
make stop-all
WeChat Ecosystem Integration
As a core technical framework of the WeChat Conversational AI Platform, WeKnora provides the following capabilities:
- Zero-code Deployment: Simply upload knowledge to quickly deploy intelligent Q&A services within the WeChat ecosystem.
- Efficient Question Management: Supports independent classification and management of high-frequency questions.
- WeChat Ecosystem Coverage: Seamless integration into WeChat Official Accounts, Mini Programs, and other WeChat scenarios.
API Interface
WeKnora provides a complete RESTful API interface, supporting:
- Document upload and management
- Knowledge base operations
- Q&A queries
- System configuration
For detailed API documentation, please refer to: API Documentation
Development and Contribution
Contribution Types
- 🐛 Bug Fixes: Discover and fix system defects.
- ✨ New Features: Propose and implement new functionalities.
- 📚 Documentation Improvements: Enhance project documentation.
- 🧪 Test Cases: Write unit tests and integration tests.
- 🎨 UI/UX Optimization: Improve user interface and experience.
Development Guidelines
- Follow Go Code Review Comments.
- Use
gofmt
to format code. - Add necessary unit tests.
- Update relevant documentation.
- Adhere to Conventional Commits specification.
Submission Process
- Fork the project to your personal GitHub account.
- Create a feature branch:
git checkout -b feature/amazing-feature
. - Commit your changes:
git commit -m 'Add amazing feature'
. - Push to the branch:
git push origin feature/amazing-feature
. - Create a Pull Request and describe the changes in detail.
Advantages
- Enterprise-grade Stability: Developed by Tencent team, validated in large-scale production environments.
- Out-of-the-box: One-click Docker deployment, intuitive Web interface operation.
- Advanced Technology: Based on the latest RAG technology and large model capabilities.
- Highly Customizable: Modular design, supporting flexible extension and integration.
- Data Security: Supports private deployment, with complete autonomy and control over data.
- Ecosystem Integration: Deeply integrated with the WeChat ecosystem, supporting multi-scenario applications.
Summary
WeKnora is a powerful and technologically advanced enterprise-grade document Q&A framework. It not only provides a complete RAG technology stack but also boasts excellent usability and extensibility. Whether for enterprise internal knowledge management, scientific literature analysis, or customer service support, WeKnora can provide efficient and accurate solutions.
By being open-sourced, WeKnora offers a high-quality starting point for a wide range of developers and enterprises, making the construction of intelligent document Q&A systems simple and efficient.