A document understanding and semantic retrieval framework based on large language models, designed for enterprise knowledge bases and RAG applications.

License: NOASSERTION · Language: Go · Repository: Tencent/WeKnora · 5.2k stars · Last Updated: September 11, 2025

WeKnora - Tencent Open-Source Enterprise-Grade Document Q&A Framework

Project Overview

WeKnora is an open-source document understanding and semantic retrieval framework developed by Tencent and built on Large Language Models (LLMs). It targets documents with complex structure and heterogeneous content. The framework has a modular architecture that combines multimodal preprocessing, semantic vector indexing, intelligent retrieval, and LLM-based generative inference into an efficient and controllable document Q&A pipeline.

Official Website: https://weknora.weixin.qq.com
GitHub Address: https://github.com/Tencent/WeKnora
Open Source License: MIT License

Core Features

🔍 Accurate Understanding

  • Supports structured content extraction from various document formats such as PDF, Word, and images.
  • Uniformly constructs semantic views, supporting mixed text and image layouts and OCR text recognition.
  • Intelligent document parsing to handle complex structures and heterogeneous content.

🧠 Intelligent Reasoning

  • Based on RAG (Retrieval-Augmented Generation) technology.
  • Leverages Large Language Models to understand document context and user intent.
  • Supports precise Q&A and multi-turn conversations.
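
As an illustration of the RAG flow described above, the following minimal sketch assembles a grounded prompt from retrieved chunks and earlier conversation turns. It is not WeKnora's implementation: `Chunk`, `build_prompt`, the prompt wording, and the `max_chars` budget are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    source: str  # hypothetical document identifier
    text: str


def build_prompt(question, chunks, history=(), max_chars=2000):
    """Assemble a grounded prompt from retrieved chunks and prior turns."""
    context_lines = []
    used = 0
    for i, chunk in enumerate(chunks, start=1):
        line = f"[{i}] ({chunk.source}) {chunk.text}"
        if used + len(line) > max_chars:
            break  # stay within the (assumed) context budget
        context_lines.append(line)
        used += len(line)
    history_lines = [f"{role}: {text}" for role, text in history]
    parts = [
        "Answer using only the numbered context. Cite sources as [n].",
        "Context:",
        *context_lines,
    ]
    if history_lines:
        parts += ["Conversation so far:", *history_lines]
    parts.append(f"Question: {question}")
    return "\n".join(parts)


chunks = [
    Chunk("handbook.pdf", "Employees accrue 1.5 vacation days per month."),
    Chunk("policy.docx", "Unused vacation days expire after 18 months."),
]
history = [
    ("user", "How many vacation days do I get?"),
    ("assistant", "1.5 days per month [1]."),
]
prompt = build_prompt("Do they expire?", chunks, history)
print(prompt)
```

The resulting string would be passed to whichever model is configured. Including the history lets a follow-up such as "Do they expire?" be resolved against the previous turn.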

🔧 Flexible Extensibility

  • Decouples the entire pipeline, from parsing and embedding through retrieval to generation.
  • Modular design, allowing each component to be flexibly configured and extended.
  • Easy integration and custom development.

⚡ Efficient Retrieval

  • Combines multiple retrieval strategies: keyword, vector, knowledge graph.
  • Supports retrieval mechanisms such as BM25, dense retrieval, and GraphRAG.
  • Allows free combination of retrieval-reranking-generation pipelines.
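
One common way to combine several retrieval strategies is reciprocal rank fusion (RRF). The sketch below is an assumption for illustration only: the source does not say which fusion method WeKnora uses, and the function name, the constant `k=60`, and the document ids are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists that contain it.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending), then by id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


keyword_hits = ["doc_a", "doc_b", "doc_c"]  # e.g. from BM25
vector_hits = ["doc_b", "doc_d", "doc_a"]   # e.g. from dense retrieval
graph_hits = ["doc_b", "doc_c"]             # e.g. from graph retrieval

fused = reciprocal_rank_fusion([keyword_hits, vector_hits, graph_hits])
print(fused)  # ['doc_b', 'doc_a', 'doc_c', 'doc_d']
```

RRF only needs ranks, not comparable scores, so it is a convenient default when the keyword, vector, and graph retrievers score on different scales. A reranking model could then reorder the fused list before generation.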

🎯 Easy to Use

  • Intuitive Web interface and standard RESTful API.
  • Quick start with a low technical barrier.
  • Drag-and-drop document upload, one-click service deployment.

🔒 Secure and Controllable

  • Supports localized and private cloud deployment.
  • Complete autonomy and control over data.
  • Meets enterprise-grade security requirements.

Application Scenarios

| Application Scenario | Specific Application | Core Value |
| --- | --- | --- |
| Enterprise Knowledge Management | Internal document retrieval, Q&A on rules and regulations, operation manual queries | Improves knowledge lookup efficiency, reduces training costs |
| Scientific Literature Analysis | Paper retrieval, research report analysis, academic material organization | Accelerates literature review, assists research decisions |
| Product Technical Support | Product manual Q&A, technical document retrieval, troubleshooting | Enhances customer service quality, reduces technical support burden |
| Legal Compliance Review | Contract clause retrieval, regulatory policy queries, case analysis | Improves compliance efficiency, reduces legal risks |
| Medical Knowledge Assistance | Medical literature retrieval, treatment guideline queries, case analysis | Aids clinical decision-making, enhances diagnosis and treatment quality |

Functional Modules Explained

Document Processing Capabilities

  • Supported Formats: PDF, Word, TXT, Markdown, and images (including OCR and captioning).
  • Intelligent Parsing: Automatically identifies document structure and extracts core content.
  • Multimodal Processing: Unified understanding of mixed text and image content.
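
To make "identifies document structure" concrete, here is a minimal sketch of structure-aware splitting for one of the supported formats, Markdown. It is a simplified stand-in, not WeKnora's parser: `split_by_headings` is a hypothetical helper, and it ignores edge cases such as `#` lines inside code blocks.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def split_by_headings(markdown_text):
    """Split Markdown into sections, keeping each section's heading path."""
    sections = []
    path = []  # stack of (level, title) for the current heading trail
    body = []

    def flush():
        # Emit the text collected so far under the current heading path.
        text = "\n".join(body).strip()
        if text:
            sections.append({"path": [t for _, t in path], "text": text})
        body.clear()

    for line in markdown_text.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # Drop headings at the same or deeper level before pushing.
            while path and path[-1][0] >= level:
                path.pop()
            path.append((level, match.group(2).strip()))
        else:
            body.append(line)
    flush()
    return sections


doc = """# Manual
Intro text.
## Setup
Plug in the device.
## Troubleshooting
### Power
Hold the button for ten seconds.
"""
sections = split_by_headings(doc)
for section in sections:
    print(" > ".join(section["path"]), "|", section["text"])
```

Keeping the heading path with each chunk gives the retriever and the model context that the raw paragraph lacks, for example that "Hold the button for ten seconds." belongs to Troubleshooting > Power.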

Vectorization and Retrieval

  • Embedding Models: Supports local models and the BGE and GTE APIs, among others.
  • Vector Databases: PostgreSQL (pgvector), Elasticsearch.
  • Retrieval Strategies: BM25 sparse retrieval, dense retrieval, and GraphRAG knowledge graph retrieval.
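
The difference between sparse and dense retrieval can be sketched in a few lines. The example below implements the standard Okapi BM25 formula and cosine similarity in plain Python. The whitespace tokenizer, the parameter defaults `k1=1.5` and `b=0.75`, and the toy 3-dimensional vectors are assumptions for illustration. In a real deployment the scoring would be delegated to the configured backend (for example pgvector or Elasticsearch) and the vectors would come from an embedding model.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


docs = [
    "reset the router by holding the power button",
    "the warranty covers hardware defects for two years",
    "router firmware updates are installed automatically",
]
sparse = bm25_scores("router reset", docs)
print(sparse)

# Toy vectors standing in for real embedding-model output.
doc_vectors = [[0.9, 0.1, 0.0], [0.0, 0.2, 0.9], [0.7, 0.6, 0.1]]
query_vector = [1.0, 0.2, 0.0]
dense = [cosine(query_vector, v) for v in doc_vectors]
print(dense)
```

BM25 rewards exact term overlap, so the second document scores zero. Dense retrieval compares vectors instead and can rank a document highly even when it shares no literal terms with the query.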

Large Model Integration

  • Model Support: Mainstream large models such as Qwen (Tongyi Qianwen) and DeepSeek.
  • Deployment Methods: Local deployment (Ollama) or external API calls.
  • Inference Modes: Supports switching between thinking/non-thinking modes.

Knowledge Graph Functionality

WeKnora supports converting documents into knowledge graphs, illustrating the relationships between different paragraphs within a document. When the knowledge graph function is enabled, the system analyzes and constructs an internal semantic association network of the document, which not only helps users understand the document content but also provides structured support for indexing and retrieval.
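
The idea of a paragraph-level association network can be sketched as follows. This is a deliberately naive stand-in that links paragraphs sharing key terms. How WeKnora actually extracts relationships is not described here, and every name in the example (`key_terms`, `build_paragraph_graph`, `related`, the stopword list, `min_shared`) is hypothetical.

```python
from collections import defaultdict
from itertools import combinations

STOPWORDS = {"the", "a", "an", "of", "to", "and", "is", "in", "for", "on", "by"}


def key_terms(paragraph):
    """Very rough term extraction: longer, non-stopword tokens."""
    words = (w.strip(".,;:()").lower() for w in paragraph.split())
    return {w for w in words if len(w) > 3 and w not in STOPWORDS}


def build_paragraph_graph(paragraphs, min_shared=2):
    """Connect paragraphs that share at least `min_shared` key terms."""
    terms = [key_terms(p) for p in paragraphs]
    graph = defaultdict(dict)
    for i, j in combinations(range(len(paragraphs)), 2):
        shared = terms[i] & terms[j]
        if len(shared) >= min_shared:
            graph[i][j] = sorted(shared)
            graph[j][i] = sorted(shared)
    return graph


def related(graph, start, hops=1):
    """Return paragraph indices reachable from `start` within `hops` edges."""
    seen, frontier = {start}, {start}
    for _ in range(hops):
        frontier = {n for f in frontier for n in graph.get(f, {})} - seen
        seen |= frontier
    return sorted(seen - {start})


paragraphs = [
    "The billing service issues invoices at the end of each month.",
    "Invoices from the billing service are archived for seven years.",
    "Archived records are stored in the compliance vault for seven years.",
    "The cafeteria opens at eight in the morning.",
]
graph = build_paragraph_graph(paragraphs)
print(graph[0][1])                 # ['billing', 'invoices', 'service']
print(related(graph, 0, hops=2))   # [1, 2]
```

Following edges lets retrieval pull in paragraph 2 for a question about paragraph 0 even though the two share no key terms directly, which is the kind of structured support for retrieval that the text describes.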

Technical Architecture

Project Structure

WeKnora/
├── cmd/           # Application entry point
├── internal/      # Core business logic
├── config/        # Configuration files
├── migrations/    # Database migration scripts
├── scripts/       # Startup and utility scripts
├── services/      # Implementations of various sub-services
├── frontend/      # Frontend project
└── docs/          # Project documentation

Core Modules

  1. Document Parsing Module: Extracts and structures content from various document formats.
  2. Vectorization Processing Module: Converts document content into semantic vectors.
  3. Retrieval Engine Module: Implements multi-strategy retrieval and recall.
  4. Large Model Inference Module: Generates intelligent answers based on context.

Quick Start

Environment Requirements

  • Docker
  • Docker Compose
  • Git

Installation Steps

  1. Clone the repository

    git clone https://github.com/Tencent/WeKnora.git
    cd WeKnora
    
  2. Configure environment variables

    cp .env.example .env
    # Edit the .env file and fill in relevant configurations according to the comments
    
  3. Start services

    # Start all services with one command
    ./scripts/start_all.sh
    # Or use the make command
    make start-all
    
  4. Access the service: after successful startup, the Web interface and RESTful API are reachable at the host and ports defined in your deployment configuration.

Stop services

./scripts/start_all.sh --stop
# Or
make stop-all

WeChat Ecosystem Integration

As a core technical framework of the WeChat Conversational AI Platform, WeKnora provides the following capabilities:

  • Zero-code Deployment: Simply upload knowledge to quickly deploy intelligent Q&A services within the WeChat ecosystem.
  • Efficient Question Management: Supports independent classification and management of high-frequency questions.
  • WeChat Ecosystem Coverage: Seamless integration into WeChat Official Accounts, Mini Programs, and other WeChat scenarios.

API Interface

WeKnora provides a complete RESTful API interface, supporting:

  • Document upload and management
  • Knowledge base operations
  • Q&A queries
  • System configuration

For detailed API documentation, refer to the project's API documentation.

Development and Contribution

Contribution Types

  • 🐛 Bug Fixes: Discover and fix system defects.
  • ✨ New Features: Propose and implement new functionalities.
  • 📚 Documentation Improvements: Enhance project documentation.
  • 🧪 Test Cases: Write unit tests and integration tests.
  • 🎨 UI/UX Optimization: Improve user interface and experience.

Development Guidelines

Submission Process

  1. Fork the project to your personal GitHub account.
  2. Create a feature branch: git checkout -b feature/amazing-feature.
  3. Commit your changes: git commit -m 'Add amazing feature'.
  4. Push to the branch: git push origin feature/amazing-feature.
  5. Create a Pull Request and describe the changes in detail.

Advantages

  1. Enterprise-grade Stability: Developed by a Tencent team and validated in large-scale production environments.
  2. Out-of-the-box: One-click Docker deployment, intuitive Web interface operation.
  3. Advanced Technology: Based on the latest RAG technology and large model capabilities.
  4. Highly Customizable: Modular design, supporting flexible extension and integration.
  5. Data Security: Supports private deployment, with complete autonomy and control over data.
  6. Ecosystem Integration: Deeply integrated with the WeChat ecosystem, supporting multi-scenario applications.

Summary

WeKnora is a powerful and technologically advanced enterprise-grade document Q&A framework. It not only provides a complete RAG technology stack but also boasts excellent usability and extensibility. Whether for enterprise internal knowledge management, scientific literature analysis, or customer service support, WeKnora can provide efficient and accurate solutions.

As an open-source project, WeKnora gives developers and enterprises a solid starting point for building intelligent document Q&A systems.
