Home
Login

The Apache Doris MCP Server is a backend service based on Python and FastAPI that supports the Model Context Protocol (MCP) for connecting to the Apache Doris database, enabling Natural Language to SQL (NL2SQL), query execution, and metadata management.

Apache-2.0Python 94apachedoris-mcp-server Last Updated: 2025-06-23

Apache Doris MCP Server Project Detailed Introduction

Project Overview

Apache Doris MCP (Model Context Protocol) Server is a backend service built on Python and FastAPI, designed to seamlessly integrate with the Apache Doris database through the Model Context Protocol (MCP). It provides standardized interfaces for AI agents and clients, supporting Natural Language to SQL (NL2SQL), SQL query execution, metadata management, and data analysis. This project is a significant extension of the Apache Doris community, particularly suitable for business scenarios requiring efficient data interaction and intelligent analysis.

Since the release of version 0.3.0 in 2025, the project has undergone a major architectural update, migrating from SSE (Server-Sent Events) based communication to a unified Streamable HTTP protocol, and adding enterprise-grade security frameworks and performance optimization features. The core goal of the project is to bridge AI systems with the Apache Doris database through the MCP protocol, enabling intelligent data exploration and analysis.

Core Features

Apache Doris MCP Server provides the following main features:

  1. Natural Language to SQL (NL2SQL)

    • Supports generating SQL queries from natural language input, leveraging Large Language Models (LLMs) for intelligent query generation.
    • Suitable for scenarios where business users need to query data without writing complex SQL.
  2. SQL Query Execution

    • Supports direct execution of SQL commands through the exec_query tool, with customizable database selection, row limits, and timeout settings.
    • Built-in security checks (such as SQL injection protection) and automatic LIMIT addition ensure query security and efficiency.
  3. Metadata Management

    • Provides rich metadata extraction tools, including:
      • Listing all databases and tables (get_all_databases, get_database_tables).
      • Retrieving table schema, comments, and index information (get_table_schema, get_table_comment, get_column_comments, get_table_indexes).
      • Supporting multi-catalog discovery (get_catalog_list).
  4. Audit Log Query

    • Retrieves recent audit records through get_recent_audit_logs, supporting custom time ranges and record limits.
  5. Enterprise-Grade Security Framework

    • Supports multiple authentication methods (Token, Basic Auth, OAuth).
    • Role-Based Access Control (RBAC) providing four security levels.
    • Built-in SQL injection protection, query validation, and data masking features.
  6. Performance Optimization

    • Query execution optimization, improving performance through enhanced caching mechanisms and connection pool management.
    • Supports performance monitoring and statistical analysis (performance_stats).
  7. Multiple Communication Modes

    • Streamable HTTP: Supports request/response and streaming through a unified /mcp endpoint (src/streamable_server.py).
    • SSE (Deprecated): Earlier versions supported communication through /sse and /mcp/messages endpoints (src/sse_server.py).
    • Stdio (Optional): Interacts through standard input/output (src/stdio_server.py).
  8. Experimental Features

    • Column statistics analysis (column_analysis), providing data insights.
    • Catalog federation support, suitable for multi-catalog environments.

Technical Architecture

The project's architecture is modular and efficient, mainly including the following components:

  • Core Framework: Built on Python 3.12 and FastAPI, providing high-performance API services.
  • MCP Protocol Implementation:
    • Provides standardized tool invocation, resource management, and prompt interaction interfaces.
    • Handles all requests through a unified /mcp endpoint, simplifying integration.
  • Database Interaction:
    • doris_mcp_server/utils/db.py: Provides database connection (get_db_connection) and query execution (execute_query, execute_query_df) functions.
    • doris_mcp_server/utils/schema_extractor.py: The MetadataExtractor class is responsible for metadata extraction, including caching mechanisms.
    • doris_mcp_server/utils/sql_executor_tools.py: The execute_sql_query function encapsulates query logic, including security checks and result serialization.
  • Security Management:
    • Supports multiple authentication methods and RBAC.
    • Provides SQL injection protection, query validation, and audit logging.
  • Deployment Support:
    • Docker image-based deployment, unified port configuration (3000, 3001, 3002).
    • Environment variable configuration (e.g., DB_HOST, DB_PORT, DB_USER, DB_PASSWORD).

Architectural Changes: Version 0.3.0 removed approximately 300 lines of legacy SSE code, migrated to Streamable HTTP, unified tool naming (removed the mcp_doris_ prefix), and improved modularity.

Installation and Usage

Environment Requirements

  • Python 3.12 or above
  • Apache Doris database (connected via MySQL protocol)
  • Package management tools (such as uv or pip)
  • Optional: Docker (for containerized deployment)

Installation Steps

  1. Clone the Repository:

    git clone https://github.com/apache/doris-mcp-server.git
    cd doris-mcp-server
    
  2. Install Dependencies: Use uv (recommended) or pip to install:

    uv sync
    

    Or

    pip install -r requirements.txt
    
  3. Configure Environment Variables: Create a .env file or set environment variables directly:

    export DORIS_HOST=<doris-host>
    export DORIS_PORT=<port>
    export DORIS_USER=<doris-user>
    export DORIS_PASSWORD=<doris-pwd>
    export SERVER_PORT=3000
    
  4. Start the Server:

    uv run --with mcp-doris --python 3.13 mcp-doris
    

    Or

    python -m mcp_doris.mcp_server
    
  5. Verify Startup: After successful startup, you can interact through an MCP client (such as Cursor) or the built-in MCP browser (http://localhost:5173).

Test Connection

Run the test script to verify the database connection:

python src/doris-mcp-server/test.py

Expected output:

🚀 Doris MCP Server is starting...
[DorisConnector] Connected to 127.0.0.1:9030
✅ Database connection successful.
[DorisConnector] Connection closed.

Application Scenarios

  1. Real-time Data Analysis

    • Combined with NL2SQL functionality, business users can query real-time data through natural language to generate reports or dashboards.
    • Suitable for real-time decision support in industries such as retail, finance, and telecommunications.
  2. Metadata Exploration

    • Data engineers can quickly obtain database schema, table structure, and index information to accelerate data modeling and optimization.
  3. AI-Driven Business Intelligence

    • Integrate AI agents (such as Claude, Cursor) to implement automated data query and analysis workflows through the MCP protocol.
  4. Security Compliance Management

    • Enterprise-grade security framework and audit logging functions meet the data security and compliance requirements of industries such as finance and healthcare.
  5. Multi-Catalog Environment

    • Supports multi-catalog federation, suitable for metadata management and querying in complex data warehouse environments.

Project Advantages and Limitations

Advantages

  • Intelligent: NL2SQL and LLM integration lower the barrier to data querying.
  • Security: Multi-authentication, RBAC, and SQL protection ensure enterprise-grade security.
  • High Performance: Caching, connection pools, and query optimization improve efficiency.
  • Flexibility: Supports multiple communication modes and modular tool extensions.

Limitations

  • Early Version Stability: Some features (such as column_analysis) are experimental and may contain bugs.
  • Dependency on Doris Database: Primarily designed for Apache Doris, with limited compatibility with other MySQL protocol databases.
  • SSE Deprecation: Version 0.3.0 removed SSE support, requiring migration to Streamable HTTP.

Summary

Apache Doris MCP Server is a powerful and flexible tool that bridges the Apache Doris database with AI-driven analysis needs. Its NL2SQL, metadata management, security, and performance optimization features make it widely applicable in real-time data analysis, business intelligence, and enterprise-grade applications.

Star History Chart