Home
Login

An open-source Python framework based on RAG technology that enables natural language interaction with SQL databases, achieving accurate text-to-SQL query generation.

MITPython 18.3kvanna-ai Last Updated: 2025-04-10

Vanna.ai - Intelligent SQL Generation Framework Detailed Introduction

Project Overview

Vanna is an MIT-licensed open-source Python RAG (Retrieval-Augmented Generation) framework specifically designed for SQL generation and related functionalities. This project allows users to interact with SQL databases using natural language, enabling accurate text-to-SQL query generation.

GitHub Address: https://github.com/vanna-ai/vanna

Core Features

🤖 Intelligent Dialogue

  • Natural Language Interaction: Users can ask questions in plain language, and the system automatically generates corresponding SQL queries.
  • Real-time Feedback: Supports user feedback on generated results, continuously improving accuracy.
  • Self-Learning: Can automatically learn from successfully executed queries, improving the accuracy of future results.

📊 RAG Technology Architecture

Vanna is based on Retrieval-Augmented Generation (RAG) technology. The workflow is divided into two simple steps:

  1. Training Phase: Train the RAG "model" on your data.
  2. Query Phase: Ask a question and return a SQL query that can be automatically run on the database.

🔄 Wide Database Support

Supports various mainstream databases:

  • PostgreSQL
  • MySQL
  • Microsoft SQL Server
  • Oracle
  • Snowflake
  • BigQuery
  • ClickHouse
  • Apache Hive
  • PrestoDB
  • SQLite
  • DuckDB

Technical Advantages

Advantages Compared to Traditional Fine-tuning Methods

  • High Portability: Can be used across different LLMs.
  • Flexible Data Management: Easy to remove outdated training data.
  • Cost-Effective: Lower running costs than fine-tuning.
  • Future-Proof: Easily replaceable with better LLMs.

Security Assurance

  • Data Privacy: Database content is never sent to the LLM or vector database.
  • Local Execution: SQL execution is performed in the local environment.
  • Private Deployment: Supports fully private deployment.

Usage

Installation

pip install vanna

Basic Configuration

from vanna.openai.openai_chat import OpenAI_Chat
from vanna.chromadb.chromadb_vector import ChromaDB_VectorStore

class MyVanna(ChromaDB_VectorStore, OpenAI_Chat):
    def __init__(self, config=None):
        ChromaDB_VectorStore.__init__(self, config=config)
        OpenAI_Chat.__init__(self, config=config)

vn = MyVanna(config={'api_key': 'sk-...', 'model': 'gpt-4-...'})

Training the Model

The model can be trained in several ways:

DDL Statement Training:

vn.train(ddl="""
CREATE TABLE IF NOT EXISTS my-table (
    id INT PRIMARY KEY,
    name VARCHAR(100),
    age INT
)
""")

Business Documentation Training:

vn.train(documentation="Our business defines XYZ as ...")

SQL Query Training:

vn.train(sql="SELECT name, age FROM my-table WHERE name = 'John Doe'")

Query Usage

vn.ask("What are the top 10 customers by sales?")

The system will return:

  • The generated SQL query statement
  • The execution result table
  • Automatically generated Plotly charts

User Interface Options

Vanna offers various user interfaces:

  • Jupyter Notebook: Suitable for data analysts and developers.
  • Web Application: Suitable for end-users.
  • Streamlit Application: For rapid prototyping.
  • Slackbot: For team collaboration.
  • Custom Frontend: Fully customized solutions.

Architecture Design

Vanna adopts a modular design, based on the abstract base class VannaBase, which can be easily extended to use different:

  • Large Language Models: OpenAI, Claude, local models, etc.
  • Vector Databases: ChromaDB, Pinecone, Weaviate, etc.
  • Database Connectors: Supports any SQL database that Python can connect to.

Application Scenarios

Business Analysis

  • Non-technical personnel can directly query data using natural language.
  • Quickly generate business reports and analyses.
  • Automate common data query tasks.

Development Efficiency

  • Accelerate the SQL development process.
  • Reduce the time to write complex queries.
  • Provide query optimization suggestions.

Education and Training

  • Help learn SQL syntax.
  • Understand complex query structures.
  • Learn database best practices.

Summary

Vanna.ai is a powerful and easy-to-use text-to-SQL framework. By combining RAG technology and large language models, it allows anyone to interact with databases using natural language. Whether you are a data analyst, developer, or business user, you can benefit from this tool and greatly improve the efficiency of data querying and analysis.