h2oGPT Project Detailed Introduction
Project Overview
h2oGPT is an open-source project from H2O.ai that provides a fully private, local GPT chat experience, including document Q&A and image and video processing. It is released under the Apache 2.0 license, so deployment and usage can remain 100% private.
Project Address: https://github.com/h2oai/h2ogpt
Demo Address: https://gpt.h2o.ai/
Core Features
1. Document Processing Capabilities
h2oGPT supports private, offline databases for various document types, including PDF, Excel, Word, images, video frames, YouTube, audio, code, text, Markdown, and more. Key features include:
- Persistent Database: Uses Chroma, Weaviate, or in-memory FAISS for document storage.
- Accurate Embeddings: Supports embedding models like instructor-large, all-MiniLM-L6-v2.
- Efficient Context Utilization: Uses instruction-tuned LLMs, eliminating the need for LangChain's few-shot methods.
- Parallel Processing: Parallel summarization and extraction, with 13B LLaMa2 models achieving an output speed of 80 tokens per second.
- HYDE Technology: Hypothetical Document Embedding technology based on LLM responses, enhancing retrieval capabilities.
- Semantic Chunking: Better document segmentation (requires GPU support).
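The semantic chunking above relies on an embedding model (hence the GPU requirement). As a rough illustration of the general idea of splitting documents before vectorization, here is a naive fixed-size chunker with overlap; it is purely a sketch, not h2oGPT's actual implementation:

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list:
    """Split text into overlapping character chunks: a naive stand-in
    for the semantic chunking h2oGPT performs with an embedding model."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        if piece:
            chunks.append(piece)
        if start + chunk_size >= len(text):
            break
    return chunks

# Each chunk would then be embedded and stored in Chroma, Weaviate, or FAISS.
doc = "h2oGPT ingests PDF, Word, and Markdown files into a vector store. " * 4
pieces = chunk_text(doc, chunk_size=80, overlap=20)
```

The overlap keeps sentence fragments at chunk boundaries retrievable from both neighboring chunks, at the cost of some duplicated storage.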
2. Model Support
h2oGPT supports various models, including LLaMa2, Mistral, Falcon, Vicuna, WizardLM, etc., and technologies like AutoGPTQ, 4-bit/8-bit quantization, and LoRA:
- GPU Support: HuggingFace models and LLaMa.cpp GGML models.
- CPU Support: HuggingFace, LLaMa.cpp, and GPT4ALL models.
- Attention Sinks: Enable arbitrarily long generation (LLaMa-2, Mistral, MPT, Pythia, Falcon, etc.).
3. User Interface
- Gradio UI: Provides an intuitive web interface with streaming output.
- CLI: Command-line interface, supporting streaming for all models.
- Document Upload and Viewing: Upload and view documents via the UI (supports multiple collaborative or personal collections).
4. Multimodal Capabilities
Vision Models
Supports vision models such as LLaVa, Claude-3, Gemini-Pro-Vision, GPT-4-Vision.
Image Generation
Supports image generation models like Stable Diffusion (sdxl-turbo, sdxl, SD3), PlaygroundAI (playv2), and Flux.
Speech Processing
- STT (Speech-to-Text): Uses Whisper for streaming audio conversion.
- TTS (Text-to-Speech):
  - MIT-licensed Microsoft Speech T5, supporting multiple voices and streaming audio conversion.
  - MPL2-licensed TTS, including voice cloning and streaming audio conversion.
- AI Assistant Voice Control: Supports hands-free control for h2oGPT chat mode.
5. Enterprise-Grade Features
Authentication and State Management
- UI Authentication: Authenticates via username/password or Google OAuth.
- State Persistence: Maintains state in the UI via username/password.
- Open Web UI Integration: Uses h2oGPT as a backend via an OpenAI proxy.
API and Integration
- OpenAI-Compatible API: h2oGPT can serve as a drop-in replacement for an OpenAI server.
- Inference Server Support: Supports oLLaMa, HF TGI server, vLLM, Gradio, ExLLaMa, Replicate, Together.ai, OpenAI, Azure OpenAI, Anthropic, MistralAI, Google, and Groq.
Server Proxy API Features
- Chat and text completion (streaming and non-streaming)
- Audio transcription (STT)
- Audio generation (TTS)
- Image generation
- Embeddings
- Function tool calling and auto tool selection
- AutoGen code execution agents
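Because the proxy speaks the OpenAI wire format, any OpenAI-style client can target it with the standard chat-completion schema. Below is a minimal stdlib sketch; the base URL, port, and model name are deployment-specific assumptions, not fixed h2oGPT values:

```python
import json
import urllib.request

def build_chat_request(model: str, prompt: str, stream: bool = False) -> dict:
    """Build an OpenAI-style chat-completion payload; h2oGPT's
    OpenAI-compatible server accepts this same schema."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }

def post_chat(base_url: str, payload: dict) -> dict:
    """POST the payload to the standard /v1/chat/completions route
    of a running h2oGPT OpenAI-compatible server."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Building the payload works offline; post_chat() needs a live server.
payload = build_chat_request("my-local-model", "Summarize this document.")
```

The same request shape covers streaming (`"stream": True`) and non-streaming completions, matching the proxy features listed above.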
6. Advanced Features
JSON Mode and Structured Output
- Strict Schema Control: Uses outlines with vLLM for strict schema control.
- Multi-platform Support: Supports strict schema control for OpenAI, Anthropic, Google Gemini, and MistralAI models.
- JSON Mode: Provides a plain JSON mode for some older OpenAI and Gemini models.
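Over the OpenAI-compatible API, structured output is requested with the standard `response_format` field. The sketch below builds such a request; whether strict schema enforcement (outlines with vLLM) or plain JSON mode applies depends on which backend h2oGPT routes to, and the schema-in-system-message fallback shown is a hypothetical illustration:

```python
import json
from typing import Optional

def build_json_mode_request(model: str, prompt: str,
                            schema: Optional[dict] = None) -> dict:
    """OpenAI-style request asking the model for JSON output.
    When a schema is given, it is also described in a system message:
    a hypothetical fallback for backends without native enforcement."""
    messages = [{"role": "user", "content": prompt}]
    if schema is not None:
        messages.insert(0, {
            "role": "system",
            "content": "Reply only with JSON matching this schema: "
                       + json.dumps(schema),
        })
    return {
        "model": model,
        "messages": messages,
        "response_format": {"type": "json_object"},
    }

request = build_json_mode_request(
    "my-local-model",
    "Extract the invoice total.",
    schema={"type": "object", "properties": {"total": {"type": "number"}}},
)
```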
Web Search and Agents
- Web Search Integration: Web search integrated with chat and document Q&A.
- Intelligent Agents: Supports agents for search, document Q&A, Python code, CSV data frames, etc.
- High-Quality Agents: Provides high-quality agents via an OpenAI proxy server on an independent port.
- Code-First Agents: Generates charts, conducts research, evaluates images via vision models, etc.
Performance Evaluation
- Reward Models: Uses reward models to evaluate performance.
- Quality Assurance: Maintains quality through over 1000 unit tests and integration tests (over 24 GPU hours).
Installation and Deployment
Recommended Deployment Method
Docker is recommended for full-featured deployment on Linux, Windows, and Mac. Platform support:
- Docker: Full functionality on Linux, Windows, Mac.
- Linux Script: Full functionality.
- Windows and Mac Scripts: Relatively limited functionality.
Supported Installation Methods
- Docker Build and Run: For Linux, Windows, Mac.
- Linux Install and Run: Native Linux support.
- Windows 10/11 Installation Script: Windows platform support.
- Mac Install and Run: macOS platform support.
- Quick Start: For any platform.
Technical Specifications
Hardware Requirements
- GPU Support: CUDA, AutoGPTQ, exllama.
- CPU Support: Can run entirely on CPU, with no GPU required.
- Memory Optimization: Provides low-memory mode.
Offline Installation
- Supports full offline installation.
- Offline document processing capabilities.
- Local model deployment.
Development and Extension
Development Environment
- Follow installation instructions to create a development environment for training and generation.
- Supports fine-tuning any LLM model on custom data.
- Provides a complete test suite.
Testing
```bash
pip install requirements-parser pytest-instafail pytest-random-order playsound==1.3.0
conda install -c conda-forge gst-python -y
sudo apt-get install gstreamer-1.0
pip install pygame
GPT_H2O_AI=0 CONCURRENCY_COUNT=1 pytest --instafail -s -v tests
# For OpenAI server tests on an already running local server
pytest -s -v -n 4 openai_server/test_openai_server.py::test_openai_client
```
Client APIs
- Gradio Client API
- OpenAI Compatible Client API
- Python Client Library
Technical Architecture
Core Technology Stack
- Base Models: LLaMa2, Mistral, Falcon, etc.
- Embedding Technology: instructor-large, all-MiniLM-L6-v2.
- Vector Databases: Chroma, Weaviate, FAISS.
- UI Framework: Gradio.
- Backend Technology: Python, PyTorch, Transformers.
Data Processing Workflow
- Document Ingestion: Uses advanced OCR technology (DocTR).
- Document Segmentation: Semantic chunking technology.
- Vectorization: Uses accurate embedding models.
- Retrieval Augmentation: HYDE technology enhances retrieval.
- Answer Generation: Context-based intelligent answering.
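The HYDE step in this workflow can be sketched end to end with toy components. Everything below is a stand-in: the bag-of-words "embedding" replaces real models like instructor-large, and the hypothetical answer is supplied by the caller instead of being drafted by an LLM:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; h2oGPT uses real embedding models
    such as instructor-large or all-MiniLM-L6-v2 instead."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def hyde_retrieve(question: str, hypothetical_answer: str, docs: list) -> str:
    """HYDE: embed an LLM-drafted hypothetical answer (here supplied by
    the caller) rather than the raw question, then retrieve the
    closest document."""
    query_vec = embed(hypothetical_answer)
    return max(docs, key=lambda d: cosine(query_vec, embed(d)))

docs = [
    "Chroma stores document embeddings for retrieval.",
    "Whisper converts streaming audio to text.",
]
best = hyde_retrieve(
    "Which component stores vectors?",
    "A vector database such as Chroma stores embeddings of documents.",
    docs,
)
```

The point of HYDE is that a hypothetical answer shares more vocabulary (and, with real models, more semantics) with the target document than the short question does, which improves retrieval.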
Business Applications
Enterprise-Grade Solutions
h2oGPT provides enterprise-grade generative AI solutions with key features:
- Fully Private: 100% private deployment, data remains within the enterprise.
- Scalability: Supports large-scale deployment.
- Security: Enterprise-grade security assurance.
- Customization: Supports model fine-tuning and customization.
Application Scenarios
- Document Q&A System: Enterprise internal knowledge base Q&A.
- Code Assistance: Code generation and review.
- Data Analysis: CSV data processing and analysis.
- Multimedia Processing: Image, video, audio processing.
- Customer Service: Intelligent customer service system.
H2O.ai Ecosystem
h2oGPT is part of H2O.ai's complete AI platform. H2O.ai also offers:
- H2O-3: Open-source machine learning platform.
- H2O Driverless AI: World-leading AutoML platform.
- H2O Hydrogen Torch: No-code deep learning platform.
- Document AI: Document processing deep learning platform.
- H2O MLOps: Model deployment and monitoring platform.
- H2O Feature Store: Feature store platform.
Summary
h2oGPT is a powerful open-source private GPT solution, particularly suitable for enterprises and individual users who require full control over data privacy. It not only offers functionalities similar to commercial GPT services but also adds features like document processing, multimodal support, and enterprise-grade security, making it an ideal choice for building private AI applications.