h2oGPT Project Detailed Introduction
Project Overview
h2oGPT is an open-source project from H2O.ai that provides a fully private, local GPT chat experience, including document Q&A and image and video processing. It is released under the Apache 2.0 license, so deployment and usage can remain 100% private.
Project Address: https://github.com/h2oai/h2ogpt
Demo Address: https://gpt.h2o.ai/
Core Features
1. Document Processing Capabilities
h2oGPT supports private, offline databases for various document types, including PDF, Excel, Word, images, video frames, YouTube, audio, code, text, Markdown, and more. Key features include:
- Persistent Database: Uses Chroma, Weaviate, or in-memory FAISS for document storage.
- Accurate Embeddings: Supports embedding models like instructor-large, all-MiniLM-L6-v2.
- Efficient Context Utilization: Uses instruction-tuned LLMs, eliminating the need for LangChain's few-shot methods.
- Parallel Processing: Parallel summarization and extraction, with 13B LLaMa2 models achieving an output speed of 80 tokens per second.
- HYDE Technology: Hypothetical Document Embedding technology based on LLM responses, enhancing retrieval capabilities.
- Semantic Chunking: Better document segmentation (requires GPU support).
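The semantic chunking above relies on an embedding model (hence the GPU requirement). As a rough illustration of the general idea of splitting documents before vectorization, here is a naive fixed-size chunker with overlap; it is purely a sketch, not h2oGPT's actual implementation:

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list:
    """Split text into overlapping character chunks: a naive stand-in
    for the semantic chunking h2oGPT performs with an embedding model."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        if piece:
            chunks.append(piece)
        if start + chunk_size >= len(text):
            break
    return chunks

# Each chunk would then be embedded and stored in Chroma, Weaviate, or FAISS.
doc = "h2oGPT ingests PDF, Word, and Markdown files into a vector store. " * 4
pieces = chunk_text(doc, chunk_size=80, overlap=20)
```

The overlap keeps sentence fragments at chunk boundaries retrievable from both neighboring chunks, at the cost of some duplicated storage.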
2. Model Support
h2oGPT supports various models, including LLaMa2, Mistral, Falcon, Vicuna, WizardLM, etc., and technologies like AutoGPTQ, 4-bit/8-bit quantization, and LoRA:
- GPU Support: HuggingFace models and LLaMa.cpp GGML models.
- CPU Support: HuggingFace, LLaMa.cpp, and GPT4ALL models.
- Attention Sinks: Enable arbitrarily long generation (LLaMa-2, Mistral, MPT, Pythia, Falcon, etc.).
3. User Interface
- Gradio UI: Provides an intuitive web interface with streaming output.
- CLI: Command-line interface, supporting streaming for all models.
- Document Upload and Viewing: Upload and view documents via the UI (supports multiple collaborative or personal collections).
4. Multimodal Capabilities
Vision Models
Supports vision models such as LLaVa, Claude-3, Gemini-Pro-Vision, GPT-4-Vision.
Image Generation
Supports image generation models like Stable Diffusion (sdxl-turbo, sdxl, SD3), PlaygroundAI (playv2), and Flux.
Speech Processing
- STT (Speech-to-Text): Uses Whisper for streaming audio conversion.
- TTS (Text-to-Speech):
  - MIT-licensed Microsoft Speech T5, supporting multiple voices and streaming audio conversion.
  - MPL2-licensed TTS, including voice cloning and streaming audio conversion.
- AI Assistant Voice Control: Supports hands-free control for h2oGPT chat mode.
5. Enterprise-Grade Features
Authentication and State Management
- UI Authentication: Authenticates via username/password or Google OAuth.
- State Persistence: Maintains state in the UI via username/password.
- Open Web UI Integration: Uses h2oGPT as a backend via an OpenAI proxy.
API and Integration
- OpenAI-Compatible API: h2oGPT can serve as a drop-in replacement for an OpenAI server.
- Inference Server Support: Supports oLLaMa, HF TGI server, vLLM, Gradio, ExLLaMa, Replicate, Together.ai, OpenAI, Azure OpenAI, Anthropic, MistralAI, Google, and Groq.
Server Proxy API Features
- Chat and text completion (streaming and non-streaming)
- Audio transcription (STT)
- Audio generation (TTS)
- Image generation
- Embeddings
- Function tool calling and auto tool selection
- AutoGen code execution agents
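Because the proxy speaks the OpenAI wire format, any OpenAI-style client can target it with the standard chat-completion schema. Below is a minimal stdlib sketch; the base URL, port, and model name are deployment-specific assumptions, not fixed h2oGPT values:

```python
import json
import urllib.request

def build_chat_request(model: str, prompt: str, stream: bool = False) -> dict:
    """Build an OpenAI-style chat-completion payload; h2oGPT's
    OpenAI-compatible server accepts this same schema."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }

def post_chat(base_url: str, payload: dict) -> dict:
    """POST the payload to the standard /v1/chat/completions route
    of a running h2oGPT OpenAI-compatible server."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Building the payload works offline; post_chat() needs a live server.
payload = build_chat_request("my-local-model", "Summarize this document.")
```

The same request shape covers streaming (`"stream": True`) and non-streaming completions, matching the proxy features listed above.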
6. Advanced Features
JSON Mode and Structured Output
- Strict Schema Control: Uses outlines with vLLM for strict schema control.
- Multi-platform Support: Supports strict schema control for OpenAI, Anthropic, Google Gemini, and MistralAI models.
- JSON Mode: Provides a plain JSON mode for some older OpenAI and Gemini models.
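Over the OpenAI-compatible API, structured output is requested with the standard `response_format` field. The sketch below builds such a request; whether strict schema enforcement (outlines with vLLM) or plain JSON mode applies depends on which backend h2oGPT routes to, and the schema-in-system-message fallback shown is a hypothetical illustration:

```python
import json
from typing import Optional

def build_json_mode_request(model: str, prompt: str,
                            schema: Optional[dict] = None) -> dict:
    """OpenAI-style request asking the model for JSON output.
    When a schema is given, it is also described in a system message:
    a hypothetical fallback for backends without native enforcement."""
    messages = [{"role": "user", "content": prompt}]
    if schema is not None:
        messages.insert(0, {
            "role": "system",
            "content": "Reply only with JSON matching this schema: "
                       + json.dumps(schema),
        })
    return {
        "model": model,
        "messages": messages,
        "response_format": {"type": "json_object"},
    }

request = build_json_mode_request(
    "my-local-model",
    "Extract the invoice total.",
    schema={"type": "object", "properties": {"total": {"type": "number"}}},
)
```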
Web Search and Agents
- Web Search Integration: Web search integrated with chat and document Q&A.
- Intelligent Agents: Supports agents for search, document Q&A, Python code, CSV data frames, etc.
- High-Quality Agents: Provides high-quality agents via an OpenAI proxy server on an independent port.
- Code-First Agents: Generates charts, conducts research, evaluates images via vision models, etc.
Performance Evaluation
- Reward Models: Uses reward models to evaluate performance.
- Quality Assurance: Maintains quality through over 1000 unit tests and integration tests (over 24 GPU hours).
Installation and Deployment
Recommended Deployment Method
Docker is recommended for full-featured deployment on Linux, Windows, and Mac. Platform support:
- Docker: Full functionality on Linux, Windows, Mac.
- Linux Script: Full functionality.
- Windows and Mac Scripts: Relatively limited functionality.
Supported Installation Methods
- Docker Build and Run: For Linux, Windows, Mac.
- Linux Install and Run: Native Linux support.
- Windows 10/11 Installation Script: Windows platform support.
- Mac Install and Run: macOS platform support.
- Quick Start: For any platform.
Technical Specifications
Hardware Requirements
- GPU Support: CUDA, AutoGPTQ, exllama.
- CPU Support: Can run entirely on CPU, with no GPU required.
- Memory Optimization: Provides low-memory mode.
Offline Installation
- Supports full offline installation.
- Offline document processing capabilities.
- Local model deployment.
Development and Extension
Development Environment
- Follow installation instructions to create a development environment for training and generation.
- Supports fine-tuning any LLM model on custom data.
- Provides a complete test suite.
Testing
```bash
pip install requirements-parser pytest-instafail pytest-random-order playsound==1.3.0
conda install -c conda-forge gst-python -y
sudo apt-get install gstreamer-1.0
pip install pygame
GPT_H2O_AI=0 CONCURRENCY_COUNT=1 pytest --instafail -s -v tests
# For OpenAI server tests on an already running local server
pytest -s -v -n 4 openai_server/test_openai_server.py::test_openai_client
```
Client APIs
- Gradio Client API
- OpenAI Compatible Client API
- Python Client Library
Technical Architecture
Core Technology Stack
- Base Models: LLaMa2, Mistral, Falcon, etc.
- Embedding Technology: instructor-large, all-MiniLM-L6-v2.
- Vector Databases: Chroma, Weaviate, FAISS.
- UI Framework: Gradio.
- Backend Technology: Python, PyTorch, Transformers.
Data Processing Workflow
- Document Ingestion: Uses advanced OCR technology (DocTR).
- Document Segmentation: Semantic chunking technology.
- Vectorization: Uses accurate embedding models.
- Retrieval Augmentation: HYDE technology enhances retrieval.
- Answer Generation: Context-based intelligent answering.
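The HYDE step in this workflow can be sketched end to end with toy components. Everything below is a stand-in: the bag-of-words "embedding" replaces real models like instructor-large, and the hypothetical answer is supplied by the caller instead of being drafted by an LLM:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; h2oGPT uses real embedding models
    such as instructor-large or all-MiniLM-L6-v2 instead."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def hyde_retrieve(question: str, hypothetical_answer: str, docs: list) -> str:
    """HYDE: embed an LLM-drafted hypothetical answer (here supplied by
    the caller) rather than the raw question, then retrieve the
    closest document."""
    query_vec = embed(hypothetical_answer)
    return max(docs, key=lambda d: cosine(query_vec, embed(d)))

docs = [
    "Chroma stores document embeddings for retrieval.",
    "Whisper converts streaming audio to text.",
]
best = hyde_retrieve(
    "Which component stores vectors?",
    "A vector database such as Chroma stores embeddings of documents.",
    docs,
)
```

The point of HYDE is that a hypothetical answer shares more vocabulary (and, with real models, more semantics) with the target document than the short question does, which improves retrieval.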
Business Applications
Enterprise-Grade Solutions
h2oGPT provides enterprise-grade generative AI solutions with key features:
- Fully Private: 100% private deployment, data remains within the enterprise.
- Scalability: Supports large-scale deployment.
- Security: Enterprise-grade security assurance.
- Customization: Supports model fine-tuning and customization.
Application Scenarios
- Document Q&A System: Enterprise internal knowledge base Q&A.
- Code Assistance: Code generation and review.
- Data Analysis: CSV data processing and analysis.
- Multimedia Processing: Image, video, audio processing.
- Customer Service: Intelligent customer service system.
H2O.ai Ecosystem
h2oGPT is part of H2O.ai's complete AI platform. H2O.ai also offers:
- H2O-3: Open-source machine learning platform.
- H2O Driverless AI: World-leading AutoML platform.
- H2O Hydrogen Torch: No-code deep learning platform.
- Document AI: Document processing deep learning platform.
- H2O MLOps: Model deployment and monitoring platform.
- H2O Feature Store: Feature store platform.
Summary
h2oGPT is a powerful open-source private GPT solution, particularly suitable for enterprises and individual users who require full control over data privacy. It not only offers functionalities similar to commercial GPT services but also adds features like document processing, multimodal support, and enterprise-grade security, making it an ideal choice for building private AI applications.