Unified LLM API gateway that lets you call 100+ large language model APIs in the OpenAI format
LiteLLM - Unified Large Language Model API Call Gateway
Project Overview
LiteLLM is an open-source Python SDK and proxy server (LLM Gateway) that enables calling over 100 large language model APIs in OpenAI format, including major providers like Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, and Groq.
GitHub: https://github.com/BerriAI/litellm
Core Features
1. Unified API Format
- Standardized Input Format: Translates a single, OpenAI-style input format into each provider's native API call.
- Consistent Output Format: Text responses are always available at ['choices'][0]['message']['content'].
- Multi-Endpoint Support: Supports the completion, embedding, and image_generation endpoints.
2. High Availability Assurance
- Retry/Fallback Logic: Automatic retries and fallbacks across multiple deployments (e.g., Azure/OpenAI); see the Router sketch after this list.
- Routing Functionality: Intelligent routing to the best available model.
- Load Balancing: Distributes request load across multiple deployments.
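In the Python SDK, retry, fallback, and load-balancing logic is exposed through the Router class. A minimal sketch, assuming two deployments registered under the same alias (the deployment names, keys, and endpoints are placeholders):

import os
from litellm import Router

# Deployments that share a model_name are load-balanced; failed calls are
# retried and failed over to the other deployment automatically.
router = Router(
    model_list=[
        {
            "model_name": "gpt-4o",  # the alias callers use
            "litellm_params": {
                "model": "azure/my-gpt-4o-deployment",
                "api_key": os.getenv("AZURE_API_KEY"),
                "api_base": os.getenv("AZURE_API_BASE"),
            },
        },
        {
            "model_name": "gpt-4o",
            "litellm_params": {
                "model": "openai/gpt-4o",
                "api_key": os.getenv("OPENAI_API_KEY"),
            },
        },
    ],
    num_retries=2,
)

response = router.completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello, how are you?"}],
)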
3. Cost and Permission Control
- Budget Management: Sets budget limits by project, API key, and model.
- Rate Limiting: Prevents excessive API usage.
- Usage Tracking: Provides detailed call statistics and cost analysis; see the cost-tracking sketch after this list.
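At the SDK level, per-call cost tracking can be done with the completion_cost helper, which estimates the cost of a response from its token usage. A minimal sketch:

from litellm import completion, completion_cost

response = completion(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)

# Estimate the USD cost of this single call.
cost = completion_cost(completion_response=response)
print(f"Cost for this call: ${cost:.6f}")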
Key Features
Python SDK Usage Examples
Basic Call
from litellm import completion
import os
os.environ["OPENAI_API_KEY"] = "your-openai-key"
os.environ["ANTHROPIC_API_KEY"] = "your-anthropic-key"
messages = [{"content": "Hello, how are you?", "role": "user"}]
# OpenAI
response = completion(model="openai/gpt-4o", messages=messages)
# Anthropic
response = completion(model="anthropic/claude-3-sonnet-20240229", messages=messages)
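Whichever provider handled the request, the reply is read from the same place in the response object:

# Works identically for the OpenAI and Anthropic responses above.
print(response.choices[0].message.content)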
Asynchronous Call Support
from litellm import acompletion
import asyncio
async def test_get_response():
    user_message = "Hello, how are you?"
    messages = [{"content": user_message, "role": "user"}]
    response = await acompletion(model="openai/gpt-4o", messages=messages)
    return response

response = asyncio.run(test_get_response())
Streaming Response
from litellm import completion
response = completion(model="openai/gpt-4o", messages=messages, stream=True)
for part in response:
    print(part.choices[0].delta.content or "")
Proxy Server Functionality
Quick Start
pip install 'litellm[proxy]'
litellm --model huggingface/bigcode/starcoder
# INFO: Proxy running on http://0.0.0.0:4000
Client Call
import openai
client = openai.OpenAI(
    api_key="anything",
    base_url="http://0.0.0.0:4000"
)

# The proxy forwards this request to the model it was started with.
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{
        "role": "user",
        "content": "this is a test request, write a short poem"
    }]
)
Supported Providers
LiteLLM supports over 30 major LLM providers, including:
Mainstream Cloud Providers
- OpenAI - GPT series models
- Azure - Azure OpenAI Service
- AWS - Bedrock and SageMaker
- Google - Vertex AI, PaLM, Gemini
- Anthropic - Claude series models
Open Source and Professional Platforms
- HuggingFace - Open-source model hosting
- Replicate - Model API service
- Together AI - Open-source model inference
- Groq - High-speed inference chips
- Ollama - Local model execution
Specialized Feature Platforms
- Cohere - Enterprise-grade NLP
- AI21 - Jurassic models
- Perplexity - Search-augmented generation
- DeepInfra - High-performance inference
Observability and Logging
LiteLLM has built-in support for various monitoring and logging platforms:
import litellm
from litellm import completion
import os
os.environ["LUNARY_PUBLIC_KEY"] = "your-lunary-public-key"
os.environ["HELICONE_API_KEY"] = "your-helicone-auth-key"
os.environ["LANGFUSE_PUBLIC_KEY"] = "your-langfuse-public-key"
litellm.success_callback = [
    "lunary", "mlflow", "langfuse",
    "athina", "helicone"
]

response = completion(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hi 👋"}]
)
Supported monitoring platforms:
- Lunary - LLM application monitoring
- MLflow - Machine learning experiment tracking
- Langfuse - LLM application tracing
- Helicone - API call monitoring
- Athina - AI application evaluation
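Custom Python functions can also be registered as callbacks alongside the hosted integrations. A minimal sketch (the function name and printed fields are illustrative):

import litellm
from litellm import completion

def log_request(kwargs, completion_response, start_time, end_time):
    # kwargs holds the original request arguments; start_time/end_time are datetimes.
    duration = (end_time - start_time).total_seconds()
    print(f"model={kwargs['model']} took {duration:.2f}s")

litellm.success_callback = [log_request]

response = completion(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hi"}],
)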
Enterprise-Grade Features
Key Management System
curl 'http://0.0.0.0:4000/key/generate' \
  --header 'Authorization: Bearer sk-1234' \
  --header 'Content-Type: application/json' \
  --data-raw '{
    "models": ["gpt-3.5-turbo", "gpt-4", "claude-2"],
    "duration": "20m",
    "metadata": {
      "user": "user@company.com",
      "team": "core-infra"
    }
  }'
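Budgets and rate limits can be attached to a key at creation time. A hedged sketch of the same call from Python, using the proxy's max_budget and rpm_limit key-generation parameters (verify the exact field names against your LiteLLM version):

import requests

# Create a virtual key capped at $10 of spend and 1,000 requests per minute.
resp = requests.post(
    "http://0.0.0.0:4000/key/generate",
    headers={"Authorization": "Bearer sk-1234"},
    json={
        "models": ["gpt-3.5-turbo"],
        "max_budget": 10.0,
        "rpm_limit": 1000,
        "metadata": {"team": "core-infra"},
    },
)
print(resp.json()["key"])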
Docker Deployment
git clone https://github.com/BerriAI/litellm
cd litellm
echo 'LITELLM_MASTER_KEY="sk-1234"' > .env
echo 'LITELLM_SALT_KEY="your-salt-key"' >> .env  # append so the master key is not overwritten
docker-compose up
Web Management Interface
- Access /ui for a visual management interface.
- Set multi-project budgets and rate limits.
- Monitor API usage in real-time.
- User and team management.
Technical Specifications
System Requirements
- Python Version: Requires Python 3.7+
- Dependency Requirements: openai>=1.0.0 (required since LiteLLM v1.0.0) and pydantic>=2.0.0 (required since LiteLLM v1.40.14)
Code Quality Standards
- Code Style: Follows the Google Python Style Guide.
- Formatting Tools: Uses Black and isort.
- Type Checking: MyPy and Pyright.
- Code Inspection: Ruff for linting.
Stability Guarantee
- Stable Version: Use Docker images with the -stable tag.
- Load Testing: Releases undergo 12-hour load testing before publication.
- Continuous Integration: Complete CI/CD process.
Commercial Support
Enterprise Edition Features
- Advanced Security Features: Single Sign-On (SSO) integration.
- Professional Support: Dedicated Discord and Slack support.
- Custom Integrations: Customized LLM provider integrations.
- SLA Guarantee: Service Level Agreement.
- Feature Prioritization: Prioritized development of enterprise-requested features.
Community Support
- GitHub Issues: Feature requests and issue reporting.
- Discord Community: Real-time communication and support.
- Comprehensive Documentation: Detailed API documentation and tutorials.
Use Cases
1. Multi-Cloud LLM Deployment
- Avoid vendor lock-in.
- Enable cross-platform model calls.
- Reduce migration costs.
2. Cost Optimization
- Intelligent routing to the cheapest available model.
- Budget control and usage monitoring.
- Batch API call optimization.
3. High Availability Architecture
- Automatic failover.
- Load balancing.
- Multi-region deployment support.
4. Development Efficiency Improvement
- Unified API interface.
- Simplified model switching.
- Rich SDK support.
Installation and Quick Start
Basic Installation
pip install litellm
Proxy Server Installation
pip install 'litellm[proxy]'
Development Environment Setup
git clone https://github.com/BerriAI/litellm
cd litellm
python -m venv .venv
source .venv/bin/activate
pip install -e ".[all]"
uvicorn litellm.proxy.proxy_server:app --host localhost --port 4000 --reload
Summary
LiteLLM has been adopted by well-known companies such as Rocket Money, Samsara, Lemonade, and Adobe. By providing a unified API interface, powerful routing capabilities, and enterprise-grade management features, it significantly simplifies the management complexity of multi-LLM environments, making it an ideal choice for modern AI application development.