Unified LLM API gateway that lets you call 100+ large language model APIs in the OpenAI format
LiteLLM - Unified Large Language Model API Call Gateway
Project Overview
LiteLLM is an open-source Python SDK and proxy server (LLM Gateway) that enables calling over 100 large language model APIs in OpenAI format, including major providers like Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, and Groq.
GitHub: https://github.com/BerriAI/litellm
Core Features
1. Unified API Format
- Standardized Input Format: Translates a single, OpenAI-style input format into each provider's native API call.
- Consistent Output Format: Text responses are always available at ['choices'][0]['message']['content'].
- Multi-Endpoint Support: Supports the completion, embedding, and image_generation endpoints.
2. High Availability Assurance
- Retry/Fallback Logic: Automatic retries and fallbacks across multiple deployments (e.g., Azure/OpenAI); see the Router sketch after this list.
- Routing Functionality: Intelligent routing to the best available model.
- Load Balancing: Distributes request load across multiple deployments.
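In the Python SDK, retry, fallback, and load-balancing logic is exposed through the Router class. A minimal sketch, assuming two deployments registered under the same alias (the deployment names, keys, and endpoints are placeholders):

import os
from litellm import Router

# Deployments that share a model_name are load-balanced; failed calls are
# retried and failed over to the other deployment automatically.
router = Router(
    model_list=[
        {
            "model_name": "gpt-4o",  # the alias callers use
            "litellm_params": {
                "model": "azure/my-gpt-4o-deployment",
                "api_key": os.getenv("AZURE_API_KEY"),
                "api_base": os.getenv("AZURE_API_BASE"),
            },
        },
        {
            "model_name": "gpt-4o",
            "litellm_params": {
                "model": "openai/gpt-4o",
                "api_key": os.getenv("OPENAI_API_KEY"),
            },
        },
    ],
    num_retries=2,
)

response = router.completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello, how are you?"}],
)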
3. Cost and Permission Control
- Budget Management: Sets budget limits by project, API key, and model.
- Rate Limiting: Prevents excessive API usage.
- Usage Tracking: Provides detailed call statistics and cost analysis; see the cost-tracking sketch after this list.
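At the SDK level, per-call cost tracking can be done with the completion_cost helper, which estimates the cost of a response from its token usage. A minimal sketch:

from litellm import completion, completion_cost

response = completion(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)

# Estimate the USD cost of this single call.
cost = completion_cost(completion_response=response)
print(f"Cost for this call: ${cost:.6f}")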
Key Features
Python SDK Usage Examples
Basic Call
from litellm import completion
import os
os.environ["OPENAI_API_KEY"] = "your-openai-key"
os.environ["ANTHROPIC_API_KEY"] = "your-anthropic-key"
messages = [{"content": "Hello, how are you?", "role": "user"}]
# OpenAI
response = completion(model="openai/gpt-4o", messages=messages)
# Anthropic
response = completion(model="anthropic/claude-3-sonnet-20240229", messages=messages)
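Whichever provider handled the request, the reply is read from the same place in the response object:

# Works identically for the OpenAI and Anthropic responses above.
print(response.choices[0].message.content)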
Asynchronous Call Support
from litellm import acompletion
import asyncio
async def test_get_response():
    user_message = "Hello, how are you?"
    messages = [{"content": user_message, "role": "user"}]
    response = await acompletion(model="openai/gpt-4o", messages=messages)
    return response

response = asyncio.run(test_get_response())
Streaming Response
from litellm import completion
response = completion(model="openai/gpt-4o", messages=messages, stream=True)
for part in response:
    print(part.choices[0].delta.content or "")
Proxy Server Functionality
Quick Start
pip install 'litellm[proxy]'
litellm --model huggingface/bigcode/starcoder
# INFO: Proxy running on http://0.0.0.0:4000
Client Call
import openai
client = openai.OpenAI(
    api_key="anything",
    base_url="http://0.0.0.0:4000"
)

# The proxy forwards this request to the model it was started with.
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{
        "role": "user",
        "content": "this is a test request, write a short poem"
    }]
)
Supported Providers
LiteLLM supports over 30 major LLM providers, including:
Mainstream Cloud Providers
- OpenAI - GPT series models
- Azure - Azure OpenAI Service
- AWS - Bedrock and SageMaker
- Google - Vertex AI, PaLM, Gemini
- Anthropic - Claude series models
Open Source and Professional Platforms
- HuggingFace - Open-source model hosting
- Replicate - Model API service
- Together AI - Open-source model inference
- Groq - High-speed inference chips
- Ollama - Local model execution
Specialized Feature Platforms
- Cohere - Enterprise-grade NLP
- AI21 - Jurassic models
- Perplexity - Search-augmented generation
- DeepInfra - High-performance inference
Observability and Logging
LiteLLM has built-in support for various monitoring and logging platforms:
import litellm
from litellm import completion
import os
os.environ["LUNARY_PUBLIC_KEY"] = "your-lunary-public-key"
os.environ["HELICONE_API_KEY"] = "your-helicone-auth-key"
os.environ["LANGFUSE_PUBLIC_KEY"] = "your-langfuse-public-key"
litellm.success_callback = [
    "lunary", "mlflow", "langfuse",
    "athina", "helicone"
]

response = completion(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hi 👋"}]
)
Supported monitoring platforms:
- Lunary - LLM application monitoring
- MLflow - Machine learning experiment tracking
- Langfuse - LLM application tracing
- Helicone - API call monitoring
- Athina - AI application evaluation
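Custom Python functions can also be registered as callbacks alongside the hosted integrations. A minimal sketch (the function name and printed fields are illustrative):

import litellm
from litellm import completion

def log_request(kwargs, completion_response, start_time, end_time):
    # kwargs holds the original request arguments; start_time/end_time are datetimes.
    duration = (end_time - start_time).total_seconds()
    print(f"model={kwargs['model']} took {duration:.2f}s")

litellm.success_callback = [log_request]

response = completion(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hi"}],
)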
Enterprise-Grade Features
Key Management System
curl 'http://0.0.0.0:4000/key/generate' \
  --header 'Authorization: Bearer sk-1234' \
  --header 'Content-Type: application/json' \
  --data-raw '{
    "models": ["gpt-3.5-turbo", "gpt-4", "claude-2"],
    "duration": "20m",
    "metadata": {
      "user": "user@company.com",
      "team": "core-infra"
    }
  }'
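Budgets and rate limits can be attached to a key at creation time. A hedged sketch of the same call from Python, using the proxy's max_budget and rpm_limit key-generation parameters (verify the exact field names against your LiteLLM version):

import requests

# Create a virtual key capped at $10 of spend and 1,000 requests per minute.
resp = requests.post(
    "http://0.0.0.0:4000/key/generate",
    headers={"Authorization": "Bearer sk-1234"},
    json={
        "models": ["gpt-3.5-turbo"],
        "max_budget": 10.0,
        "rpm_limit": 1000,
        "metadata": {"team": "core-infra"},
    },
)
print(resp.json()["key"])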
Docker Deployment
git clone https://github.com/BerriAI/litellm
cd litellm
echo 'LITELLM_MASTER_KEY="sk-1234"' > .env
echo 'LITELLM_SALT_KEY="your-salt-key"' >> .env  # append so the master key is not overwritten
docker-compose up
Web Management Interface
- Access /ui for a visual management interface.
- Set multi-project budgets and rate limits.
- Monitor API usage in real-time.
- User and team management.
Technical Specifications
System Requirements
- Python Version: Requires Python 3.7+
- Dependency Requirements: openai>=1.0.0 (required since LiteLLM v1.0.0) and pydantic>=2.0.0 (required since LiteLLM v1.40.14)
Code Quality Standards
- Code Style: Follows the Google Python Style Guide.
- Formatting Tools: Uses Black and isort.
- Type Checking: MyPy and Pyright.
- Code Inspection: Ruff for linting.
Stability Guarantee
- Stable Version: Use Docker images with the -stable tag.
- Load Testing: Releases undergo 12-hour load testing before publication.
- Continuous Integration: Complete CI/CD process.
Commercial Support
Enterprise Edition Features
- Advanced Security Features: Single Sign-On (SSO) integration.
- Professional Support: Dedicated Discord and Slack support.
- Custom Integrations: Customized LLM provider integrations.
- SLA Guarantee: Service Level Agreement.
- Feature Prioritization: Prioritized development of enterprise-requested features.
Community Support
- GitHub Issues: Feature requests and issue reporting.
- Discord Community: Real-time communication and support.
- Comprehensive Documentation: Detailed API documentation and tutorials.
Use Cases
1. Multi-Cloud LLM Deployment
- Avoid vendor lock-in.
- Enable cross-platform model calls.
- Reduce migration costs.
2. Cost Optimization
- Intelligent routing to the cheapest available model.
- Budget control and usage monitoring.
- Batch API call optimization.
3. High Availability Architecture
- Automatic failover.
- Load balancing.
- Multi-region deployment support.
4. Development Efficiency Improvement
- Unified API interface.
- Simplified model switching.
- Rich SDK support.
Installation and Quick Start
Basic Installation
pip install litellm
Proxy Server Installation
pip install 'litellm[proxy]'
Development Environment Setup
git clone https://github.com/BerriAI/litellm
cd litellm
python -m venv .venv
source .venv/bin/activate
pip install -e ".[all]"
uvicorn litellm.proxy.proxy_server:app --host localhost --port 4000 --reload
Summary
LiteLLM has been adopted by well-known companies such as Rocket Money, Samsara, Lemonade, and Adobe. By providing a unified API interface, powerful routing capabilities, and enterprise-grade management features, it significantly simplifies the management complexity of multi-LLM environments, making it an ideal choice for modern AI application development.