
A unified LLM API gateway supporting OpenAI-format calls to 100+ large language models

License: NOASSERTION · Language: Python · Stars: 24.4k · Maintainer: BerriAI · Last Updated: 2025-06-21

LiteLLM - Unified Large Language Model API Call Gateway

Project Overview

LiteLLM is an open-source Python SDK and proxy server (LLM Gateway) that enables calling over 100 large language model APIs in OpenAI format, including major providers like Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, and Groq.

GitHub: https://github.com/BerriAI/litellm

Core Features

1. Unified API Format

  • Standardized Input Format: Translates OpenAI-format requests into each provider's native API format.
  • Consistent Output Format: Text responses are always available at ['choices'][0]['message']['content'].
  • Multi-Endpoint Support: Supports completion, embedding, and image_generation endpoints (see the sketch after this list).
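
A minimal sketch of that consistency (it assumes provider API keys are already set as environment variables; the embedding model name is just one example):

from litellm import completion, embedding

messages = [{"role": "user", "content": "Hello"}]

# The text of the reply is at the same path regardless of provider
response = completion(model="openai/gpt-4o", messages=messages)
print(response['choices'][0]['message']['content'])

# Embeddings follow the same unified call style
emb = embedding(model="text-embedding-ada-002", input=["Hello"])
print(len(emb.data[0]["embedding"]))  # dimension of the embedding vector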

2. High Availability Assurance

  • Retry/Fallback Logic: Supports automatic retries and fallbacks between multiple deployments (e.g., Azure/OpenAI).
  • Routing Functionality: Intelligent routing to the best available model.
  • Load Balancing: Distributes request load across multiple deployments (see the Router sketch after this list).
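
One way to express retries, fallback, and load balancing in the SDK is the Router class. The sketch below is illustrative only: the deployment names, keys, and endpoints are placeholders.

from litellm import Router

# Two deployments share the logical name "gpt-4o"; the Router load-balances
# between them and retries/falls back when one deployment fails.
router = Router(
    model_list=[
        {
            "model_name": "gpt-4o",
            "litellm_params": {
                "model": "azure/my-gpt-4o-deployment",               # placeholder
                "api_key": "azure-key",                              # placeholder
                "api_base": "https://my-endpoint.openai.azure.com",  # placeholder
            },
        },
        {
            "model_name": "gpt-4o",
            "litellm_params": {"model": "openai/gpt-4o", "api_key": "openai-key"},
        },
    ],
    num_retries=2,
)

response = router.completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)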

3. Cost and Permission Control

  • Budget Management: Sets budget limits by project, API key, and model.
  • Rate Limiting: Prevents excessive API usage.
  • Usage Tracking: Provides detailed call statistics and cost analysis (see the cost sketch after this list).
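
Budgets and rate limits are enforced on the proxy side (see the key management section below). For per-call cost tracking in the SDK, a minimal sketch using litellm's completion_cost helper:

from litellm import completion, completion_cost

response = completion(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)

# Estimate the USD cost of this single call from its token usage
cost = completion_cost(completion_response=response)
print(f"Cost: ${cost:.6f}")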

Feature Details

Python SDK Usage Examples

Basic Call

from litellm import completion
import os


os.environ["OPENAI_API_KEY"] = "your-openai-key"
os.environ["ANTHROPIC_API_KEY"] = "your-anthropic-key"

messages = [{"content": "Hello, how are you?", "role": "user"}]

# OpenAI
response = completion(model="openai/gpt-4o", messages=messages)

# Anthropic
response = completion(model="anthropic/claude-3-sonnet-20240229", messages=messages)

Asynchronous Call Support

from litellm import acompletion
import asyncio

async def test_get_response():
    user_message = "Hello, how are you?"
    messages = [{"content": user_message, "role": "user"}]
    response = await acompletion(model="openai/gpt-4o", messages=messages)
    return response

response = asyncio.run(test_get_response())

Streaming Response

from litellm import completion

messages = [{"content": "Hello, how are you?", "role": "user"}]

response = completion(model="openai/gpt-4o", messages=messages, stream=True)
for part in response:
    print(part.choices[0].delta.content or "", end="")

Proxy Server Functionality

Quick Start

pip install 'litellm[proxy]'
litellm --model huggingface/bigcode/starcoder
# INFO: Proxy running on http://0.0.0.0:4000

Client Call

import openai

client = openai.OpenAI(
    api_key="anything",  # the proxy handles provider authentication
    base_url="http://0.0.0.0:4000"
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo", 
    messages=[{
        "role": "user",
        "content": "this is a test request, write a short poem"
    }]
)

Supported Providers

LiteLLM supports over 30 major LLM providers, including:

Mainstream Cloud Providers

  • OpenAI - GPT series models
  • Azure - Azure OpenAI Service
  • AWS - Bedrock and SageMaker
  • Google - Vertex AI, PaLM, Gemini
  • Anthropic - Claude series models

Open Source and Professional Platforms

  • HuggingFace - Open-source model hosting
  • Replicate - Model API service
  • Together AI - Open-source model inference
  • Groq - High-speed inference chips
  • Ollama - Local model execution (see the sketch after this list)
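
As an illustration of local execution, an Ollama model can be called through the same completion interface. This sketch assumes a local Ollama server on its default port with the llama3 model already pulled:

from litellm import completion

response = completion(
    model="ollama/llama3",                # assumed locally available model
    messages=[{"role": "user", "content": "Hello"}],
    api_base="http://localhost:11434",    # default Ollama endpoint
)
print(response.choices[0].message.content)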

Specialized Feature Platforms

  • Cohere - Enterprise-grade NLP
  • AI21 - Jurassic models
  • Perplexity - Search-augmented generation
  • DeepInfra - High-performance inference

Observability and Logging

LiteLLM has built-in support for various monitoring and logging platforms:

import litellm
import os
from litellm import completion


os.environ["LUNARY_PUBLIC_KEY"] = "your-lunary-public-key"
os.environ["HELICONE_API_KEY"] = "your-helicone-auth-key"
os.environ["LANGFUSE_PUBLIC_KEY"] = "your-langfuse-public-key"


# Log every successful call to these platforms
litellm.success_callback = [
    "lunary", "mlflow", "langfuse",
    "athina", "helicone"
]


# This call is logged to all of the configured platforms
response = completion(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hi 👋"}]
)

Supported monitoring platforms:

  • Lunary - LLM application monitoring
  • MLflow - Machine learning experiment tracking
  • Langfuse - LLM application tracing
  • Helicone - API call monitoring
  • Athina - AI application evaluation

Enterprise-Grade Features

Key Management System


curl 'http://0.0.0.0:4000/key/generate' \
  --header 'Authorization: Bearer sk-1234' \
  --header 'Content-Type: application/json' \
  --data-raw '{
    "models": ["gpt-3.5-turbo", "gpt-4", "claude-2"], 
    "duration": "20m",
    "metadata": {
      "user": "user@company.com", 
      "team": "core-infra"
    }
  }'
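
The response from /key/generate includes a "key" field; that virtual key can then be used like any OpenAI API key against the proxy. The key value below is a placeholder:

import openai

client = openai.OpenAI(
    api_key="sk-generated-virtual-key",  # placeholder for the key returned by /key/generate
    base_url="http://0.0.0.0:4000"
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello from a scoped key"}]
)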

Docker Deployment


# Get the code
git clone https://github.com/BerriAI/litellm
cd litellm

# Add the master key and a salt key (the second echo appends instead of overwriting)
echo 'LITELLM_MASTER_KEY="sk-1234"' > .env
echo 'LITELLM_SALT_KEY="your-salt-key"' >> .env

# Start the proxy
docker-compose up

Web Management Interface

  • Access /ui for a visual management interface.
  • Set multi-project budgets and rate limits.
  • Monitor API usage in real-time.
  • User and team management.

Technical Specifications

System Requirements

  • Python Version: Requires Python 3.7+
  • Dependency Requirements:
    • openai>=1.0.0 (required for LiteLLM v1.0.0 and later)
    • pydantic>=2.0.0 (required for LiteLLM v1.40.14 and later)

Code Quality Standards

  • Code Style: Follows the Google Python Style Guide.
  • Formatting Tools: Uses Black and isort.
  • Type Checking: MyPy and Pyright.
  • Code Inspection: Ruff for linting.

Stability Guarantee

  • Stable Version: Use Docker images with the -stable tag.
  • Load Testing: 12-hour load testing before release.
  • Continuous Integration: Complete CI/CD process.

Commercial Support

Enterprise Edition Features

  • Advanced Security Features: Single Sign-On (SSO) integration.
  • Professional Support: Dedicated Discord and Slack support.
  • Custom Integrations: Customized LLM provider integrations.
  • SLA Guarantee: Service Level Agreement.
  • Feature Prioritization: Prioritized development of enterprise-requested features.

Community Support

  • GitHub Issues: Feature requests and issue reporting.
  • Discord Community: Real-time communication and support.
  • Comprehensive Documentation: Detailed API documentation and tutorials.

Use Cases

1. Multi-Cloud LLM Deployment

  • Avoid vendor lock-in.
  • Enable cross-platform model calls.
  • Reduce migration costs.

2. Cost Optimization

  • Intelligent routing to the cheapest available model.
  • Budget control and usage monitoring.
  • Batch API call optimization.

3. High Availability Architecture

  • Automatic failover.
  • Load balancing.
  • Multi-region deployment support.

4. Development Efficiency Improvement

  • Unified API interface.
  • Simplified model switching.
  • Rich SDK support.

Installation and Quick Start

Basic Installation

pip install litellm

Proxy Server Installation

pip install 'litellm[proxy]'

Development Environment Setup


# Clone the repository
git clone https://github.com/BerriAI/litellm
cd litellm

# Create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate

# Install LiteLLM with all optional dependencies
pip install -e ".[all]"

# Run the proxy server locally with auto-reload
uvicorn litellm.proxy.proxy_server:app --host localhost --port 4000 --reload

Summary

LiteLLM has been adopted by well-known companies such as Rocket Money, Samsara, Lemonade, and Adobe. By providing a unified API interface, powerful routing capabilities, and enterprise-grade management features, it significantly simplifies the management complexity of multi-LLM environments, making it an ideal choice for modern AI application development.