LLM Council - Multi-Model AI Collaboration Platform
A multi-LLM collaboration tool that queries multiple AI models, enables peer review, and synthesizes responses through a chairman model.
Project Overview
LLM Council is an innovative open-source project created by Andrej Karpathy that transforms single-model AI interactions into collaborative, multi-model consensus systems. Instead of relying on a single LLM provider, this tool orchestrates multiple frontier AI models to work together, review each other's outputs, and produce synthesized responses through a democratic process.
Core Concept
The fundamental idea behind LLM Council is to leverage the strengths of different AI models while minimizing individual model biases. By creating an "AI advisory board," users receive more comprehensive, peer-reviewed answers to complex questions rather than depending on a single model's perspective.
Architecture & Workflow
Three-Stage Process
Stage 1: First Opinions
- User query is dispatched simultaneously to all council member models via OpenRouter API
- Each LLM generates its independent response without seeing others' outputs
- Individual responses are displayed in a tab view for side-by-side comparison
- Default council includes: GPT-5.1, Gemini 3.0 Pro, Claude Sonnet 4.5, and Grok 4
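The Stage 1 fan-out can be sketched in a few lines of async Python. This is an illustrative sketch, not the project's actual code: `first_opinions` and `fake_call` are hypothetical names, and a real `call_model` would be an `httpx.AsyncClient` POST to the OpenRouter API.

```python
import asyncio

# Hypothetical sketch of Stage 1: fan a query out to every council model
# concurrently. `call_model` stands in for a real OpenRouter request.
COUNCIL_MODELS = [
    "openai/gpt-5.1",
    "google/gemini-3-pro-preview",
    "anthropic/claude-sonnet-4.5",
    "x-ai/grok-4",
]

async def first_opinions(query, call_model, models=COUNCIL_MODELS):
    """Dispatch the query to all models at once; none sees the others' output."""
    replies = await asyncio.gather(*(call_model(m, query) for m in models))
    return dict(zip(models, replies))

# Demo with a stub instead of real network calls:
async def fake_call(model, query):
    return f"{model} says: {query!r} noted."

if __name__ == "__main__":
    opinions = asyncio.run(first_opinions("What is entropy?", fake_call))
    for model, text in opinions.items():
        print(model, "->", text)
```

Because `asyncio.gather` runs the coroutines concurrently, total latency is roughly that of the slowest model rather than the sum of all four.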
Stage 2: Anonymous Peer Review
- Each model receives anonymized responses from all other council members
- Models evaluate and rank each response based on accuracy and insight
- Identity anonymization prevents bias and favoritism in evaluations
- Cross-model evaluation reveals surprising patterns (models often rank competitors higher)
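A minimal sketch of the anonymization step, assuming neutral labels A, B, C, … and that a reviewer does not see its own response; the function name and return shape are guesses, not the project's code.

```python
import random

def anonymize(responses, reviewer, seed=None):
    """Label every response except the reviewer's own with a neutral ID.

    `responses` maps model name -> response text. Returns the anonymized
    packet shown to `reviewer` plus the label->model key used to
    de-anonymize its rankings later. Shuffling also removes positional bias.
    """
    rng = random.Random(seed)
    others = [(m, t) for m, t in responses.items() if m != reviewer]
    rng.shuffle(others)
    labels = [chr(ord("A") + i) for i in range(len(others))]
    packet = {lab: text for lab, (_, text) in zip(labels, others)}
    key = {lab: model for lab, (model, _) in zip(labels, others)}
    return packet, key
```

Keeping the `key` on the server side, and never in the prompt, is what lets the app report per-model rankings while the reviewers themselves stay blind.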
Stage 3: Chairman Synthesis
- A designated Chairman LLM (configurable) reviews all original responses
- Considers peer review rankings and evaluations
- Produces a final synthesized answer incorporating the best elements
- Delivers a comprehensive response to the user
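One plausible way to assemble the Chairman's context from the first opinions and the peer rankings; the prompt wording and function name are illustrative guesses, not the project's actual template.

```python
def chairman_prompt(query, responses, rankings):
    """Assemble the context the Chairman model synthesizes from.

    `responses` maps model -> answer text; `rankings` maps each reviewer
    model -> its ordered list of anonymized labels (best first).
    """
    parts = [f"User question:\n{query}\n", "Council responses:"]
    for model, text in responses.items():
        parts.append(f"--- {model} ---\n{text}")
    parts.append("\nPeer rankings (best first):")
    for reviewer, order in rankings.items():
        parts.append(f"{reviewer}: {', '.join(order)}")
    parts.append("\nSynthesize one final answer that keeps the strongest points above.")
    return "\n".join(parts)
```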
Technical Stack
Backend
- Framework: FastAPI (Python 3.10+)
- HTTP Client: async httpx for non-blocking API calls
- API Integration: OpenRouter API for multi-model access
- Storage: JSON-based conversation persistence in data/conversations/
- Package Management: uv for modern Python dependency management
Frontend
- Framework: React with Vite for fast development and builds
- Rendering: react-markdown for formatted output
- UI: ChatGPT-like interface with tab views for model comparison
- Dev Server: Vite dev server on port 5173
Key Features
Multi-Model Dispatching
- Simultaneous query execution across multiple frontier models
- Configurable council membership through backend/config.py
- Support for models from OpenAI, Google, Anthropic, xAI, and more
Objective Peer Review
- Anonymized response evaluation prevents model bias
- Quantitative ranking system for accuracy and insight
- Reveals interesting patterns in model preferences and strengths
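The project does not document its exact tallying scheme, so the aggregation below is an assumption: a Borda count is one simple, standard way to turn per-reviewer rankings into a single consensus order.

```python
def borda_consensus(rankings):
    """Aggregate per-reviewer rankings into one consensus order.

    `rankings` maps reviewer -> ordered list of candidates (best first).
    Each position awards (n - position - 1) points; the highest total
    wins, with ties broken alphabetically for determinism.
    """
    scores = {}
    for order in rankings.values():
        n = len(order)
        for pos, cand in enumerate(order):
            scores[cand] = scores.get(cand, 0) + (n - pos - 1)
    return sorted(scores, key=lambda c: (-scores[c], c))
```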
Synthesized Consensus
- Chairman model aggregates diverse perspectives
- Produces coherent final answers incorporating multiple viewpoints
- Balances verbosity, insight, and conciseness
Transparent Comparison
- Side-by-side view of all individual responses
- Complete visibility into peer review rankings
- Users can form their own judgments alongside AI consensus
Conversation Persistence
- Automatic saving of conversation history
- JSON-based storage for easy data portability
- Ability to review and analyze past council sessions
Installation & Setup
Prerequisites
- Python 3.10 or higher
- Node.js and npm
- OpenRouter API key (requires purchased credits)
Backend Setup
```shell
# Install dependencies using uv
uv sync
```
Frontend Setup
```shell
# Navigate to the frontend directory
cd frontend

# Install npm dependencies
npm install
cd ..
```
Configuration
- Create a .env file in the project root:

```shell
OPENROUTER_API_KEY=sk-or-v1-your-key-here
```

- Configure the council in backend/config.py:
```python
COUNCIL_MODELS = [
    "openai/gpt-5.1",
    "google/gemini-3-pro-preview",
    "anthropic/claude-sonnet-4.5",
    "x-ai/grok-4",
]

CHAIRMAN_MODEL = "google/gemini-3-pro-preview"
```
Running the Application
Option 1: Quick Start Script
```shell
./start.sh
```
Option 2: Manual Start
```shell
# Terminal 1 - Backend
uv run python -m backend.main

# Terminal 2 - Frontend
cd frontend
npm run dev
```
Access the application at: http://localhost:5173
Use Cases
Reading & Literature Analysis
- Karpathy's original use case: reading books with multiple AI perspectives
- Different models emphasize different literary aspects
- Comparative analysis of interpretation styles
Research & Analysis
- Complex questions requiring multiple viewpoints
- Technical documentation evaluation
- Business strategy assessment
Content Evaluation
- Legal document analysis
- Scientific paper interpretation
- Code review and technical writing
Model Comparison
- Benchmarking different LLM capabilities
- Understanding model strengths and weaknesses
- Identifying bias patterns across providers
Interesting Findings
Model Self-Assessment
- Models frequently select competitors' responses as superior to their own
- Demonstrates surprising objectivity in peer review process
- Reveals genuine differences in approach and quality
Ranking Patterns
In Karpathy's testing with book chapters:
- Consensus Winner: GPT-5.1 consistently rated as most insightful
- Consensus Loser: Claude consistently ranked lowest
- Middle Tier: Gemini 3 Pro and Grok 4 fell between the extremes
Human vs. AI Judgment Divergence
- AI consensus may not align with human preferences
- GPT-5.1 praised for insight but criticized by Karpathy as "too wordy"
- Claude ranked lowest by peers but was preferred by the project's creator for its terseness
- Gemini appreciated for condensed, processed outputs
- Suggests models may favor verbosity over conciseness
Project Philosophy
"Vibe Coded" Approach
- Described as a "99% vibe coded" Saturday hack project
- Rapid development with AI assistance
- No long-term support commitment from creator
- "Code is ephemeral now and libraries are over" philosophy
Open Source & Inspiration
- Provided as-is for community inspiration
- Users encouraged to modify via their own LLMs
- Represents reference architecture for AI orchestration
- Demonstrates ensemble learning applied to language models
Enterprise Implications
Orchestration Middleware
- Reveals the architecture of multi-model coordination
- Addresses vendor lock-in concerns
- Demonstrates feasibility of model-agnostic applications
Quality Control Layer
- Peer review adds validation absent in single-model systems
- Reduces individual model biases
- Provides transparency in AI decision-making
Reference Implementation
- Shows minimum viable architecture for ensemble AI
- Guides build vs. buy decisions for enterprise platforms
- Demystifies multi-model orchestration complexity
Limitations & Considerations
Cost
- Requires OpenRouter API credits for all council members plus chairman
- Multiple model calls per query increase operational costs
- No free tier operation available
Speed
- Three-stage process slower than single-model queries
- Multiple API calls add latency
- Trade-off between speed and quality/consensus
Model Availability
- Dependent on OpenRouter model catalog
- Requires active API keys and credits
- Subject to model provider rate limits
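Provider rate limits typically surface as transient errors on individual calls. A generic exponential-backoff wrapper, shown below as an assumption rather than something the actual backend necessarily includes, is one way to absorb them.

```python
import asyncio
import random

async def call_with_backoff(fn, *args, retries=4, base=0.5):
    """Retry a flaky async call with exponential backoff plus jitter.

    Sleeps base * 2**attempt (plus a little jitter to avoid thundering
    herds) between attempts, and re-raises after the final failure.
    """
    for attempt in range(retries):
        try:
            return await fn(*args)
        except Exception:
            if attempt == retries - 1:
                raise
            await asyncio.sleep(base * 2 ** attempt + random.random() * 0.1)
```

Wrapping each per-model request this way keeps one rate-limited provider from failing the whole council round.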
Maintenance
- Creator explicitly states no ongoing support
- Community-driven improvements only
- Users responsible for adaptations and updates
Technical Considerations
Anonymization Strategy
- Random IDs (A, B, C, D) assigned to responses
- Prevents identity-based bias in peer review
- Maintains objectivity in evaluation process
API Integration
- Single point of integration via OpenRouter
- Abstracts away individual provider APIs
- Simplifies multi-model coordination
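Because OpenRouter exposes every provider through the OpenAI-style chat-completions format, one payload builder covers all council members and only the `model` string changes. The helper below is a sketch of that request shape; actually sending it (e.g. with httpx) is omitted.

```python
import os

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_request(model, prompt):
    """Build the (url, headers, body) triple for one OpenRouter call.

    The same shape works for OpenAI, Google, Anthropic, and xAI models;
    the API key is read from the OPENROUTER_API_KEY environment variable.
    """
    headers = {
        "Authorization": f"Bearer {os.environ.get('OPENROUTER_API_KEY', '')}",
        "Content-Type": "application/json",
    }
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return OPENROUTER_URL, headers, body
```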
Data Privacy
- Local web application runs on user's machine
- Conversations stored locally as JSON
- API calls go through OpenRouter (third-party)
Community & Ecosystem
Related Projects
- Swarms Framework: Implements LLMCouncil class inspired by this project
- Hugging Face Spaces: Community deployments available
- Medium/VentureBeat Coverage: Enterprise analysis and implications
Similar Approaches
- Ensemble learning in machine learning
- Mixture of Experts architectures
- Multi-agent AI systems
- Consensus protocols in distributed systems
Future Directions
While Karpathy explicitly states no planned improvements, potential community extensions could include:
- Extended Model Support: Adding more council members from emerging providers
- Custom Ranking Criteria: User-defined evaluation dimensions
- Streaming Responses: Real-time display of model outputs
- Advanced Synthesis: More sophisticated chairman algorithms
- Cost Optimization: Intelligent model selection based on query type
- Performance Analytics: Tracking model accuracy and preference patterns
- Integration APIs: Embedding council functionality in other applications
Getting Started
- Clone the repository: git clone https://github.com/karpathy/llm-council
- Follow the installation instructions above
- Configure your preferred council models
- Start querying and compare perspectives
- Experiment with different model combinations
- Analyze peer review patterns
Conclusion
LLM Council represents a pragmatic approach to addressing single-model limitations through ensemble orchestration. While presented as a casual weekend project, it offers valuable insights into multi-model architecture, peer review mechanisms, and the future of AI orchestration middleware. For developers, researchers, and enterprises exploring beyond single-provider solutions, this project provides both inspiration and a concrete reference implementation for building more robust, consensus-driven AI systems.
The project's minimalist approach—a few hundred lines of code achieving sophisticated multi-model coordination—demonstrates that the technical barriers to ensemble AI are lower than many assume. The real challenges lie not in routing prompts, but in governance, cost management, and determining when consensus truly improves outcomes over individual model responses.