Multi-LLM collaboration tool that queries multiple AI models, enables peer review, and synthesizes responses through a chairman model

llm-council · karpathy · Python · 11.2k stars · Last Updated: November 22, 2025

LLM Council - Multi-Model AI Collaboration Platform

Project Overview

LLM Council is an innovative open-source project created by Andrej Karpathy that transforms single-model AI interactions into collaborative, multi-model consensus systems. Instead of relying on a single LLM provider, this tool orchestrates multiple frontier AI models to work together, review each other's outputs, and produce synthesized responses through a democratic process.

Core Concept

The fundamental idea behind LLM Council is to leverage the strengths of different AI models while minimizing individual model biases. By creating an "AI advisory board," users receive more comprehensive, peer-reviewed answers to complex questions rather than depending on a single model's perspective.

Architecture & Workflow

Three-Stage Process

Stage 1: First Opinions

  • User query is dispatched simultaneously to all council member models via OpenRouter API
  • Each LLM generates its independent response without seeing others' outputs
  • Individual responses are displayed in a tab view for side-by-side comparison
  • Default council includes: GPT-5.1, Gemini 3.0 Pro, Claude Sonnet 4.5, and Grok 4

Stage 2: Anonymous Peer Review

  • Each model receives anonymized responses from all other council members
  • Models evaluate and rank each response based on accuracy and insight
  • Identity anonymization prevents bias and favoritism in evaluations
  • Cross-model evaluation reveals surprising patterns (models often rank competitors higher)

Stage 3: Chairman Synthesis

  • A designated Chairman LLM (configurable) reviews all original responses
  • Considers peer review rankings and evaluations
  • Produces a final synthesized answer incorporating the best elements
  • Delivers a comprehensive response to the user
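The three stages above can be sketched in a few lines of async Python. This is a minimal illustration, not the repo's actual code: `query_model` is a hypothetical stand-in for the real OpenRouter call (which the project makes with async httpx), and the prompt wording is invented.

```python
import asyncio

# Hypothetical stand-in for an OpenRouter chat-completions call.
async def query_model(model: str, prompt: str) -> str:
    return f"[{model}] answer to: {prompt[:40]}"

async def council_round(members: list[str], chairman: str, user_query: str) -> str:
    # Stage 1: dispatch the query to every council member in parallel.
    answers = await asyncio.gather(*(query_model(m, user_query) for m in members))

    # Stage 2: anonymize the answers (A, B, C, ...) and have each member rank them.
    labels = [chr(ord("A") + i) for i in range(len(answers))]
    anonymized = "\n".join(f"Response {l}: {a}" for l, a in zip(labels, answers))
    review_prompt = "Rank these responses by accuracy and insight:\n" + anonymized
    reviews = await asyncio.gather(*(query_model(m, review_prompt) for m in members))

    # Stage 3: the chairman sees the originals plus the reviews and synthesizes.
    synthesis_prompt = (
        f"Question: {user_query}\n\nCandidates:\n{anonymized}\n\n"
        "Reviews:\n" + "\n".join(reviews) + "\n\nWrite the final answer."
    )
    return await query_model(chairman, synthesis_prompt)

print(asyncio.run(council_round(["model-a", "model-b"], "chair", "What is entropy?")))
```

The key structural point is that stages 1 and 2 are fan-out/fan-in (`asyncio.gather`), while stage 3 is a single call that receives everything as context.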

Technical Stack

Backend

  • Framework: FastAPI (Python 3.10+)
  • HTTP Client: async httpx for non-blocking API calls
  • API Integration: OpenRouter API for multi-model access
  • Storage: JSON-based conversation persistence in data/conversations/
  • Package Management: uv for modern Python dependency management
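Concretely, OpenRouter exposes an OpenAI-compatible chat-completions endpoint, so one request shape covers every council member. The request-building half is sketched below; the endpoint URL and `OPENROUTER_API_KEY` env var match the setup section, while the helper name is illustrative:

```python
import os

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_request(model: str, prompt: str) -> tuple[dict, dict]:
    # One header/payload shape works for every model behind OpenRouter.
    headers = {"Authorization": f"Bearer {os.environ.get('OPENROUTER_API_KEY', '')}"}
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return headers, payload

# Sending it non-blockingly with httpx would look roughly like:
#   async with httpx.AsyncClient() as client:
#       r = await client.post(OPENROUTER_URL, headers=headers, json=payload)
#       text = r.json()["choices"][0]["message"]["content"]
```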

Frontend

  • Framework: React with Vite for fast development and builds
  • Rendering: react-markdown for formatted output
  • UI: ChatGPT-like interface with tab views for model comparison
  • Dev Server: Vite dev server on port 5173

Key Features

Multi-Model Dispatching

  • Simultaneous query execution across multiple frontier models
  • Configurable council membership through backend/config.py
  • Support for models from OpenAI, Google, Anthropic, xAI, and more

Objective Peer Review

  • Anonymized response evaluation prevents model bias
  • Quantitative ranking system for accuracy and insight
  • Reveals interesting patterns in model preferences and strengths
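The app surfaces the raw rankings themselves; as one hypothetical way to collapse them into a single consensus score (not necessarily what the repo does), a simple Borda count works:

```python
def aggregate_rankings(rankings: list[list[str]]) -> dict[str, int]:
    # Borda count: in a ranking of n responses, 1st place earns n points
    # and last earns 1; summing over all reviewers gives a consensus order.
    scores: dict[str, int] = {}
    for ranking in rankings:
        n = len(ranking)
        for position, label in enumerate(ranking):
            scores[label] = scores.get(label, 0) + (n - position)
    return scores

# Two reviewers ranking three anonymized responses:
print(aggregate_rankings([["A", "B", "C"], ["B", "A", "C"]]))  # → {'A': 5, 'B': 5, 'C': 2}
```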

Synthesized Consensus

  • Chairman model aggregates diverse perspectives
  • Produces coherent final answers incorporating multiple viewpoints
  • Balances verbosity, insight, and conciseness

Transparent Comparison

  • Side-by-side view of all individual responses
  • Complete visibility into peer review rankings
  • Users can form their own judgments alongside AI consensus

Conversation Persistence

  • Automatic saving of conversation history
  • JSON-based storage for easy data portability
  • Ability to review and analyze past council sessions
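A minimal version of that persistence layer might look like the following; the `data/conversations/` path comes from the backend description, while the function names and file schema are illustrative assumptions:

```python
import json
from pathlib import Path

def save_conversation(conv_id: str, messages: list[dict],
                      base: Path = Path("data/conversations")) -> Path:
    # One self-describing JSON file per conversation keeps history portable.
    base.mkdir(parents=True, exist_ok=True)
    path = base / f"{conv_id}.json"
    path.write_text(json.dumps({"id": conv_id, "messages": messages}, indent=2))
    return path

def load_conversation(conv_id: str,
                      base: Path = Path("data/conversations")) -> list[dict]:
    return json.loads((base / f"{conv_id}.json").read_text())["messages"]
```

Flat JSON files trade query power for portability: past council sessions can be inspected with any text editor or piped through `jq`.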

Installation & Setup

Prerequisites

  • Python 3.10 or higher
  • Node.js and npm
  • OpenRouter API key (requires purchased credits)

Backend Setup

# Install dependencies using uv
uv sync

Frontend Setup

# Navigate to frontend directory
cd frontend

# Install npm dependencies
npm install

cd ..

Configuration

  1. Create a .env file in the project root:
OPENROUTER_API_KEY=sk-or-v1-your-key-here
  2. Configure the council in backend/config.py:
COUNCIL_MODELS = [
    "openai/gpt-5.1",
    "google/gemini-3-pro-preview",
    "anthropic/claude-sonnet-4.5",
    "x-ai/grok-4",
]
CHAIRMAN_MODEL = "google/gemini-3-pro-preview"

Running the Application

Option 1: Quick Start Script

./start.sh

Option 2: Manual Start

# Terminal 1 - Backend
uv run python -m backend.main

# Terminal 2 - Frontend
cd frontend
npm run dev

Access the application at: http://localhost:5173

Use Cases

Reading & Literature Analysis

  • Karpathy's original use case: reading books with multiple AI perspectives
  • Different models emphasize different literary aspects
  • Comparative analysis of interpretation styles

Research & Analysis

  • Complex questions requiring multiple viewpoints
  • Technical documentation evaluation
  • Business strategy assessment

Content Evaluation

  • Legal document analysis
  • Scientific paper interpretation
  • Code review and technical writing

Model Comparison

  • Benchmarking different LLM capabilities
  • Understanding model strengths and weaknesses
  • Identifying bias patterns across providers

Interesting Findings

Model Self-Assessment

  • Models frequently select competitors' responses as superior to their own
  • Demonstrates surprising objectivity in peer review process
  • Reveals genuine differences in approach and quality

Ranking Patterns

In Karpathy's testing with book chapters:

  • Consensus Winner: GPT-5.1 consistently rated as most insightful
  • Consensus Loser: Claude consistently ranked lowest
  • Middle Tier: Gemini 3 Pro and Grok-4 fell between the extremes

Human vs. AI Judgment Divergence

  • AI consensus may not align with human preferences
  • GPT-5.1 praised for insight but criticized by Karpathy as "too wordy"
  • Claude ranked lowest by peers but preferred by creator for terseness
  • Gemini appreciated for condensed, processed outputs
  • Suggests models may favor verbosity over conciseness

Project Philosophy

"Vibe Coded" Approach

  • Described by Karpathy as a "99% vibe coded" Saturday hack project
  • Rapid development with AI assistance
  • No long-term support commitment from creator
  • "Code is ephemeral now and libraries are over" philosophy

Open Source & Inspiration

  • Provided as-is for community inspiration
  • Users are encouraged to fork and adapt it, potentially with help from their own LLMs
  • Represents reference architecture for AI orchestration
  • Demonstrates ensemble learning applied to language models

Enterprise Implications

Orchestration Middleware

  • Reveals the architecture of multi-model coordination
  • Addresses vendor lock-in concerns
  • Demonstrates feasibility of model-agnostic applications

Quality Control Layer

  • Peer review adds validation absent in single-model systems
  • Reduces individual model biases
  • Provides transparency in AI decision-making

Reference Implementation

  • Shows minimum viable architecture for ensemble AI
  • Guides build vs. buy decisions for enterprise platforms
  • Demystifies multi-model orchestration complexity

Limitations & Considerations

Cost

  • Requires OpenRouter API credits for all council members plus chairman
  • Multiple model calls per query increase operational costs
  • No free tier operation available

Speed

  • Three-stage process slower than single-model queries
  • Multiple API calls add latency
  • Trade-off between speed and quality/consensus

Model Availability

  • Dependent on OpenRouter model catalog
  • Requires active API keys and credits
  • Subject to model provider rate limits

Maintenance

  • Creator explicitly states no ongoing support
  • Community-driven improvements only
  • Users responsible for adaptations and updates

Technical Considerations

Anonymization Strategy

  • Random IDs (A, B, C, D) assigned to responses
  • Prevents identity-based bias in peer review
  • Maintains objectivity in evaluation process
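That anonymization step can be sketched as follows; the letter labels match the description above, while the shuffle and the server-side mapping are assumptions about the implementation:

```python
import random

def anonymize(responses: dict[str, str]) -> tuple[dict[str, str], dict[str, str]]:
    # Shuffle the models so reviewers cannot infer identity from ordering,
    # then assign letter IDs (A, B, C, ...).
    models = list(responses)
    random.shuffle(models)
    labeled = {chr(ord("A") + i): responses[m] for i, m in enumerate(models)}
    # The label -> model mapping stays server-side so rankings can be
    # de-anonymized after the reviews come back.
    mapping = {chr(ord("A") + i): m for i, m in enumerate(models)}
    return labeled, mapping
```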

API Integration

  • Single point of integration via OpenRouter
  • Abstracts away individual provider APIs
  • Simplifies multi-model coordination

Data Privacy

  • Local web application runs on user's machine
  • Conversations stored locally as JSON
  • API calls go through OpenRouter (third-party)

Community & Ecosystem

Related Projects

  • Swarms Framework: Implements LLMCouncil class inspired by this project
  • Hugging Face Spaces: Community deployments available
  • Medium/VentureBeat Coverage: Enterprise analysis and implications

Similar Approaches

  • Ensemble learning in machine learning
  • Mixture of Experts architectures
  • Multi-agent AI systems
  • Consensus protocols in distributed systems

Future Directions

While Karpathy explicitly states no planned improvements, potential community extensions could include:

  • Extended Model Support: Adding more council members from emerging providers
  • Custom Ranking Criteria: User-defined evaluation dimensions
  • Streaming Responses: Real-time display of model outputs
  • Advanced Synthesis: More sophisticated chairman algorithms
  • Cost Optimization: Intelligent model selection based on query type
  • Performance Analytics: Tracking model accuracy and preference patterns
  • Integration APIs: Embedding council functionality in other applications

Getting Started

  1. Clone the repository: git clone https://github.com/karpathy/llm-council
  2. Follow installation instructions above
  3. Configure your preferred council models
  4. Start querying and compare perspectives
  5. Experiment with different model combinations
  6. Analyze peer review patterns

Conclusion

LLM Council represents a pragmatic approach to addressing single-model limitations through ensemble orchestration. While presented as a casual weekend project, it offers valuable insights into multi-model architecture, peer review mechanisms, and the future of AI orchestration middleware. For developers, researchers, and enterprises exploring beyond single-provider solutions, this project provides both inspiration and a concrete reference implementation for building more robust, consensus-driven AI systems.

The project's minimalist approach—a few hundred lines of code achieving sophisticated multi-model coordination—demonstrates that the technical barriers to ensemble AI are lower than many assume. The real challenges lie not in routing prompts, but in governance, cost management, and determining when consensus truly improves outcomes over individual model responses.
