An open-source, Claude-powered autonomous AI scientist capable of automatically executing the full scientific research cycle: literature analysis, hypothesis generation, experimental design, execution, analysis, and iterative improvement.

Python · Kosmos · jimmc414 · 308 · Last Updated: December 12, 2025

Kosmos - Autonomous AI Scientist Platform Detailed Introduction

Project Overview

Kosmos is an open-source implementation of an autonomous AI scientist that executes the full scientific research cycle: literature analysis, hypothesis generation, experimental design, execution, analysis, and iterative improvement. The project is based on the Kosmos AI paper released in November 2025 (https://arxiv.org/abs/2511.02824) and adapted to run on either Claude Code or the Anthropic API.

Core Features

🔬 Autonomous Research Cycle

  • End-to-end Scientific Workflow: Full automation of the research cycle
  • Multi-discipline Support: Biology, Physics, Chemistry, Neuroscience, Materials Science
  • Iterative Improvement: Automatic optimization of hypotheses and experimental designs based on results

🤖 AI-Driven Intelligent System

  • Claude-Driven: Uses Claude Sonnet 4 for hypothesis generation and advanced analysis
  • Multi-model Support: Supports Anthropic Claude, OpenAI GPT, and local models (Ollama, LM Studio)
  • Intelligent Model Selection: Automatically selects the optimal model based on task complexity

🔧 Flexible Integration Options

  • Dual Integration Options:
    • Option A: Anthropic API (Pay-as-you-go)
    • Option B: Claude Code CLI (Requires Max Subscription)
  • Mature Analysis Patterns: Integrates battle-tested statistical methods from kosmos-figures

📚 Literature Integration

  • Automated Paper Search: Supports arXiv, Semantic Scholar, PubMed
  • Literature Summarization: Automatically extracts key information
  • Novelty Check: Verifies the novelty of research hypotheses
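
One way to picture the novelty check is as a comparison between a candidate hypothesis and the literature already retrieved, rejecting the hypothesis when the overlap is too high. The sketch below is only an illustration under that assumption: it uses token-level Jaccard similarity, which is a stand-in and not necessarily the method Kosmos uses, and the names `novelty_score` and `is_novel` are hypothetical.

```python
import re


def _tokens(text: str) -> set[str]:
    """Lowercase word tokens; a crude stand-in for real text embeddings."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def novelty_score(hypothesis: str, known_texts: list[str]) -> float:
    """Return 1 minus the highest Jaccard similarity to any known text."""
    hyp = _tokens(hypothesis)
    best = 0.0
    for text in known_texts:
        other = _tokens(text)
        union = hyp | other
        if union:
            best = max(best, len(hyp & other) / len(union))
    return 1.0 - best


def is_novel(hypothesis: str, known_texts: list[str], min_score: float = 0.5) -> bool:
    """Hypothetical gate: accept hypotheses scoring at or above min_score."""
    return novelty_score(hypothesis, known_texts) >= min_score
```

A threshold such as `min_score` plays the role that MIN_NOVELTY_SCORE plays in the configuration; the default of 0.5 here is an assumption.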

🏗️ Agent Architecture

  • Modular Design: Each research task corresponds to an independent agent
  • Parallel Execution: Simultaneously runs multiple research tasks
  • Collaborative Work: Agents share information through a structured world model
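
The combination of parallel tasks and a shared world model can be sketched with the standard library alone. This is a minimal illustration, not the Kosmos implementation: `WorldModel`, `run_task`, and `run_parallel` are hypothetical names, and the "agents" are placeholders that record one string each.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class WorldModel:
    """Thread-safe store that agents use to publish and read findings."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._findings: dict[str, list[str]] = {}

    def add(self, topic: str, finding: str) -> None:
        with self._lock:
            self._findings.setdefault(topic, []).append(finding)

    def get(self, topic: str) -> list[str]:
        with self._lock:
            return list(self._findings.get(topic, []))


def run_task(world: WorldModel, agent_name: str, topic: str) -> str:
    """Stand-in for an agent: records one finding and returns its name."""
    world.add(topic, f"{agent_name}: placeholder finding")
    return agent_name


def run_parallel(world: WorldModel, tasks: list[tuple[str, str]]) -> list[str]:
    """Run each (agent_name, topic) task on a worker thread."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(run_task, world, name, topic) for name, topic in tasks]
        # Results come back in submission order, not completion order.
        return [f.result() for f in futures]
```

The lock matters because several tasks may write to the same topic at once; reads return a copy so callers cannot mutate shared state by accident.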

🛡️ Safety First

  • Sandboxed Execution: Isolated code execution environment
  • Verification Mechanisms: Result verification and reproducibility checks
  • Human Approval Gate: Optional human review step

💰 Cost Optimization

  • Multi-layer Caching System: Reduces API costs by 30-40%
  • Smart Prompt Caching: Provides additional savings when using Anthropic models
  • Model Selection Optimization: Intelligently selects models based on task complexity, reducing costs by 15-20%

System Architecture

Core Components

┌─────────────────────────────────────────────────────────────┐
│                    Research Director                         │
│              (Main controller coordinating the               │
│                 autonomous research cycle)                   │
└──────────────┬──────────────────────────────────────────────┘
               │
┌──────────────┴────────┬───────────────┬──────────────┐
│                       │               │              │
┌───▼────┐   ┌─────────▼──────────┐  ┌▼──────────┐ ┌▼─────────────┐
│Literature│  │Hypothesis Generator│  │Experiment │ │Data Analyst  │
│Analyzer  │  │     (Claude)       │  │Designer   │ │   (Claude)   │
└───┬────┘   └─────────┬──────────┘  └┬──────────┘ └┬─────────────┘
    │                  │               │             │
    └──────────────────┴───────────────┴─────────────┘
                       │
            ┌──────────▼──────────┐
            │  Execution Engine   │
            │   (kosmos-figures   │
            │   proven patterns)  │
            └─────────────────────┘

Agent Descriptions

  • Research Director: Main coordinator that manages the research workflow
  • Literature Analyzer: Searches and analyzes scientific papers (arXiv, Semantic Scholar, PubMed)
  • Hypothesis Generator: Generates testable hypotheses using Claude
  • Experiment Designer: Designs computational experiments
  • Execution Engine: Runs experiments using verified statistical methods
  • Data Analyst: Interprets results using Claude
  • Feedback Loop: Iteratively improves hypotheses based on results
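
The way these agents fit together can be sketched as a loop in which each round's results feed the next round's hypothesis. The code below is a simplified illustration with the agents reduced to plain callables; `CycleState` and `run_cycle` are hypothetical names and do not reflect the actual Kosmos classes.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class CycleState:
    question: str
    hypotheses: list[str] = field(default_factory=list)
    results: list[str] = field(default_factory=list)


def run_cycle(
    question: str,
    generate: Callable[[str, list[str]], str],
    execute: Callable[[str], str],
    converged: Callable[[list[str]], bool],
    max_iterations: int = 5,
) -> CycleState:
    """Generate a hypothesis, test it, feed results back, and repeat."""
    state = CycleState(question)
    for _ in range(max_iterations):
        hypothesis = generate(question, state.results)  # Hypothesis Generator
        result = execute(hypothesis)  # Experiment Designer + Execution Engine
        state.hypotheses.append(hypothesis)
        state.results.append(result)  # becomes input to the next round
        if converged(state.results):  # Feedback Loop decides whether to stop
            break
    return state
```

Passing the agents in as callables keeps the loop testable with stubs, for example `run_cycle("q", lambda q, r: f"h{len(r)}", lambda h: f"result for {h}", lambda r: len(r) >= 2)`.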

Technical Requirements

Basic Requirements

  • Python 3.11 or 3.12
  • One of the following:
    • Option A: Anthropic API key (Pay-as-you-go)
    • Option B: Claude Code CLI installed (requires Max subscription)

Installation Guide

Basic Installation

# Clone the repository
git clone https://github.com/jimmc414/Kosmos.git
cd Kosmos

# Create a virtual environment
python3.11 -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install dependencies
pip install -e .

# Optional: add Claude Code CLI support (Option B)
pip install -e ".[router]"

Option A: Anthropic API Configuration

# Copy the example configuration
cp .env.example .env

# Edit .env and set your API key
# ANTHROPIC_API_KEY=sk-ant-api03-your-actual-key-here

Get your API key from console.anthropic.com

Pros:

  • Pay-as-you-go
  • No CLI installation required
  • Works anywhere

Cons:

  • Charged per token
  • Rate limits apply

Option B: Claude Code CLI Configuration

# 1. Install Claude Code CLI
# Visit https://claude.ai/download and follow the instructions

# 2. Authenticate Claude CLI
claude auth

# 3. Copy the example configuration
cp .env.example .env

# 4. Edit .env and set the API key to all 9s (triggers CLI routing)
# ANTHROPIC_API_KEY=999999999999999999999999999999999999999999999999

This will route all API calls to the local Claude Code CLI, using your Max subscription without per-token charges.
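
The all-9s convention amounts to a sentinel check on the configured key. How Kosmos performs this check internally is not shown here, so the following is only a sketch of the idea; `uses_cli_routing` is a hypothetical helper.

```python
def uses_cli_routing(api_key: str) -> bool:
    """Return True when the key is a non-empty string made up only of 9s."""
    key = api_key.strip()
    return bool(key) and set(key) == {"9"}
```

With this check, a real key such as one beginning with `sk-ant-` selects the API path, while the all-9s placeholder selects the CLI path.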

Pros:

  • No per-token cost
  • No per-token metering (usage is bounded only by the subscription's own limits)
  • Latest Claude models
  • Local execution

Cons:

  • Requires Claude CLI installation
  • Requires Max subscription

Database Initialization

# Run database migrations
alembic upgrade head

# Verify the database has been created
ls -la kosmos.db

Quick Start

Basic Usage Example

from kosmos import ResearchDirector

# Initialize the research director
director = ResearchDirector()

# Propose a research question
question = "What is the relationship between sleep deprivation and memory consolidation?"

# Run the autonomous research
results = director.conduct_research(
    question=question,
    domain="neuroscience",
    max_iterations=5
)

# View the results
print(results.summary)
print(results.key_findings)

Configuration Options

All configuration is done through environment variables (see .env.example):

Core Configuration

  • ANTHROPIC_API_KEY: API key, or an all-9s value to enable CLI mode
  • CLAUDE_MODEL: Model to use (API mode only)
  • DATABASE_URL: Database connection string
  • LOG_LEVEL: Log verbosity

Research Configuration

  • MAX_RESEARCH_ITERATIONS: Maximum number of autonomous iterations
  • ENABLED_DOMAINS: Scientific domains to enable
  • ENABLED_EXPERIMENT_TYPES: Allowed experiment types
  • MIN_NOVELTY_SCORE: Minimum novelty threshold

Security Configuration

  • ENABLE_SAFETY_CHECKS: Code safety verification
  • MAX_EXPERIMENT_EXECUTION_TIME: Experiment timeout
  • ENABLE_SANDBOXING: Sandboxed code execution
  • REQUIRE_HUMAN_APPROVAL: Human approval gate
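
The interaction between an execution timeout and an approval gate can be illustrated with a child process. This is a sketch under stated assumptions, not the Kosmos sandbox: a separate Python process with a timeout limits runaway code but is not real isolation, and `run_with_timeout` and `guarded_run` are hypothetical names.

```python
import subprocess
import sys


def run_with_timeout(code: str, timeout_seconds: float) -> tuple[bool, str]:
    """Run code in a separate Python process; return (ok, output or reason)."""
    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: isolated mode
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        return False, "timed out"
    if proc.returncode != 0:
        return False, proc.stderr.strip()
    return True, proc.stdout.strip()


def guarded_run(code: str, timeout_seconds: float, approve=None) -> tuple[bool, str]:
    """Optionally ask a reviewer callback before running the code."""
    if approve is not None and not approve(code):
        return False, "rejected by reviewer"
    return run_with_timeout(code, timeout_seconds)
```

Here `timeout_seconds` corresponds in spirit to MAX_EXPERIMENT_EXECUTION_TIME and the `approve` callback to REQUIRE_HUMAN_APPROVAL; a production sandbox would add container- or OS-level isolation on top.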

Performance Optimization

Caching System

Kosmos includes a multi-layer caching system that can reduce API costs by 30-40%:

# View cache performance
kosmos cache --stats

# Example output:
# Overall cache performance:
# Total requests: 500
# Cache hits: 175 (35%)
# Estimated cost savings: $15.75

Note: Cost-saving prompt caching is currently available only with Anthropic Claude. OpenAI and local providers use in-memory response caching only.
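
An in-memory response cache of the kind mentioned in the note can be sketched in a few lines. The class below is illustrative only (`ResponseCache` is a hypothetical name, and the real Kosmos cache has multiple layers); it also shows where a hit rate like the 35% in the example output comes from, since 175 hits out of 500 requests is 175 / 500 = 0.35.

```python
import hashlib


class ResponseCache:
    """In-memory cache keyed by a hash of (model, prompt), with hit counting."""

    def __init__(self) -> None:
        self._store: dict[str, str] = {}
        self.hits = 0
        self.requests = 0

    @staticmethod
    def _key(model: str, prompt: str) -> str:
        return hashlib.sha256(f"{model}\n{prompt}".encode("utf-8")).hexdigest()

    def get_or_call(self, model: str, prompt: str, call) -> str:
        """Return a cached response, or invoke call(prompt) and store the result."""
        self.requests += 1
        key = self._key(model, prompt)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        response = call(prompt)
        self._store[key] = response
        return response

    def hit_rate(self) -> float:
        return self.hits / self.requests if self.requests else 0.0
```

Including the model name in the key prevents a response generated by one model from being served for a request aimed at another.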

Intelligent Model Selection

When using Anthropic as the LLM provider, Kosmos intelligently selects Claude models based on task complexity:

  • Claude Sonnet 4.5: Complex reasoning, hypothesis generation, analysis
  • Claude Haiku 4: Simple tasks, data extraction, formatting

This reduces costs by 15-20% while maintaining quality.

Note: This feature is specific to Anthropic Claude.
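
Complexity-based selection can be thought of as a lookup from task type to model tier. The sketch below assumes a simple two-tier split based on the task categories listed above; the task-type strings and the `select_model` function are hypothetical, and the returned values are the display names used in this document, not API model identifiers.

```python
# Task types treated as simple; the labels are illustrative assumptions.
SIMPLE_TASKS = {"data_extraction", "formatting"}


def select_model(task_type: str) -> str:
    """Map a task type to a model tier; unknown tasks use the stronger model."""
    if task_type in SIMPLE_TASKS:
        return "Claude Haiku 4"
    return "Claude Sonnet 4.5"
```

Defaulting unknown task types to the stronger model trades some cost for safety: a misclassified task gets a more capable model and not a weaker one.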

Project Structure

kosmos/
├── core/              # Core infrastructure (LLM, configuration, logging)
├── agents/            # Agent implementations
├── db/                # Database models and operations
├── execution/         # Experiment execution engine
├── analysis/          # Result analysis and visualization
├── hypothesis/        # Hypothesis generation and management
├── experiments/       # Experiment templates
├── literature/        # Literature search and analysis
├── knowledge/         # Knowledge graph and semantic search
├── domains/           # Domain-specific tools (Biology, Physics, etc.)
├── safety/            # Safety checks and verification
└── cli/               # Command-line interface

tests/
├── unit/              # Unit tests
├── integration/       # Integration tests
└── e2e/               # End-to-end tests

docs/
├── kosmos-figures-analysis.md      # Analysis patterns from kosmos-figures
├── integration-plan.md             # Integration strategy
└── domain-roadmaps/                # Domain-specific guides

Development Testing

# Install development dependencies
pip install -e ".[dev]"

# Run all tests
pytest

# Run tests with an HTML coverage report
pytest --cov=kosmos --cov-report=html

# Run specific test suites
pytest tests/unit/
pytest tests/integration/
pytest tests/e2e/

# Code formatting
black kosmos/ tests/

# Code linting
ruff check kosmos/ tests/

# Type checking
mypy kosmos/

Development Roadmap

✅ Completed (10 phases)

Phase 1: Project Foundation ✅

  • Project structure
  • Claude integration (API + CLI)
  • Configuration system
  • Agent framework
  • Database setup

Phase 2: Literature Capabilities ✅

  • Literature APIs (arXiv, Semantic Scholar, PubMed)
  • Literature analysis agent
  • Vector database for semantic search
  • Knowledge graph

Phase 3: Hypothesis Generation ✅

  • Hypothesis generator agent
  • Novelty check
  • Hypothesis prioritization

Phase 4: Experimental Design ✅

  • Experiment designer agent
  • Protocol templates
  • Resource estimation

Phase 5: Experiment Execution ✅

  • Sandbox execution environment
  • Integration of kosmos-figures patterns
  • Statistical analysis

Phase 6: Result Analysis ✅

  • Data analysis agent
  • Visualization generation
  • Result summarization

Phase 7: Research Orchestration ✅

  • Research director agent
  • Feedback loop
  • Convergence detection

Phase 8: Safety and Verification ✅

  • Safety verification
  • Domain-specific tools

Phase 9: Production Deployment ✅

  • Performance optimization (20-40× improvement)
  • Multi-layer caching system
  • Comprehensive testing (90%+ coverage)

Phase 10: Documentation and Polish ✅

  • Extensive documentation (10,000+ lines)
  • User guide
  • API documentation
  • Example code

Scientific Discovery Cases

The Kosmos paper reports several validated scientific discoveries across various fields:

1. Metabolomics - Brain Hypothermia Protection

Independently reproduced findings from an unpublished manuscript, identifying nucleotide metabolism as the primary altered pathway in hypothermic mouse brains.

2. Materials Science - Solar Cell Efficiency

Discovered that humidity during thermal treatment is a key determinant of solar cell efficiency and identified critical humidity thresholds.

3. Neuroscience - Neural Network Connectivity

Showed that brain networks across species follow a log-normal pattern rather than a power-law pattern.

4. Heart Disease - SOD2 Protective Factor

Found that the SOD2 protein appears to protect the heart by reducing fibrosis.

5. Diabetes - SSR1 Gene Variants

Discovered that genetic variants near the SSR1 gene may have a protective effect against type 2 diabetes.

6. Alzheimer's Disease - Temporal Analysis Method

Proposed a new analytical technique for tracking protein changes over time in disease-affected brain cells.

7. Neurodegenerative Diseases - Phosphatidylserine Exposure

Identified that neurons expose "eat me" signals due to age-related loss of flippase expression.

Research Efficiency: Independent scientists found 79.4% of statements in Kosmos reports to be accurate, and collaborators reported that a single 20-cycle Kosmos run was equivalent to 6 months of their own research time.

Inspiration

This project draws on the Kosmos AI paper (https://arxiv.org/abs/2511.02824) and the kosmos-figures analysis patterns from Edison Scientific.

Contribution Guidelines

Contributions are welcome! See CONTRIBUTING.md for guidelines.

Areas where contributions are welcome:

  • Domain-specific tools and APIs
  • Experiment templates for different domains
  • Literature API integrations
  • Safety verifications
  • Documentation
  • Testing

License

MIT License - see LICENSE

Citation

If you use Kosmos in your research, please cite:

@software{kosmos_ai_scientist,
  title={Kosmos AI Scientist: Multi-Provider Autonomous Scientific Discovery},
  author={Kosmos Contributors},
  year={2025},
  url={https://github.com/jimmc414/Kosmos}
}

Acknowledgments

  • Anthropic for providing Claude and Claude Code
  • Edison Scientific for providing the kosmos-figures analysis patterns
  • The open science community for providing literature APIs and tools

Support and Community

Project Status

Status: Production-ready (v0.2.0) - All 10 development phases completed

Last Updated: 2025-11-07


Summary of Core Advantages

  1. Fully Autonomous: End-to-end automation from hypothesis to discovery
  2. Multi-discipline Support: Across Biology, Physics, Chemistry, and more
  3. Validated: Has produced 7 validated scientific discoveries
  4. Cost-Optimized: Multi-layer caching reduces costs by 30-40%
  5. Flexible Integration: Supports both API and CLI options
  6. Safe and Reliable: Sandboxed execution, verification mechanisms, 90%+ test coverage
  7. Production-Ready: v0.2.0, with all 10 development phases completed