Build a ChatGPT Clone for $100 - A Single-Codebase Full-Stack LLM Implementation

MIT License | Python | karpathy/nanochat | 33.3k stars | Last updated: October 25, 2025

nanochat Project Detailed Introduction

Project Overview

nanochat is a full-stack Large Language Model (LLM) implementation project developed by renowned AI researcher Andrej Karpathy. Its core philosophy is to demonstrate how to build a ChatGPT-like chatbot from scratch with as little code and as little cost as possible.

Project Slogan: "The best ChatGPT that $100 can buy."

Core Features

1. Minimalist Design Philosophy

  • A single, clear, minimal, and modifiable codebase
  • Low-dependency design
  • Approximately 8,300 lines of code, 44 files
  • Totaling about 83,497 tokens (approx. 334KB)

2. End-to-End Complete Process

nanochat covers all stages of building an LLM (a sketch of how these stages chain together follows this list):

  • Tokenization
  • Pretraining
  • Finetuning
  • Evaluation
  • Inference
  • Web Serving
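
The speedrun script chains these stages together in order. Below is a hedged orchestration sketch of that flow in Python. The nanochat.dataset, base_train, mid_train, and chat_web module names appear elsewhere in this document; the tokenizer, SFT, and evaluation script names and the shard count are assumptions made for illustration, not facts stated here.

# Hedged sketch of the end-to-end pipeline (roughly what speedrun.sh automates).
# Script names marked "assumed" are guesses about the repo layout, not taken from this document.
import subprocess

stages = [
    ["python", "-m", "nanochat.dataset", "-n", "240"],                               # download pretraining shards (count illustrative)
    ["python", "-m", "scripts.tok_train"],                                            # tokenization: train the BPE tokenizer (assumed name)
    ["torchrun", "--standalone", "--nproc_per_node=8", "-m", "scripts.base_train"],   # pretraining
    ["torchrun", "--standalone", "--nproc_per_node=8", "-m", "scripts.mid_train"],    # mid-training
    ["torchrun", "--standalone", "--nproc_per_node=8", "-m", "scripts.chat_sft"],     # supervised finetuning (assumed name)
    ["python", "-m", "scripts.chat_eval"],                                            # evaluation, writes report.md (assumed name)
    ["python", "-m", "scripts.chat_web"],                                             # inference + web serving (runs until interrupted)
]
for cmd in stages:
    subprocess.run(cmd, check=True)

In practice you simply run speedrun.sh, which performs the equivalent steps end to end.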

3. Low-Cost Training

  • Basic version ($100 level): Completed in approximately 4 hours on an 8×H100 node (cost arithmetic is sketched after this list)
  • Mid-tier version ($300 level): Approximately 12 hours, slightly outperforming GPT-2
  • Advanced version ($1000 level): Approximately 41.6 hours
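
The dollar tiers line up with the wall-clock times above if you assume an hourly rate of roughly $24 for an 8×H100 node; that rate is an assumption used here for illustration, not a figure stated in this document.

# Rough cost sanity check; the $24/hour 8xH100 rate is an assumed ballpark.
rate_per_hour = 24.0  # USD per hour for one 8xH100 node (assumption)
for label, hours in [("basic", 4.0), ("mid-tier", 12.0), ("advanced", 41.6)]:
    print(f"{label}: ~${rate_per_hour * hours:.0f}")
# basic: ~$96, mid-tier: ~$288, advanced: ~$998, i.e. roughly the $100 / $300 / $1000 tiers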

4. Educational Positioning

  • Designed as the capstone project for Eureka Labs' LLM101n course
  • Highly readable code, easy to learn and understand
  • Suitable for developers who want to deeply understand the entire LLM training process

Quick Start

Run the Quick Training Script

Run on an 8×H100 GPU node:

bash speedrun.sh

Or run in a screen session (recommended):

screen -L -Logfile speedrun.log -S speedrun bash speedrun.sh

After approximately 4 hours, you will have a usable model you can chat with.

Launch the Web Interface

After training is complete, activate the virtual environment and start the service:

source .venv/bin/activate
python -m scripts.chat_web

Then visit the displayed URL (e.g., http://209.20.xxx.xxx:8000/) to chat with your own LLM, just like using ChatGPT.

Training Results Example

After training, a report.md file will be generated, containing the model's evaluation metrics:

Metric         | BASE   | MID    | SFT    | RL
---------------|--------|--------|--------|-------
CORE           | 0.2219 | -      | -      | -
ARC-Challenge  | -      | 0.2875 | 0.2807 | -
ARC-Easy       | -      | 0.3561 | 0.3876 | -
GSM8K          | -      | 0.0250 | 0.0455 | 0.0758
HumanEval      | -      | 0.0671 | 0.0854 | -
MMLU           | -      | 0.3111 | 0.3151 | -
ChatCORE       | -      | 0.0730 | 0.0884 | -

Total wall clock time: 3h51m

Note: The $100-tier model is trained with only about 4e19 FLOPs of compute, so its capability is limited (roughly "kindergarten level" language ability), but it is sufficient to demonstrate the complete training process.
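
The ~4e19 FLOPs figure can be sanity-checked with the common C ≈ 6·N·D training-compute approximation. The parameter and token counts below are assumed ballpark values for the $100-tier model, used only for illustration; they are not stated in this document.

# Cross-check of the ~4e19 FLOPs figure using C ≈ 6 * N * D.
# N and D are assumed ballpark values, not numbers from this document.
n_params = 560e6   # ~560M parameters (assumption)
n_tokens = 11.2e9  # ~11.2B training tokens, about 20 tokens per parameter (assumption)
flops = 6 * n_params * n_tokens
print(f"{flops:.2e}")  # ~3.8e+19, consistent with the ~4e19 quoted above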

Scaling to Larger Models

To train larger models (e.g., GPT-2 level d26 model), only minor modifications to speedrun.sh are needed:

# 1. Download more data shards (the trailing & runs the download in the background)
python -m nanochat.dataset -n 450 &

# 2. Increase model depth, decrease batch size to fit memory
torchrun --standalone --nproc_per_node=8 -m scripts.base_train -- --depth=26 --device_batch_size=16

# 3. Maintain the same configuration for mid-training
torchrun --standalone --nproc_per_node=8 -m scripts.mid_train -- --device_batch_size=16

Runtime Environment Requirements

Recommended Configuration

  • 8×H100 GPU node (80GB VRAM per GPU) - the setup the speedrun script and the ~4-hour timing assume

Compatibility

  • 8×A100 GPU: Can run, but slower
  • Single GPU: Can run, but training time increases 8-fold
  • Small VRAM GPU (<80GB): Requires lowering the --device_batch_size parameter (from 32 down to 16, 8, 4, 2, or 1); see the gradient-accumulation sketch after this list
  • Other platforms: Based on PyTorch, theoretically supports xpu, mps, etc., but requires additional configuration
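
Lowering --device_batch_size trades speed for memory: smaller per-device batches can be compensated by more gradient-accumulation steps, so the effective batch size stays the same. The following is a minimal, generic sketch of gradient accumulation; it is not nanochat's actual training loop, and the batch sizes are illustrative.

# Generic gradient-accumulation sketch (not nanochat's training loop).
import torch

total_batch_size = 32   # target effective batch size per GPU (illustrative)
device_batch_size = 8   # what fits in VRAM (illustrative)
accum_steps = total_batch_size // device_batch_size

model = torch.nn.Linear(128, 128)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

def get_microbatch():
    # Stand-in for a real data loader.
    x = torch.randn(device_batch_size, 128)
    return x, x

optimizer.zero_grad()
for _ in range(accum_steps):
    x, y = get_microbatch()
    loss = torch.nn.functional.mse_loss(model(x), y)
    (loss / accum_steps).backward()  # scale so accumulated gradients match one full batch
optimizer.step()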

Tech Stack

  • Deep Learning Framework: PyTorch
  • Distributed Training: torchrun
  • Package Management: uv
  • Datasets: HuggingFace FineWeb, SmolTalk
  • Tokenizer: Custom Rust implementation (rustbpe); a toy BPE sketch follows this list
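
For context on the rustbpe entry, here is a toy byte-pair-encoding training loop in Python. It only illustrates the algorithm conceptually; it is not the rustbpe API and makes no claims about that implementation.

# Toy BPE training loop: repeatedly merge the most frequent adjacent token pair.
# Conceptual illustration only; not the rustbpe implementation or API.
from collections import Counter

def train_bpe(text: str, num_merges: int) -> dict:
    tokens = list(text.encode("utf-8"))  # start from raw bytes (ids 0..255)
    merges = {}
    next_id = 256
    for _ in range(num_merges):
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        pair = pairs.most_common(1)[0][0]  # most frequent adjacent pair
        merges[pair] = next_id
        merged, i = [], 0
        while i < len(tokens):             # replace every occurrence of the pair
            if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
                merged.append(next_id)
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
        next_id += 1
    return merges

print(train_bpe("low lower lowest", num_merges=5))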

Explore Code with AI Chat

Due to the concise codebase (~330KB), the entire project can be packaged and provided to an LLM for analysis:

files-to-prompt . -e py -e md -e rs -e html -e toml -e sh --ignore "*target*" --cxml > packaged.txt

Or use DeepWiki to explore the codebase directly online.

Testing

The project includes unit tests, especially for the tokenizer:

python -m pytest tests/test_rustbpe.py -v -s

Project Positioning and Goals

nanochat is NOT:

  • A feature-rich LLM framework
  • A highly configurable model factory
  • A production-grade solution

nanochat IS:

  • An educational reference implementation
  • A strong, modifiable, and forkable baseline
  • Aimed at micro-model research for budgets <$1000
  • Designed to reduce the cognitive complexity of LLM development

Project History and Acknowledgements

  • Inspiration Sources:

    • nanoGPT - Karpathy's earlier minimal GPT pretraining codebase
    • modded-nanoGPT - the nanoGPT speedrun project whose training setup nanochat draws on

  • Acknowledgements:

    • HuggingFace - For providing the fineweb and smoltalk datasets
    • Lambda - For providing the compute power needed for development
    • Alec Radford - Chief LLM Advisor

Open Source License

MIT License

Citation Format

@misc{nanochat,
  author = {Andrej Karpathy},
  title = {nanochat: The best ChatGPT that $100 can buy},
  year = {2025},
  publisher = {GitHub},
  url = {https://github.com/karpathy/nanochat}
}

Project Status

The project is under active development, with the goal of continuously pushing the state of the art for micro models so that more people can experience the full LLM training process at an affordable cost.


GitHub Address: https://github.com/karpathy/nanochat
