Build a ChatGPT Clone for $100 - A Single-Codebase Full-Stack LLM Implementation
A Detailed Introduction to the nanochat Project
Project Overview
nanochat is a full-stack Large Language Model (LLM) implementation developed by renowned AI researcher Andrej Karpathy. Its core aim is to demonstrate how to build a ChatGPT-like chatbot from scratch with minimal code and at minimal cost.
Project Slogan: "The best ChatGPT that $100 can buy."
Core Features
1. Minimalist Design Philosophy
- A single, clear, minimal, and modifiable codebase
- Low dependency design
- Approximately 8,300 lines of code, 44 files
- Totaling about 83,497 tokens (approx. 334KB)
2. End-to-End Complete Process
nanochat covers every stage of building an LLM (see the script-level sketch after this list):
- Tokenization
- Pretraining
- Finetuning
- Evaluation
- Inference
- Web Serving
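As a rough orientation, these stages map onto the repository's scripts roughly as shown below. This is a schematic sketch only: apart from base_train, mid_train, and chat_web, which appear later in this document, the script names are assumptions, and speedrun.sh is the authoritative sequence.
# Schematic pipeline (names other than base_train / mid_train / chat_web are assumptions)
python -m nanochat.dataset                                        # download pretraining data shards
python -m scripts.tok_train                                       # train the rustbpe tokenizer
torchrun --standalone --nproc_per_node=8 -m scripts.base_train    # pretraining
torchrun --standalone --nproc_per_node=8 -m scripts.mid_train     # mid-training
torchrun --standalone --nproc_per_node=8 -m scripts.chat_sft      # supervised finetuning
python -m scripts.chat_eval                                       # evaluation
python -m scripts.chat_web                                        # inference + web serving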
3. Low-Cost Training
- Basic version ($100 level): approximately 4 hours on an 8×H100 node
- Mid-tier version ($300 level): approximately 12 hours, slightly outperforming GPT-2
- Advanced version ($1000 level): approximately 41.6 hours
(These price tags follow from the recommended 8×H100 node at roughly $24/hour: 4 h ≈ $96, 12 h ≈ $288, 41.6 h ≈ $998.)
4. Educational Positioning
- Designed as the capstone project for Eureka Labs' LLM101n course
- Highly readable code, easy to learn and understand
- Suitable for developers who want to deeply understand the entire LLM training process
Quick Start
Run the Quick Training Script
Run on an 8×H100 GPU node:
bash speedrun.sh
Or run in a screen session (recommended):
screen -L -Logfile speedrun.log -S speedrun bash speedrun.sh
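If you launch it inside screen as above, you can detach with Ctrl-a d and follow progress through the log file written by the -L -Logfile flags:
tail -f speedrun.log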
After approximately 4 hours, you will have a usable LLM model.
Launch the Web Interface
After training is complete, activate the virtual environment and start the service:
source .venv/bin/activate
python -m scripts.chat_web
Then visit the displayed URL (e.g., http://209.20.xxx.xxx:8000/) to chat with your own LLM, just like using ChatGPT.
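If you prefer the terminal to the browser, the repository also includes a CLI chat entry point; the module name below follows the repository's scripts.* naming convention and should be checked against your checkout:
python -m scripts.chat_cli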
Training Results Example
After training, a report.md file is generated containing the model's evaluation metrics at each stage (BASE = pretraining, MID = mid-training, SFT = supervised finetuning, RL = reinforcement learning):
Metric | BASE | MID | SFT | RL
---------------|--------|--------|--------|-------
CORE | 0.2219 | - | - | -
ARC-Challenge | - | 0.2875 | 0.2807 | -
ARC-Easy | - | 0.3561 | 0.3876 | -
GSM8K | - | 0.0250 | 0.0455 | 0.0758
HumanEval | - | 0.0671 | 0.0854 | -
MMLU | - | 0.3111 | 0.3151 | -
ChatCORE | - | 0.0730 | 0.0884 | -
Total wall clock time: 3h51m
Note: the $100 model is trained with only about 4e19 FLOPs of compute, which leaves it at roughly a "kindergarten level" of language ability, but that is enough to demonstrate the complete training pipeline.
Scaling to Larger Models
To train larger models (e.g., GPT-2 level d26 model), only minor modifications to speedrun.sh are needed:
# 1. Download more data shards
python -m nanochat.dataset -n 450 &
# 2. Increase model depth, decrease batch size to fit memory
torchrun --standalone --nproc_per_node=8 -m scripts.base_train -- --depth=26 --device_batch_size=16
# 3. Use the same reduced device batch size for mid-training
torchrun --standalone --nproc_per_node=8 -m scripts.mid_train -- --device_batch_size=16
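Why lowering --device_batch_size is safe: assuming the training scripts compensate with gradient accumulation, which is the usual pattern for this kind of code, the effective batch size per optimizer step is preserved:
# Schematic relationship (symbolic; the actual sequence length and target
# batch size are defined inside the training scripts):
#   tokens_per_step = nproc_per_node * device_batch_size * seq_len * grad_accum_steps
# Halving --device_batch_size (e.g. 32 -> 16) roughly doubles grad_accum_steps,
# so the loss curve stays comparable while peak memory per GPU goes down.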
Runtime Environment Requirements
Recommended Configuration
- 8×H100 80GB GPU node (approx. $24/hour)
- Suggested provider: Lambda GPU Cloud
Compatibility
- 8×A100 GPU: Can run, but slower
- Single GPU: Can run, but training time increases 8-fold
- Small-VRAM GPU (<80GB): requires lowering the --device_batch_size parameter (from 32 down to 16, 8, 4, 2, or 1; see the example below)
- Other platforms: based on PyTorch, so xpu, mps, etc. are supported in principle, but require additional configuration
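For example, a single-GPU, small-VRAM run might look like the following; this is a sketch of one plausible invocation (reusing flags from the scaling example above), not a line taken verbatim from speedrun.sh:
# single GPU with limited VRAM: one process, reduced per-device batch size
torchrun --standalone --nproc_per_node=1 -m scripts.base_train -- --device_batch_size=4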
Tech Stack
- Deep Learning Framework: PyTorch
- Distributed Training: torchrun
- Package Management: uv
- Datasets: HuggingFace Fineweb, Smoltalk
- Tokenizer: Custom Rust implementation (rustbpe)
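speedrun.sh handles environment setup as part of the run; if you want to set it up manually with uv (assuming the standard pyproject.toml layout), something like the following should work:
uv venv                          # create the .venv virtual environment
uv sync                          # install dependencies from pyproject.toml / uv.lock
source .venv/bin/activate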
Explore Code with AI Chat
Due to the concise codebase (~330KB), the entire project can be packaged and provided to an LLM for analysis:
files-to-prompt . -e py -e md -e rs -e html -e toml -e sh --ignore "*target*" --cxml > packaged.txt
Or use DeepWiki to explore the codebase directly online.
Testing
The project includes unit tests, especially for the tokenizer:
python -m pytest tests/test_rustbpe.py -v -s
Project Positioning and Goals
nanochat is NOT:
- A feature-rich LLM framework
- A highly configurable model factory
- A production-grade solution
nanochat IS:
- An educational reference implementation
- A strong, modifiable, and forkable baseline
- Aimed at micro-model research for budgets <$1000
- Designed to reduce the cognitive complexity of LLM development
Project History and Acknowledgements
Inspiration Sources:
- nanoGPT - Karpathy's earlier pretraining project
- modded-nanoGPT - A gamified nanoGPT variant
Acknowledgements:
- HuggingFace - For providing the fineweb and smoltalk datasets
- Lambda - For providing the compute power needed for development
- Alec Radford - Chief LLM Advisor
Open Source License
MIT License
Citation Format
@misc{nanochat,
  author    = {Andrej Karpathy},
  title     = {nanochat: The best ChatGPT that $100 can buy},
  year      = {2025},
  publisher = {GitHub},
  url       = {https://github.com/karpathy/nanochat}
}
Project Status
The project is under active development, with the goal of continuously improving the state-of-the-art for micro-models, allowing more people to experience the full LLM training process at an affordable cost.
GitHub Address: https://github.com/karpathy/nanochat