Build a ChatGPT Clone for $100 - A Single-Codebase Full-Stack LLM Implementation
A Detailed Introduction to the nanochat Project
Project Overview
nanochat is a full-stack Large Language Model (LLM) implementation developed by renowned AI researcher Andrej Karpathy. Its core aim is to demonstrate how to build a ChatGPT-like chatbot from scratch with minimal code and at minimal cost.
Project Slogan: "The best ChatGPT that $100 can buy."
Core Features
1. Minimalist Design Philosophy
- A single, clear, minimal, and modifiable codebase
- Low dependency design
- Approximately 8,300 lines of code, 44 files
- Totaling about 83,497 tokens (approx. 334KB)
2. End-to-End Complete Process
nanochat covers every stage of building an LLM (see the script-level sketch after this list):
- Tokenization
- Pretraining
- Finetuning
- Evaluation
- Inference
- Web Serving
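As a rough orientation, these stages map onto the repository's scripts roughly as shown below. This is a schematic sketch only: apart from base_train, mid_train, and chat_web, which appear later in this document, the script names are assumptions, and speedrun.sh is the authoritative sequence.
# Schematic pipeline (names other than base_train / mid_train / chat_web are assumptions)
python -m nanochat.dataset                                        # download pretraining data shards
python -m scripts.tok_train                                       # train the rustbpe tokenizer
torchrun --standalone --nproc_per_node=8 -m scripts.base_train    # pretraining
torchrun --standalone --nproc_per_node=8 -m scripts.mid_train     # mid-training
torchrun --standalone --nproc_per_node=8 -m scripts.chat_sft      # supervised finetuning
python -m scripts.chat_eval                                       # evaluation
python -m scripts.chat_web                                        # inference + web serving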
3. Low-Cost Training
- Basic version ($100 level): approximately 4 hours on an 8×H100 node
- Mid-tier version ($300 level): approximately 12 hours, slightly outperforming GPT-2
- Advanced version ($1000 level): approximately 41.6 hours
(These price tags follow from the recommended 8×H100 node at roughly $24/hour: 4 h ≈ $96, 12 h ≈ $288, 41.6 h ≈ $998.)
4. Educational Positioning
- Designed as the capstone project for Eureka Labs' LLM101n course
- Highly readable code, easy to learn and understand
- Suitable for developers who want to deeply understand the entire LLM training process
Quick Start
Run the Quick Training Script
Run on an 8×H100 GPU node:
bash speedrun.sh
Or run in a screen session (recommended):
screen -L -Logfile speedrun.log -S speedrun bash speedrun.sh
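If you launch it inside screen as above, you can detach with Ctrl-a d and follow progress through the log file written by the -L -Logfile flags:
tail -f speedrun.log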
After approximately 4 hours, you will have a usable LLM model.
Launch the Web Interface
After training is complete, activate the virtual environment and start the service:
source .venv/bin/activate
python -m scripts.chat_web
Then visit the displayed URL (e.g., http://209.20.xxx.xxx:8000/) to chat with your own LLM, just like using ChatGPT.
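If you prefer the terminal to the browser, the repository also includes a CLI chat entry point; the module name below follows the repository's scripts.* naming convention and should be checked against your checkout:
python -m scripts.chat_cli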
Training Results Example
After training, a report.md file is generated containing the model's evaluation metrics at each stage (BASE = pretraining, MID = mid-training, SFT = supervised finetuning, RL = reinforcement learning):
Metric | BASE | MID | SFT | RL
---------------|--------|--------|--------|-------
CORE | 0.2219 | - | - | -
ARC-Challenge | - | 0.2875 | 0.2807 | -
ARC-Easy | - | 0.3561 | 0.3876 | -
GSM8K | - | 0.0250 | 0.0455 | 0.0758
HumanEval | - | 0.0671 | 0.0854 | -
MMLU | - | 0.3111 | 0.3151 | -
ChatCORE | - | 0.0730 | 0.0884 | -
Total wall clock time: 3h51m
Note: the $100 model is trained with only about 4e19 FLOPs of compute, which leaves it at roughly a "kindergarten level" of language ability, but that is enough to demonstrate the complete training pipeline.
Scaling to Larger Models
To train larger models (e.g., GPT-2 level d26 model), only minor modifications to speedrun.sh are needed:
# 1. Download more data shards
python -m nanochat.dataset -n 450 &
# 2. Increase model depth, decrease batch size to fit memory
torchrun --standalone --nproc_per_node=8 -m scripts.base_train -- --depth=26 --device_batch_size=16
# 3. Use the same reduced device batch size for mid-training
torchrun --standalone --nproc_per_node=8 -m scripts.mid_train -- --device_batch_size=16
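Why lowering --device_batch_size is safe: assuming the training scripts compensate with gradient accumulation, which is the usual pattern for this kind of code, the effective batch size per optimizer step is preserved:
# Schematic relationship (symbolic; the actual sequence length and target
# batch size are defined inside the training scripts):
#   tokens_per_step = nproc_per_node * device_batch_size * seq_len * grad_accum_steps
# Halving --device_batch_size (e.g. 32 -> 16) roughly doubles grad_accum_steps,
# so the loss curve stays comparable while peak memory per GPU goes down.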
Runtime Environment Requirements
Recommended Configuration
- 8×H100 80GB GPU node (approx. $24/hour)
- Suggested provider: Lambda GPU Cloud
Compatibility
- 8×A100 GPU: Can run, but slower
- Single GPU: Can run, but training time increases 8-fold
- Small-VRAM GPU (<80GB): requires lowering the --device_batch_size parameter (from 32 down to 16, 8, 4, 2, or 1; see the example below)
- Other platforms: based on PyTorch, so xpu, mps, etc. are supported in principle, but require additional configuration
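For example, a single-GPU, small-VRAM run might look like the following; this is a sketch of one plausible invocation (reusing flags from the scaling example above), not a line taken verbatim from speedrun.sh:
# single GPU with limited VRAM: one process, reduced per-device batch size
torchrun --standalone --nproc_per_node=1 -m scripts.base_train -- --device_batch_size=4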
Tech Stack
- Deep Learning Framework: PyTorch
- Distributed Training: torchrun
- Package Management: uv
- Datasets: HuggingFace Fineweb, Smoltalk
- Tokenizer: Custom Rust implementation (rustbpe)
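speedrun.sh handles environment setup as part of the run; if you want to set it up manually with uv (assuming the standard pyproject.toml layout), something like the following should work:
uv venv                          # create the .venv virtual environment
uv sync                          # install dependencies from pyproject.toml / uv.lock
source .venv/bin/activate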
Explore Code with AI Chat
Due to the concise codebase (~330KB), the entire project can be packaged and provided to an LLM for analysis:
files-to-prompt . -e py -e md -e rs -e html -e toml -e sh --ignore "*target*" --cxml > packaged.txt
Or use DeepWiki to explore the codebase directly online.
Testing
The project includes unit tests, especially for the tokenizer:
python -m pytest tests/test_rustbpe.py -v -s
Project Positioning and Goals
nanochat is NOT:
- A feature-rich LLM framework
- A highly configurable model factory
- A production-grade solution
nanochat IS:
- An educational reference implementation
- A strong, modifiable, and forkable baseline
- Aimed at micro-model research for budgets <$1000
- Designed to reduce the cognitive complexity of LLM development
Project History and Acknowledgements
Inspiration Sources:
- nanoGPT - Karpathy's earlier pretraining project
- modded-nanoGPT - A gamified nanoGPT variant
Acknowledgements:
- HuggingFace - For providing the fineweb and smoltalk datasets
- Lambda - For providing the compute power needed for development
- Alec Radford - Chief LLM Advisor
Open Source License
MIT License
Citation Format
@misc{nanochat,
  author    = {Andrej Karpathy},
  title     = {nanochat: The best ChatGPT that $100 can buy},
  year      = {2025},
  publisher = {GitHub},
  url       = {https://github.com/karpathy/nanochat}
}
Project Status
The project is under active development, with the goal of continuously improving the state-of-the-art for micro-models, allowing more people to experience the full LLM training process at an affordable cost.
GitHub Address: https://github.com/karpathy/nanochat