Karpathy's autoresearch Lets AI Agents Conduct Machine Learning Research While You Sleep

March 08, 2026
Andrej Karpathy
3 min

News Summary

March 2026 (ET) — Andrej Karpathy, the celebrated AI researcher and founder of Eureka Labs, has released a new open-source project called autoresearch on GitHub. The project puts an AI agent in charge of running machine learning experiments autonomously on a single GPU, effectively replacing the human researcher in the experimental loop during overnight runs.

What Is autoresearch?

The concept is deceptively simple: give an AI agent a small but fully functional large language model (LLM) training environment and let it iterate independently. The agent modifies the training code, runs a 5-minute experiment, checks whether performance improved on the validation metric, and then keeps or discards the change — repeating this cycle through the night. By morning, the user wakes up to a complete log of experiments and, ideally, a meaningfully better model.
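The greedy keep-or-discard cycle described above can be sketched in plain Python. This is an illustrative simulation, not code from the repository: `propose`, `evaluate`, and the config representation are all hypothetical stand-ins for "the agent edits train.py" and "a 5-minute run reports val_bpb" (lower is better).

```python
def overnight_loop(n_experiments, propose, evaluate, baseline):
    """Greedy keep/discard research loop (illustrative sketch).

    propose(cfg)  -> a candidate modification of the current best config
    evaluate(cfg) -> the validation metric for that config (lower is better)

    Each iteration stands in for one fixed-budget experiment; a change is
    kept only if it improves on the best metric seen so far.
    """
    best_cfg, best_metric = baseline, evaluate(baseline)
    history = []
    for i in range(n_experiments):
        candidate = propose(best_cfg)
        metric = evaluate(candidate)
        kept = metric < best_metric  # keep only strict improvements
        if kept:
            best_cfg, best_metric = candidate, metric
        history.append((i, candidate, metric, kept))
    return best_cfg, best_metric, history
```

By morning, `history` plays the role of the experiment log the user wakes up to: every candidate, its score, and whether it was kept.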

The project is built on top of Karpathy's earlier work, nanochat, a single-GPU LLM training implementation. The codebase is deliberately minimal: only three files matter.

- prepare.py handles data preparation and utilities; the agent never touches it.
- train.py is the single file the agent edits freely, modifying anything from model architecture and hyperparameters to the optimizer and batch size.
- program.md is a Markdown-format instruction file written by the human researcher to guide the agent's behavior, effectively acting as a "research org specification."
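The article does not reproduce the actual program.md, but a hypothetical specification in that spirit might read something like this (every detail below is an illustrative guess, not the repo's file):

```markdown
# Research Program (illustrative sketch)

Goal: minimize val_bpb within the fixed 5-minute budget per run.

Rules:
- Edit only train.py; never modify prepare.py.
- Run one experiment at a time and record the resulting val_bpb.
- Keep a change only if val_bpb improves; otherwise revert it.

Directions worth exploring: learning-rate schedule, width/depth
trade-offs, optimizer settings, batch size.
```

The point of such a file is that the human expresses research intent once, in plain prose, and the agent operationalizes it overnight.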

Fixed Time Budget: The Clever Core Design

One of the most notable design decisions in autoresearch is the strict 5-minute wall-clock time budget for each experiment. Regardless of what the agent changes — model size, batch size, architecture — every run takes exactly 5 minutes. This yields roughly 12 experiments per hour and approximately 100 experiments during a single night's sleep.
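A fixed wall-clock budget is easy to picture as a training loop that checks a deadline rather than a step count. The sketch below is a minimal illustration of that idea, assuming a hypothetical `step_fn` that performs one optimization step; it is not the project's actual loop.

```python
import time

def train_with_budget(step_fn, budget_sec: float = 300.0) -> int:
    """Train until a fixed wall-clock budget expires (illustrative sketch).

    step_fn performs one optimization step. The loop stops at the first
    step boundary after budget_sec seconds, so every configuration gets
    the same wall-clock allowance regardless of its per-step cost.
    """
    deadline = time.monotonic() + budget_sec
    steps = 0
    while time.monotonic() < deadline:
        step_fn()
        steps += 1
    return steps
```

A cheap configuration simply completes more steps inside the same 5 minutes; an expensive one completes fewer. That is exactly what makes runs comparable on a fixed budget.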

The benefit is that all runs become directly comparable, since they compete on the same wall-clock budget rather than on a floating amount of compute. The tradeoff is that results are hardware-specific: a run timed on an NVIDIA H100 is not comparable to a run on a different GPU. The evaluation metric is val_bpb (validation bits per byte), a vocabulary-size-independent measure, ensuring fair comparisons even when the agent changes the tokenizer or model architecture.
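Bits per byte is just the summed validation cross-entropy converted from nats to bits and divided by the raw byte count of the validation text. A minimal sketch of that conversion (function name and signature are my own, not from the repo):

```python
import math

def val_bpb(total_nll_nats: float, total_bytes: int) -> float:
    """Convert summed validation cross-entropy (in nats) to bits per byte.

    Dividing by the raw byte count, rather than the token count, is what
    makes the metric vocabulary-size-independent: a larger vocabulary
    means fewer tokens per document, but the underlying bytes stay fixed.
    """
    return total_nll_nats / math.log(2) / total_bytes
```

For example, a total loss of 100·ln(2) nats over 100 bytes of validation text comes out to exactly 1.0 bits per byte.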

Minimal Dependencies, Maximum Autonomy

Karpathy has kept the project self-contained with no external infrastructure dependencies beyond PyTorch and a handful of small packages. There is no distributed training, no complex configuration system, and no cloud requirements. A single NVIDIA GPU is all that is needed, with Python 3.10+ and the uv package manager.

To enter autonomous research mode, users simply point their AI agent of choice — Claude, Codex, or any other — at the repository and instruct it to read program.md and begin experimenting. Karpathy notes that the program.md file is a "super lightweight skill" — a plain-text interface for programming research intent.

Community Response and Early Momentum

Since its release, the repository has attracted significant community attention, garnering over 1,800 stars and 200 forks on GitHub as of early March 2026. Multiple community-driven forks have already appeared, including a macOS-compatible variant. The project had 20 commits and active issues filed within days of launch, signaling strong developer interest.

A Glimpse at the Future of Research

Karpathy accompanied the project with a characteristically witty philosophical framing, writing: "One day, frontier AI research used to be done by meat computers in between eating, sleeping, having other fun… That era is long gone." While tongue-in-cheek, the statement reflects a broader shift in how the AI community is beginning to think about automated research pipelines and agent-driven scientific discovery.