Karpathy’s NanoChat brings affordable LLM training to the masses

For years, training an AI chatbot has been the domain of billion-dollar labs. NanoChat changes that by giving developers a transparent, affordable way to build their own ChatGPT-style model from scratch.
9 mins read
Nov 03, 2025

For most of us, training a large language model has been an opaque box. It’s something that happens behind the closed doors of big tech, requiring millions of dollars and massive GPU clusters. We can use the final product through an API, but the intricate, end-to-end process of transforming raw web text into a conversational agent has remained completely out of reach for individual developers.

Andrej Karpathy’s new project, NanoChat, is a direct challenge to that status quo.

He has open-sourced not just a model but the entire LLM factory: for about $100 in cloud credits, you can run the complete blueprint for building a chatbot from scratch. The project demystifies how a ChatGPT-style experience is actually created, and it gives developers a tangible, hackable, and affordable way to learn how these powerful systems are built.

What is NanoChat?

NanoChat is an open-source LLM chatbot system created by Andrej Karpathy. It implements a basic ChatGPT-like model and interface in one slim package. The project’s tagline is “the best ChatGPT that $100 can buy,” meaning you can train a toy chatbot on just $100 of cloud compute. It runs on a single multi-GPU server (e.g., 8 NVIDIA H100 GPUs) and performs the full pipeline end-to-end: everything, from data processing to training and inference, is driven by a single script.

In practice, NanoChat is a small neural network (~1–2 billion parameters) that generates text. For example, the demo model d32 has 32 Transformer layers and ~1.9 billion parameters, trained on ~38 billion tokens of web text. That is enough to surpass older models like GPT-2, but it’s far smaller than today’s giants.
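
As a back-of-envelope check on that figure, a decoder-only Transformer has roughly 12·L·d² parameters in its layers plus the embedding tables. The hidden size (2048) and vocabulary (65,536) below are assumptions chosen for illustration, not figures taken from this article:

```python
# Back-of-envelope parameter count for a decoder-only Transformer.
# Assumed (not from the article): hidden size 2048, vocab 65,536.
def transformer_params(layers: int, d_model: int, vocab: int) -> int:
    # Each layer: ~4*d^2 for attention (Q, K, V, output projections)
    # plus ~8*d^2 for a 4x-expanded MLP (up and down projections).
    per_layer = 12 * d_model * d_model
    # Token embeddings plus an untied output head.
    embeddings = 2 * vocab * d_model
    return layers * per_layer + embeddings

estimate = transformer_params(layers=32, d_model=2048, vocab=65_536)
print(f"{estimate / 1e9:.2f}B parameters")  # → 1.88B parameters
```

With those assumed dimensions, a 32-layer model lands right around the ~1.9B figure quoted for d32.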

How NanoChat works

At a high level, NanoChat follows the same recipe as big LLMs, but on a much smaller scale. Here’s the high-level pipeline:

  • Data and tokenization: NanoChat uses a large text corpus (web pages) called FineWeb-EDU for training. First, it learns a vocabulary: a custom Rust-based tokenizer quickly breaks text into tokens (pieces of words). This Rust tokenizer is lean and fast (preferred over bulky libraries) and compresses text efficiently for training.

  • Pretraining: Next, the model is trained in the usual unsupervised way: it scans billions of tokens and learns to predict the next one. The core model is a Transformer neural network (like GPT). The default $100 speedrun trains a small model in ~4 hours on an 8×H100 node; a larger run like the ~1.9-billion-parameter d32 takes ~33 hours and costs about $800. This stage typically takes the most time.

  • Fine-tuning (chat mode): After base training, NanoChat is fine-tuned on conversation data to make it chatty. The code loads dialogue datasets (around 460K example conversations) so that the model learns multi-turn chat patterns. At this point, it can answer questions and continue a conversation, albeit simply and with errors (much like a well-meaning novice).

  • Optional RL and tools: The pipeline even includes optional reinforcement learning steps (PPO, Proximal Policy Optimization, or GRPO, Group Relative Policy Optimization) to refine behavior. There’s also an efficient inference engine that caches past tokens (key-value, or KV, caching) for faster generation, and it can sandbox basic tools, such as a Python calculator, for math questions.

  • Serving the model: Once trained, NanoChat launches a tiny web UI for chatting. A built-in Python script lets you type questions in a browser and see the model’s answers. It all lives in ~8,000 lines of readable PyTorch code.
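
As a taste of the first stage, the byte-pair-encoding algorithm behind tokenizers like NanoChat’s can be sketched in a few lines of Python: start from characters and repeatedly merge the most frequent adjacent pair. This is a toy illustration of the algorithm, not the project’s Rust implementation:

```python
from collections import Counter

def train_bpe(text: str, num_merges: int):
    """Toy byte-pair encoding: repeatedly merge the most frequent
    adjacent pair of tokens. Illustrative only, not NanoChat's code."""
    tokens = list(text)
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        (a, b), count = pairs.most_common(1)[0]
        if count < 2:
            break  # nothing worth merging
        merges.append((a, b))
        merged, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and tokens[i] == a and tokens[i + 1] == b:
                merged.append(a + b)  # fuse the pair into one token
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
    return tokens, merges

tokens, merges = train_bpe("low lower lowest", 3)
print(merges)
print(tokens)
```

Running on the tiny corpus above, the first two merges fuse "l"+"o" and then "lo"+"w", so the common stem "low" becomes a single token — exactly the compression effect the real tokenizer exploits at scale.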

Where NanoChat stands in the LLM landscape

NanoChat is small compared to the latest commercial models. Its base model has only a couple of billion parameters, whereas state-of-the-art assistants like ChatGPT (GPT-5) and Google’s Gemini run to hundreds of billions of parameters on massive clusters. The NanoChat model is roughly GPT-2 caliber, or slightly better.

Due to this size gap, NanoChat’s performance lags. It often hallucinates or goes off track. It can’t match the depth, knowledge, or coherence of ChatGPT or Gemini. Those big models also have features that NanoChat lacks, for example, they can handle images or connect to search engines in real time, whereas NanoChat is purely text-based. 

On the upside, NanoChat is completely open and hackable. You can inspect every parameter and even tweak the training process. By comparison, you can’t peek inside ChatGPT or ask Google how Gemini is built. For enthusiasts and researchers, this openness is gold: it lets anyone experiment with the entire LLM life cycle.

| Capability/dimension | NanoChat | ChatGPT | Gemini |
| --- | --- | --- | --- |
| Accessibility/cost | Designed to train on ~US$100 in 4 hrs on an 8×H100 node | Commercial service; large scale behind it (hundreds of millions of users) | Commercial, multimodal, high-end model from Google/DeepMind |
| Codebase/hackability | Minimal, hackable, dependency-lite full-stack implementation | Opaque to most users (via API/ChatGPT interface) | Proprietary, advanced capabilities, less "build your own" focus |
| Scale/performance | 1.9B parameters in the published example; falls dramatically short of modern large models | Very large models, strong benchmarks (reasoning, language generation) | Advanced architecture, multimodal (text+image+…), large-context capability |
| Use case/niche | Learning, experimentation, building your own small LLM, low-cost prototyping | General-purpose chat assistant, commercial usage, consumer and enterprise | Cutting-edge, multimodal, large-context, high-capability tasks |
| Limitations | Lower accuracy, more hallucinations, smaller scale than state-of-the-art | Cost per usage, less transparency into internals | Likely higher cost, less accessible for self-hosting/training |
| Ideal for | Developers, researchers, hobbyists who want to understand/train LLMs | End users and businesses that want top performance but don’t need full transparency | Use cases requiring multimodality, highest capability, enterprise scale |

The table highlights NanoChat’s core purpose: it’s a learning tool, not a production-ready assistant. Its value is in letting you build and understand the mechanics, not just consume the output.

Agentic features and tool integration

While NanoChat isn’t a sophisticated agent platform out of the box, it does include some neat features for tool use and research:

  • Tool use (calculator): The chat engine can run Python code in a sandbox. For example, if you ask a math question, the model can actually execute a little Python calculator function behind the scenes. This means it can give exact math answers rather than just guessing.

  • Efficient inference: The code supports fast generation with things like key-value (KV) caching and parallel sampling, keeping the chat snappy. It’s built to be lean, so even on limited hardware, it does reasonably well.

  • Reinforcement learning hook: The pipeline has an optional Proximal Policy Optimization (PPO) step for fine-tuning the model on conversational feedback. 

Note: This is more of an advanced feature; most users can skip it.

  • Progress report: After each training run, NanoChat produces a markdown report card summarizing its scores on benchmarks (like math tests or common sense tasks). You literally see a table of metrics and how much time/cost each phase took, which is a transparency feature for curious learners.

  • Extensible design: Because it’s a simple codebase (no heavyweight frameworks), researchers can add their own tools if desired. For example, you could hook up a web search API or a database query during a chat session by modifying the code. In other words, the design doesn’t prevent multi-agent or retrieval-augmented tricks; it just doesn’t include them by default.
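
As an illustration of the calculator idea, a chat loop can intercept a tool-call marker in the model’s output and evaluate the expression in a restricted interpreter instead of trusting the model’s arithmetic. The `<calc>` tag and `run_turn` helper below are hypothetical, invented for this sketch; NanoChat’s actual protocol may differ:

```python
import ast
import operator

# Whitelisted operations for the sandboxed calculator.
OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.USub: operator.neg,
}

def safe_eval(expr: str) -> float:
    """Evaluate an arithmetic expression without eval/exec."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.operand))
        raise ValueError(f"disallowed expression: {expr!r}")
    return walk(ast.parse(expr, mode="eval"))

def run_turn(model_output: str) -> str:
    """If the model emitted a (hypothetical) tool call, splice in the result."""
    if "<calc>" not in model_output:
        return model_output
    head, rest = model_output.split("<calc>", 1)
    expr, tail = rest.split("</calc>", 1)
    return head + str(safe_eval(expr)) + tail

print(run_turn("The answer is <calc>12 * (3 + 4)</calc>."))  # → The answer is 84.
```

Because only whitelisted AST nodes are evaluated, something like `<calc>__import__('os')</calc>` is rejected rather than executed, which is the point of sandboxing the tool.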

In summary, NanoChat’s agentic toolkit is minimal but illustrative. It demonstrates how a model might use a calculator or be fine-tuned, without the hidden magic of proprietary agents.
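
The efficient-inference point can also be made concrete: with a key-value cache, each decoding step appends one key and value instead of recomputing attention inputs for the whole prefix. Here is a toy single-head attention step in plain Python, a sketch of the idea rather than NanoChat’s actual engine:

```python
import math

class KVCache:
    """Toy single-head attention with a key-value cache. Each decoding
    step appends one key/value rather than recomputing the prefix."""
    def __init__(self):
        self.keys, self.values = [], []

    def attend(self, query, key, value):
        # Cache this step's key/value, then attend over the full prefix.
        self.keys.append(key)
        self.values.append(value)
        dim = len(query)
        scores = [
            sum(q * k for q, k in zip(query, cached)) / math.sqrt(dim)
            for cached in self.keys
        ]
        # Softmax over the cached positions.
        exps = [math.exp(s - max(scores)) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of the cached values.
        return [
            sum(w * v[i] for w, v in zip(weights, self.values))
            for i in range(len(value))
        ]

cache = KVCache()
out1 = cache.attend([1.0, 0.0], [1.0, 0.0], [2.0, 0.0])
out2 = cache.attend([0.0, 1.0], [0.0, 1.0], [0.0, 3.0])
print(len(cache.keys), out1)  # → 2 [2.0, 0.0]
```

The cache grows by one entry per generated token, which is why real inference engines trade memory for speed exactly this way.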

Implications and use cases

NanoChat shines as an innovative educational and experimental tool. It lets students, hobbyists, and researchers peek under the hood of chatbots. Instead of treating ChatGPT as a magic box, NanoChat lets you see and modify each step of the pipeline. It’s the first public example of a full ChatGPT-like pipeline that anyone can run on a weekend budget.

Practical uses include: teaching AI, prototyping ideas, or building small-scale assistants. For instance, a small company could fine-tune NanoChat on its own documents to create a private FAQ bot. A student could experiment with changing the model’s code or data. Creatively, you might ask it to write poetry, brainstorm code snippets, or just play around with conversations—learning how and why it succeeds or fails. (Of course, it will make silly mistakes and confidently hallucinate at times, so it’s best used for low-stakes tasks.)

At its best, NanoChat is a stepping-stone: it reveals the mechanics of LLMs. By training and chatting with this toy model, people can develop insights that apply to bigger AI systems. It also underscores the value of openness: you can inspect, fork, and own this model, unlike closed APIs. Ultimately, NanoChat's real contribution isn't its conversational ability, but its power to showcase how these models are built from the ground up.

Next steps

NanoChat is brand new, and the ecosystem is just getting started. Key things on the horizon:

  • Scaling up: A $300 run (depth-26) is planned to hit roughly GPT-2 level, and a $1000 run to approach GPT-3 Small. These bigger models should write better code, solve math problems, and ace more tests.

  • Community hacks: The community is already innovating. For example, someone ported NanoChat to run in a web browser using WebGPU, meaning you can chat with it locally on an M1/M2/M4 Mac or Windows PC without special servers. Others may try running it on single GPUs, quantizing it for smaller machines, or adding plugins (like search or specialized knowledge bases).

  • Ongoing improvements: Expect updates to the code (faster training, bug fixes, etc.) and possibly new demo instances. Keep an eye on the NanoChat GitHub discussions for announcements.

In essence, the story is just beginning. NanoChat is a living project, and its next chapters will show how a community can build upon a simple, transparent LLM foundation.

Get started

Ready to try NanoChat? It’s all on GitHub under karpathy/nanochat. Here’s a quick start:

  1. Get a GPU machine: You’ll need NVIDIA GPUs. The recommended setup is an 8×H100 instance (e.g., from Lambda Labs or another cloud provider).

  2. Clone and install: SSH into your box, then:

     git clone https://github.com/karpathy/nanochat.git
     cd nanochat

     Then install Python and the dependencies (uv sync in the repo, as in the instructions).

  3. Run the speedrun script: This single command runs all training stages for the $100 model:

     bash speedrun.sh

     It runs for about four hours on an 8×H100 node (at an estimated cost of about USD 24 per hour). Let it finish (you can run it inside a screen session and log the output).

  4. Launch the chat UI: After training, start the server with:

     python -m scripts.chat_web

     This will print a URL (e.g., your server’s IP with port 8000). Open it in a browser and chat with your NanoChat model just like ChatGPT.

If you don’t have GPUs handy, you can try a demo. A pre-trained d32 NanoChat model is hosted for free at nanochat.karpathy.ai. Just open the link to chat with the model via your browser.

A new way to learn AI#

NanoChat marks a shift in how we think about chatbots. It proves that the fundamental concepts of modern AI are no longer the exclusive domain of billion-dollar labs. For the first time, any developer with a bit of curiosity and a weekend budget can experience the entire LLM life cycle, from raw text to a live chat interface.

The real value of NanoChat comes from getting your hands dirty. Clone the GitHub repository, dig into the PyTorch code, and if you have access to a GPU, train your own model. Seeing a model learn from scratch is an insight you can’t get from an API call.

Once you’ve peeked behind the curtain, you’ll be in a much better position to build your own AI applications. If this project inspires you to take the next step, our courses can guide you through creating a polished, production-ready application using established APIs.

As NanoChat includes its own simple web UI, you might be curious about building more sophisticated frontends. Our course on how to create a chatbot with streaming in Streamlit will show you how to wrap your models in a responsive, shareable web app.

Ultimately, NanoChat is a launchpad. As Karpathy says: it’s not here to compete, it’s here to teach. So go on, get your hands dirty and start building.


Written By:
Fahim ul Haq