For most of us, training a large language model has been a black box: something that happens behind the closed doors of big tech, requiring millions of dollars and massive GPU clusters. We can use the final product through an API, but the end-to-end process of turning raw web text into a conversational agent has remained out of reach for individual developers.
Andrej Karpathy’s new project, NanoChat, is a direct challenge to that status quo.
For about $100 in cloud credits, he’s open-sourced not just a model but the entire LLM factory, giving us a complete blueprint for building a chatbot from scratch. The project demystifies the kind of user experience ChatGPT provides, and it finally gives developers a tangible, hackable, and affordable way to learn how these systems are actually built.
NanoChat is an open-source LLM chatbot system created by Andrej Karpathy. It implements a basic ChatGPT-like model and interface in one slim package. The project’s tagline is “the best ChatGPT that $100 can buy,” meaning you can train a toy chatbot on roughly $100 of cloud compute. It runs on a single multi-GPU server (e.g., 8 NVIDIA H100 GPUs) and performs the full pipeline end-to-end: everything from data processing to training and inference is driven by a single script.
In practice, NanoChat is a small neural network (~1–2 billion parameters) that generates text. For example, the demo model d32 has 32 Transformer layers and ~1.9 billion parameters, trained on ~38 billion tokens of web text. That is enough to surpass older models like GPT-2, but it’s far smaller than today’s giants.
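To see where a figure like ~1.9 billion comes from, we can sketch a back-of-the-envelope parameter count for a GPT-style Transformer. The specific assumptions below are mine, not taken from the NanoChat source: a hidden dimension that scales as 64 × depth, roughly 12 × depth × dim² weights across the attention and MLP blocks, and untied input/output embeddings over a 65,536-token vocabulary.

```python
# Rough parameter-count estimate for a GPT-style Transformer.
# All scaling assumptions here are illustrative, not NanoChat's actual config:
#   - hidden dim = 64 * depth
#   - ~12 * depth * dim^2 weights in the Transformer stack
#   - separate (untied) input and output embeddings, vocab of 65,536
def estimate_params(depth: int, vocab_size: int = 65536) -> int:
    dim = 64 * depth                      # width grows with depth
    block_params = 12 * depth * dim ** 2  # attention + MLP weight matrices
    embed_params = 2 * vocab_size * dim   # input embedding + output head
    return block_params + embed_params

print(f"d32 ≈ {estimate_params(32) / 1e9:.2f}B parameters")
# → d32 ≈ 1.88B parameters
```

Under these assumptions a 32-layer model works out to about 1.88B parameters, which lines up well with the ~1.9B figure quoted for d32, so the estimate is at least in the right ballpark.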
At a high level, NanoChat follows the same recipe as the big LLMs, just at a much smaller scale. The pipeline looks like this: