Karpathy’s NanoChat brings affordable LLM training to the masses

For years, training an AI chatbot has been the domain of billion-dollar labs. NanoChat changes that by giving developers a transparent, affordable way to build their own ChatGPT-style model from scratch.
9 mins read
Nov 03, 2025

For most of us, training a large language model has been an opaque box. It’s something that happens behind the closed doors of big tech, requiring millions of dollars and massive GPU clusters. We can use the final product through an API, but the intricate, end-to-end process of transforming raw web text into a conversational agent has remained completely out of reach for individual developers.

Andrej Karpathy’s new project, NanoChat, is a direct challenge to that status quo.

He has open-sourced not just a model but the entire LLM factory: for about $100 in cloud credits, you can run the complete blueprint for building a chatbot from scratch. The project demystifies how a ChatGPT-style experience is actually created, and it gives developers a tangible, hackable, and affordable way to learn how these powerful systems are built.

What is NanoChat?

NanoChat is an open-source LLM chatbot system created by Andrej Karpathy. It implements a basic ChatGPT-like model and interface in one slim package. The project’s tagline is “the best ChatGPT that $100 can buy,” meaning you can train a toy chatbot on just $100 of cloud compute. It runs on a single multi-GPU server (e.g., 8 NVIDIA H100 GPUs) and performs the full pipeline end-to-end: everything, from data processing to training and inference, is driven by a single script.

In practice, NanoChat is a small neural network (~1–2 billion parameters) that generates text. For example, the demo model d32 has 32 Transformer layers and ~1.9 billion parameters, trained on ~38 billion tokens of web text. That is enough to surpass older models like GPT-2, but it’s far smaller than today’s giants.
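
As a back-of-envelope check on that figure, a decoder-only Transformer has roughly 12·L·d² parameters in its layers plus the embedding tables. The hidden size (2048) and vocabulary (65,536) below are assumptions chosen for illustration, not figures taken from this article:

```python
# Back-of-envelope parameter count for a decoder-only Transformer.
# Assumed (not from the article): hidden size 2048, vocab 65,536.
def transformer_params(layers: int, d_model: int, vocab: int) -> int:
    # Each layer: ~4*d^2 for attention (Q, K, V, output projections)
    # plus ~8*d^2 for a 4x-expanded MLP (up and down projections).
    per_layer = 12 * d_model * d_model
    # Token embeddings plus an untied output head.
    embeddings = 2 * vocab * d_model
    return layers * per_layer + embeddings

estimate = transformer_params(layers=32, d_model=2048, vocab=65_536)
print(f"{estimate / 1e9:.2f}B parameters")  # → 1.88B parameters
```

With those assumed dimensions, a 32-layer model lands right around the ~1.9B figure quoted for d32.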

How NanoChat works

At a high level, NanoChat follows the same recipe as big LLMs, but on a much smaller scale. Here’s the high-level pipeline:

  • Data and tokenization: NanoChat uses a large text corpus (web pages) called FineWeb-EDU for training. First, it learns a vocabulary: a custom Rust-based tokenizer quickly breaks text into tokens (pieces of words). This Rust tokenizer is lean and fast (preferred over bulky libraries) and compresses text efficiently for training.

  • Pretraining: Next, the model is trained in the usual unsupervised way: it scans billions of tokens and learns to predict the next one. The core model is a Transformer neural network (like GPT). The default $100 speedrun trains a small model in ~4 hours on an 8×H100 node; a larger run like the ~1.9-billion-parameter d32 takes ~33 hours and costs about $800. This stage typically takes the most time.

  • Fine-tuning (chat mode): After base training, NanoChat is fine-tuned on conversation data to make it chatty. The code loads dialogue datasets (around 460K example conversations) so that the model learns multi-turn chat patterns. At this point, it can answer questions and continue a conversation, albeit simply and with errors (much like a well-meaning novice).

  • Optional RL and tools: The pipeline even includes optional reinforcement learning steps (PPO, Proximal Policy Optimization, or GRPO, Group Relative Policy Optimization) to refine behavior. There’s also an efficient inference engine that caches past tokens (key-value, or KV, caching) for faster generation, and it can sandbox basic tools, such as a Python calculator, for math questions.

  • Serving the model: Once trained, NanoChat launches a tiny web UI for chatting. A built-in Python script lets you type questions in a browser and see the model’s answers. It all lives in ~8,000 lines of readable PyTorch code.
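
As a taste of the first stage, the byte-pair-encoding algorithm behind tokenizers like NanoChat’s can be sketched in a few lines of Python: start from characters and repeatedly merge the most frequent adjacent pair. This is a toy illustration of the algorithm, not the project’s Rust implementation:

```python
from collections import Counter

def train_bpe(text: str, num_merges: int):
    """Toy byte-pair encoding: repeatedly merge the most frequent
    adjacent pair of tokens. Illustrative only, not NanoChat's code."""
    tokens = list(text)
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        (a, b), count = pairs.most_common(1)[0]
        if count < 2:
            break  # nothing worth merging
        merges.append((a, b))
        merged, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and tokens[i] == a and tokens[i + 1] == b:
                merged.append(a + b)  # fuse the pair into one token
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
    return tokens, merges

tokens, merges = train_bpe("low lower lowest", 3)
print(merges)
print(tokens)
```

Running on the tiny corpus above, the first two merges fuse "l"+"o" and then "lo"+"w", so the common stem "low" becomes a single token — exactly the compression effect the real tokenizer exploits at scale.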

Where NanoChat stands in the LLM landscape

NanoChat is small compared to the latest commercial models. Its base model has only a couple of billion parameters, whereas state-of-the-art assistants like ChatGPT (GPT-5) and Google’s Gemini run to hundreds of billions of parameters on massive clusters. The NanoChat model is roughly GPT-2 caliber, or slightly better.

Due to this size gap, NanoChat’s performance lags. It often hallucinates or goes off track. It can’t match the depth, knowledge, or coherence of ChatGPT or Gemini. Those big models also have features that NanoChat lacks, for example, they can handle images or connect to search engines in real time, whereas NanoChat is purely text-based. 

On the upside, NanoChat is completely open and hackable. You can inspect every parameter and even tweak the training process. By comparison, you can’t peek inside ChatGPT or ask Google how Gemini is built. For enthusiasts and researchers, this openness is gold: it lets anyone experiment with the entire LLM life cycle.

| Capability/dimension | NanoChat | ChatGPT | Gemini |
| --- | --- | --- | --- |
| Accessibility/cost | Designed to train on ~US$100 in 4 hrs on an 8×H100 node | Commercial service; large scale behind it (hundreds of millions of users) | Commercial, multimodal, high-end model from Google/DeepMind |
| Codebase/hackability | Minimal, hackable, dependency-lite full-stack implementation | Opaque to most users (via API/ChatGPT interface) | Proprietary, advanced capabilities, less "build your own" focus |
| Scale/performance | 1.9B parameters in the published example; falls dramatically short of modern large models | Very large models, strong benchmarks (reasoning, language generation) | Advanced architecture, multimodal (text+image+…), large-context capability |
| Use case/niche | Learning, experimentation, building your own small LLM, low-cost prototyping | General-purpose chat assistant, commercial usage, consumer and enterprise | Cutting-edge, multimodal, large-context, high-capability tasks |
| Limitations | Lower accuracy, more hallucinations, smaller scale than state-of-the-art | Cost per usage, less transparency into internals | Likely higher cost, less accessible for self-hosting/training |
| Ideal for | Developers, researchers, hobbyists who want to understand/train LLMs | End users and businesses that want top performance but don’t need full transparency | Use cases requiring multimodality, highest capability, enterprise scale |

The table highlights NanoChat’s core purpose: it’s a learning tool, not a production-ready assistant. Its value is in letting you build and understand the mechanics, not just consume the output.

Agentic features and tool integration

While NanoChat isn’t a sophisticated agent platform out of the box, it does include some neat features for tool use and research:

  • Tool use (calculator): The chat engine can run Python code in a sandbox. For example, if you ask a math question, the model can actually execute a little Python calculator function behind the scenes. This means it can give exact math answers rather than just guessing.

  • Efficient inference: The code supports fast generation with things like key-value (KV) caching and parallel sampling, keeping the chat snappy. It’s built to be lean, so even on limited hardware, it does reasonably well.

  • Reinforcement learning hook: The pipeline has an optional Proximal Policy Optimization (PPO) step for fine-tuning the model on conversational feedback. 

Note: This is more of an advanced feature; most users can skip it.

  • Progress report: After each training run, NanoChat produces a markdown report card summarizing its scores on benchmarks (like math tests or common sense tasks). You literally see a table of metrics and how much time/cost each phase took, which is a transparency feature for curious learners.

  • Extensible design: Because it’s a simple codebase (no heavyweight frameworks), researchers can add their own tools if desired. For example, you could hook up a web search API or a database query during a chat session by modifying the code. In other words, the design doesn’t prevent multi-agent or retrieval-augmented tricks; it just doesn’t include them by default.
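
As an illustration of the calculator idea, a chat loop can intercept a tool-call marker in the model’s output and evaluate the expression in a restricted interpreter instead of trusting the model’s arithmetic. The `<calc>` tag and `run_turn` helper below are hypothetical, invented for this sketch; NanoChat’s actual protocol may differ:

```python
import ast
import operator

# Whitelisted operations for the sandboxed calculator.
OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.USub: operator.neg,
}

def safe_eval(expr: str) -> float:
    """Evaluate an arithmetic expression without eval/exec."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.operand))
        raise ValueError(f"disallowed expression: {expr!r}")
    return walk(ast.parse(expr, mode="eval"))

def run_turn(model_output: str) -> str:
    """If the model emitted a (hypothetical) tool call, splice in the result."""
    if "<calc>" not in model_output:
        return model_output
    head, rest = model_output.split("<calc>", 1)
    expr, tail = rest.split("</calc>", 1)
    return head + str(safe_eval(expr)) + tail

print(run_turn("The answer is <calc>12 * (3 + 4)</calc>."))  # → The answer is 84.
```

Because only whitelisted AST nodes are evaluated, something like `<calc>__import__('os')</calc>` is rejected rather than executed, which is the point of sandboxing the tool.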

In summary, NanoChat’s agentic toolkit is minimal but illustrative. It demonstrates how a model might use a calculator or be fine-tuned, without the hidden magic of proprietary agents.
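
The efficient-inference point can also be made concrete: with a key-value cache, each decoding step appends one key and value instead of recomputing attention inputs for the whole prefix. Here is a toy single-head attention step in plain Python, a sketch of the idea rather than NanoChat’s actual engine:

```python
import math

class KVCache:
    """Toy single-head attention with a key-value cache. Each decoding
    step appends one key/value rather than recomputing the prefix."""
    def __init__(self):
        self.keys, self.values = [], []

    def attend(self, query, key, value):
        # Cache this step's key/value, then attend over the full prefix.
        self.keys.append(key)
        self.values.append(value)
        dim = len(query)
        scores = [
            sum(q * k for q, k in zip(query, cached)) / math.sqrt(dim)
            for cached in self.keys
        ]
        # Softmax over the cached positions.
        exps = [math.exp(s - max(scores)) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of the cached values.
        return [
            sum(w * v[i] for w, v in zip(weights, self.values))
            for i in range(len(value))
        ]

cache = KVCache()
out1 = cache.attend([1.0, 0.0], [1.0, 0.0], [2.0, 0.0])
out2 = cache.attend([0.0, 1.0], [0.0, 1.0], [0.0, 3.0])
print(len(cache.keys), out1)  # → 2 [2.0, 0.0]
```

The cache grows by one entry per generated token, which is why real inference engines trade memory for speed exactly this way.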

Implications and use cases

NanoChat shines as an innovative educational and experimental tool. It lets students, hobbyists, and researchers peek under the hood of chatbots. Instead of treating ChatGPT as a magic box, NanoChat lets you see and modify each step of the pipeline. It’s the first public example of a full ChatGPT-like pipeline that anyone can run on a weekend budget.

Practical uses include: teaching AI, prototyping ideas, or building small-scale assistants. For instance, a small company could fine-tune NanoChat on its own documents to create a private FAQ bot. A student could experiment with changing the model’s code or data. Creatively, you might ask it to write poetry, brainstorm code snippets, or just play around with conversations—learning how and why it succeeds or fails. (Of course, it will make silly mistakes and confidently hallucinate at times, so it’s best used for low-stakes tasks.)

At its best, NanoChat is a stepping-stone: it reveals the mechanics of LLMs. By training and chatting with this toy model, people can develop insights that apply to bigger AI systems. It also underscores the value of openness: you can inspect, fork, and own this model, unlike closed APIs. Ultimately, NanoChat's real contribution isn't its conversational ability, but its power to showcase how these models are built from the ground up.

Next steps

NanoChat is brand new, and the ecosystem is just getting started. Key things on the horizon:

  • Scaling up: A $300 run (depth-26) is planned to hit roughly GPT-2 level, and a $1000 run to approach GPT-3 Small. These bigger models should write better code, solve math problems, and ace more tests.

  • Community hacks: The community is already innovating. For example, someone ported NanoChat to run in a web browser using WebGPU, meaning you can chat with it locally on an M1/M2/M4 Mac or Windows PC without special servers. Others may try running it on single GPUs, quantizing it for smaller machines, or adding plugins (like search or specialized knowledge bases).

  • Ongoing improvements: Expect updates to the code (faster training, bug fixes, etc.) and possibly new demo instances. Keep an eye on the NanoChat GitHub discussions for announcements.

In essence, the story is just beginning. NanoChat is a living project, and its next chapters will show how a community can build upon a simple, transparent LLM foundation.

Get started

Ready to try NanoChat? It’s all on GitHub under karpathy/nanochat. Here’s a quick start:

  1. Get a GPU machine: You’ll need NVIDIA GPUs. The recommended setup is an 8×H100 instance (e.g., from Lambda Labs or another cloud provider).

  2. Clone and install: SSH into your box, then:

     git clone https://github.com/karpathy/nanochat.git
     cd nanochat

     Then install Python and the dependencies (uv sync in the repo, as in the instructions).

  3. Run the speedrun script: This single command runs all training stages for the $100 model:

     bash speedrun.sh

     It runs for about four hours on an 8×H100 node (at an estimated cost of about USD 24 per hour). Let it finish (you can run it inside a screen session and log the output).

  4. Launch the chat UI: After training, start the server with:

     python -m scripts.chat_web

     This will print a URL (e.g., your server’s IP with port 8000). Open it in a browser and chat with your NanoChat model just like ChatGPT.

If you don’t have GPUs handy, you can try a demo. A pre-trained d32 NanoChat model is hosted for free at nanochat.karpathy.ai. Just open the link to chat with the model via your browser.

A new way to learn AI#

NanoChat marks a shift in how we think about chatbots. It proves that the fundamental concepts of modern AI are no longer the exclusive domain of billion-dollar labs. For the first time, any developer with a bit of curiosity and a weekend budget can experience the entire LLM life cycle, from raw text to a live chat interface.

The real value of NanoChat comes from getting your hands dirty. Clone the GitHub repository, dig into the PyTorch code, and if you have access to a GPU, train your own model. Seeing a model learn from scratch is an insight you can’t get from an API call.

Once you’ve peeked behind the curtain, you’ll be in a much better position to build your own AI applications. If this project inspires you to take the next step, our courses can guide you through creating a polished, production-ready application using established APIs.

As NanoChat includes its own simple web UI, you might be curious about building more sophisticated frontends. Our course on how to create a chatbot with streaming in Streamlit will show you how to wrap your models in a responsive, shareable web app.

Ultimately, NanoChat is a launchpad. As Karpathy says: it’s not here to compete, it’s here to teach. So go on, get your hands dirty and start building.


Written By:
Fahim ul Haq