As a software engineer turned CEO, I’ve watched our craft evolve in leaps and bounds. We’re now entering an era where you can literally talk to your codebase. Not as some sci-fi metaphor, but as a genuine dialogue with an AI about your code.
In this newsletter, I’ll walk you through how we got here, what challenges to expect, and how to make the most of your new coding companion. We’ll journey from the old days of writing every line of code by hand (Software 1.0), through the rise of neural networks (Software 2.0), to today’s world of conversational programming powered by large language models: Software 3.0. Along the way, I’ll share practical tips and a candid look at the tools and models leading this new frontier.
Think of Software 1.0 as traditional programming: humans write explicit code, line by line, like chefs following a precise recipe. This was the only game in town for decades. Then came Software 2.0, which Andrej Karpathy famously described as “the beginning of a fundamental shift” where we train neural networks instead of writing all the code ourselves. In Software 2.0, developers curate data and let the computer learn the program (for example, training a model to recognize images rather than coding all the rules). Here, the “code” exists as model weights and parameters, which we don’t directly write — we optimize or “grow” them using data.
Now we stand at Software 3.0, the dawn of prompt-driven development.
Karpathy calls this “a fundamental change” because neural networks have become programmable with large language models. In this new era, your prompts are programs. Instead of coding logic line by line in Python or C++, you can write instructions in English and guide the AI to generate the code. “Remarkably, these prompts are written in English... we’re now programming computers in English,” Karpathy noted in a recent talk. It’s as if the computer’s programming language suddenly became our native language — English (or any human language), not just 0s, 1s, or strictly defined syntax.
Imagine solving the same problem, say sentiment analysis, in each of these software eras; it’s a handy way to visualize the evolution (a brief code sketch follows this list).
In Software 1.0, you’d manually code a solution (perhaps a lot of if/else rules).
In Software 2.0, you’d train a model on labeled examples.
In Software 3.0, you can simply prompt an LLM: “Here are a few example sentences with sentiments, now classify new sentences.” The LLM, pretrained on vast knowledge, will follow your prompt program to do the task.
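To make the contrast concrete, here’s a compact Python sketch of the same sentiment task in all three styles. The rule lists, training sentences, and few-shot prompt are made-up illustrations, and the Software 3.0 snippet only builds the prompt; you’d send it to whatever chat model you use.

```python
# Software 1.0: hand-written rules.
POSITIVE = {"great", "love", "excellent"}
NEGATIVE = {"terrible", "hate", "awful"}

def sentiment_v1(text: str) -> str:
    words = set(text.lower().split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

# Software 2.0: curate labeled data and let the machine learn the rules.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_texts = ["I love this", "this is terrible", "excellent work", "I hate it"]
train_labels = ["positive", "negative", "positive", "negative"]
sentiment_v2 = make_pipeline(CountVectorizer(), LogisticRegression())
sentiment_v2.fit(train_texts, train_labels)

# Software 3.0: the prompt is the program. A few examples, in plain English.
sentiment_v3_prompt = """Classify each sentence as positive or negative.
"I love this" -> positive
"This is terrible" -> negative
"The new release fixed every bug I cared about" ->"""
# Send sentiment_v3_prompt to whichever chat model you use.
```

Notice how the artifact shifts from explicit rules, to learned weights, to plain English instructions.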
This new paradigm doesn’t replace the old ones entirely—it augments them. In practice, all three paradigms coexist. As Karpathy puts it, Software 3.0 is starting to eat Software 1.0 and 2.0—a huge amount of existing software might be rewritten with AI-driven approaches.
But each approach has its strengths. Smart developers (and engineering managers) will want fluency in all three, choosing, depending on the problem, whether to write code, train a model, or prompt an LLM.
And sure, Software 3.0 can sound like a buzzword. But it’s genuinely a new way of thinking about programming. You can talk to a computer about your code problem, and it responds with working code or explanations. It’s like having an extremely knowledgeable, somewhat quirky pair programmer who speaks plain English (and loves an em dash). This brings amazing productivity potential, especially when working with large codebases, but it also comes with new challenges. Let’s talk about those next.
LLMs are incredibly smart in some ways; they’ve read all of GitHub and Stack Overflow from front to back, but occasionally can't seem to recall a conversation just minutes prior. We must understand these limitations and use strategies to compensate for them effectively.
An LLM operates within a fixed context window, meaning it can only “remember” a certain amount of text (prompt + code) simultaneously. This context window might be a few thousand tokens for some models, or even up to 100k+ tokens for other cutting-edge models. For instance, Google’s Gemini reportedly supports around 1 million tokens in special cases. That sounds huge, but consider that a real codebase can be millions of tokens: tens of thousands of lines across many files. You cannot feed a large codebase into the prompt — even models with 1 million-token contexts would choke on 10 million code tokens.
And even if you could stuff everything in, it’s not efficient or effective. Models struggle to sift relevant information out of giant context dumps (the well-documented “lost in the middle” problem) and may perform worse when overloaded. Research has shown that providing fewer, relevant context pieces yields better answers than dumping a whole wiki on the model. In short, LLMs are like a very smart but very forgetful co-worker. They can’t form new long-term memories of your project beyond what’s in their current input. If you tell them information in one session, they won’t recall it later unless given again. You must work within their short-term memory and occasionally remind them of important details.
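If you want to see the mismatch for yourself, a few lines of Python give a rough token count for a repo. This is a sketch that assumes tiktoken is installed; cl100k_base is a general-purpose encoder used here only for a ballpark figure, not tied to any particular model.

```python
# Rough token count for a codebase, to compare against a model's context window.
# Assumes `pip install tiktoken`.
from pathlib import Path
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

total = 0
for path in Path(".").rglob("*.py"):  # only Python sources in this sketch
    text = path.read_text(errors="ignore")
    total += len(enc.encode(text, disallowed_special=()))

print(f"~{total:,} source tokens vs. a 128,000-token context window")
```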
LLMs don’t truly understand code or documentation; they predict the most likely text. This means they sometimes generate or hallucinate plausible-sounding but incorrect code or text. They may call a function that doesn’t exist, or use an API incorrectly (with supreme confidence). They may also misinterpret your instructions if they’re vague. Remember, an LLM is not infallible; Karpathy calls them “fallible people spirits” with superhuman knowledge and bizarre unknown spots. So, they may solve a tough algorithm one minute and then fail to compare two numbers correctly the next. Knowing this, we must keep an eye on their outputs.
So, how can we help LLMs better work with our code?
One solution is an llm.txt file (you’ll also see it spelled llms.txt), a new convention emerging in the developer community. It’s a concise, LLM-friendly text summary of your project or documentation, placed in your repo (or on your website) so an AI agent can easily ingest it. Think of it as a cheat sheet for AI.
It’s like robots.txt (which guides web crawlers), but for AI models. A properly written llm.txt might include a high-level overview of the project, key classes/functions, important context, and links to more detailed documentation. By providing this, you ensure the LLM doesn’t have to guess how your codebase is structured or what a custom term means; you tell it up front in a format it can easily parse: plain text or Markdown, without extra HTML fluff.
For example, the Svelte web framework added an official llms.txt to help LLMs learn how to use Svelte 5, because most models were trained before it was released. By reading llms.txt, the AI can access the latest accurate information.
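To make that concrete, here’s the rough shape of such a file for a hypothetical project. The project name, module paths, and conventions below are invented for illustration; the point is the format: short, factual, and easy for a model to parse.

```markdown
# AcmePay

> AcmePay is a Python payments library: it validates card data, talks to the
> bank gateway, and records every transaction in a local ledger.

## Key modules
- acmepay/gateway.py: charge() and refund(); all network calls live here.
- acmepay/ledger.py: append-only transaction log; past entries are never edited.
- acmepay/validate.py: card number and currency validation rules.

## Conventions
- Amounts are integers in minor units (cents), never floats.
- Public functions raise AcmePayError subclasses, never bare exceptions.

## More detail
- docs/architecture.md: how the gateway, ledger, and validators fit together.
- docs/api.md: full API reference.
```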
Many projects are now auto-generating these files. One tool, GitIngest, can take any GitHub repo and output a single text file digest of its contents — essentially generating a sort of llm.txt for that codebase. Copying that text into your AI chat or feeding it as context gives the model a memory boost for understanding your codebase. This way, the AI has a broad map of the project.
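If you’re curious what such a digest amounts to, here’s a tiny hand-rolled version of the idea (this is not GitIngest’s actual API, just a sketch): walk the repo, keep the source and doc files, and concatenate them with clear file headers.

```python
# A tiny hand-rolled repo digest (the idea behind tools like GitIngest, not
# their actual API): concatenate the interesting files into one text blob.
from pathlib import Path

INCLUDE = {".py", ".md", ".toml"}                 # assumption: file types worth keeping
SKIP_DIRS = {".git", ".venv", "node_modules"}     # assumption: noise to leave out

def digest(repo: Path) -> str:
    parts = []
    for path in sorted(repo.rglob("*")):
        if (path.is_file()
                and path.suffix in INCLUDE
                and not any(d in path.parts for d in SKIP_DIRS)):
            parts.append(f"===== {path.relative_to(repo)} =====\n"
                         f"{path.read_text(errors='ignore')}")
    return "\n\n".join(parts)

Path("llm.txt").write_text(digest(Path(".")))
```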
Here are two key strategies to use these files:
The simplest way to use an llm.txt file is to feed it directly into your chat with an LLM. You copy the file’s contents and paste them into the prompt, right before you start asking questions. This method primes the model, loading its short-term memory (the context window) with all the essential information about your project. Think of it as handing an expert a one-page briefing before a meeting. They now have the key names, concepts, and relationships in mind.
This is highly effective for smaller projects where the llm.txt file is manageable and can fit comfortably within the model’s context window (e.g., under 128,000 tokens for models like GPT-4o).
You would start your conversation with a prompt like:
“I will provide you with an llm.txt file that summarizes my codebase. Please use this as the primary source of truth for all my following questions.”
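If you’re doing this through an API rather than a chat window, the same priming trick looks roughly like the sketch below. It assumes the OpenAI Python SDK with an API key in the environment; the model name and the sample question are placeholders, and any chat API follows the same pattern.

```python
# Prime a chat session with the llm.txt contents before asking anything.
from pathlib import Path
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment
llm_txt = Path("llm.txt").read_text()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system",
         "content": ("Use the following project summary as the primary source "
                     "of truth for all questions.\n\n" + llm_txt)},
        {"role": "user",
         "content": "Where is card validation handled, and what edge cases does it cover?"},
    ],
)
print(response.choices[0].message.content)
```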
But what happens when your codebase is massive, and even its summary runs into hundreds of thousands or millions of tokens? The direct approach fails because you can’t fit it all into the prompt. This is where retrieval-augmented generation (RAG) comes in. Instead of stuffing the entire document into the LLM all at once, a RAG system acts as a smart librarian. It keeps the llm.txt (and potentially all your other documentation) in a searchable database.
When you ask a question, the system retrieves only the most relevant snippets of text from the documentation, and then you can simply feed just those snippets to the LLM along with your question. This is like giving the AI selective X-ray vision into your project — it can zoom in on the two or three files it needs for a task, rather than fumbling through everything. NotebookLM, for instance, lets you upload documents (or connect to Google Docs); it will use retrieval to answer questions by pulling information from those docs.
This approach is essential for large-scale projects. It ensures the LLM gets highly relevant, targeted information without exceeding its context limits, leading to more accurate answers and preventing it from hallucinating details about unseen parts of the code.
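Here’s a deliberately naive sketch of the retrieval step, using simple keyword overlap so it stays dependency-free. Real RAG systems use vector embeddings and a vector store, but the pipeline has the same shape: chunk the docs, score the chunks, keep the top few, and prompt with only those.

```python
# Naive retrieval: score documentation chunks by keyword overlap with the
# question and keep only the most relevant few.
from pathlib import Path

def chunk(text: str, size: int = 40) -> list[str]:
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def top_chunks(question: str, chunks: list[str], k: int = 3) -> list[str]:
    q = set(question.lower().split())
    return sorted(chunks,
                  key=lambda c: len(q & set(c.lower().split())),
                  reverse=True)[:k]

docs = chunk(Path("llm.txt").read_text())
question = "How do refunds update the ledger?"
context = "\n---\n".join(top_chunks(question, docs))
# Send only `context` plus the question to the LLM, not the whole file.
```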
In practice, tools built on models like Claude often combine both strategies: a static llm.txt for general knowledge, plus on-the-fly retrieval for specific details. As a developer, you can contribute by maintaining those resources (keep your llm.txt updated with important changes, for example) and understanding when to manually provide context to the AI. If the AI seems confused about a function, you can paste in that function’s code or comments to ground it.
When I first started using AI to assist with code, I fell into the trap of treating it like an oracle: ask a question, get an answer, done. In reality, coding with an AI is much more interactive, almost conversational. The community, including AI luminaries like Karpathy, coined the term vibe coding. What does that mean?
Simply put, vibe coding is an iterative, interactive way of programming where you guide the AI with prompts, get output, refine it, run it, debug, and loop — all in a back-and-forth dialogue. It’s coding in flow, led by intuition, curiosity, and constant feedback, rather than writing a full spec or algorithm upfront. In vibe coding, you don’t stop at one prompt; keep the conversation going until the code does what you need.
Karpathy jokingly described vibe coding as “the latest paradigm where you essentially stop typing code line by line” and “guide an AI model with a clear idea of what you want and let it work its magic.” It’s like delegating tasks to an extremely eager intern:
Imagine you’re not the chef meticulously chopping vegetables; you’re the restaurant owner instructing the waiter (the API) to tell the kitchen (the AI) how you want the dish prepared.
In other words, you focus on what you want, and the AI determines how to do it in code. But you are still in charge. The AI (your intern) can work tirelessly and produce a lot, but you must direct it clearly and check its work. Vibe coding doesn’t mean just vibing on the couch and letting the AI run the show — it means you and the AI collaborate, each bringing your respective strengths. You bring the vision, the high-level strategy, and the critical eye; the AI brings speed, knowledge, and endless patience for boilerplate and refactoring.
So, how do you prompt effectively in vibe coding? Here are some prompting techniques and tips to make your AI-assisted coding sessions productive:
Start with a plan (even a loose one): It helps to outline what you want to build or solve before diving in. You might write a brief in natural language: “I want to create a function that takes a CSV of transactions and outputs a summary of totals per category. It should handle errors like missing values gracefully.” This acts like a mini spec for the AI. In vibe coding, you can even give this outline to the model at the start. Clear goals lead to better results.
One step at a time: If the program is complex, resist asking for it all in one prompt. Instead, break the tasks down. For example: “First, read the CSV and just show me the data structure you’d use to store it.” Once that’s done, then ask for the summary logic. This incremental approach fits within context limits and makes it easier to pinpoint where the AI went wrong if something fails. You can think of it as test-driven development, but in dialogue form. Each prompt is like a new test or requirement.
Use iterative loops: A common vibe coding loop is: AI generates code → you run/test it → you feed errors back to AI → AI fixes code. Embrace this loop. For instance, if you run a function and get an “index out of range” error, tell the AI: “I ran that function and got an index out of range error. Can you fix that?” This is amazingly effective. The model will typically diagnose the problem (maybe it assumes 1-indexing) and adjust the code. It’s like having a junior programmer where you act as QA: you don’t fix the bug yourself, you point it out and let them try. (A bare-bones sketch of this loop appears after these tips.)
Be conversational and precise in prompts: Talk to the model as you would to a human collaborator. Provide reasoning or constraints: “Let’s use a dictionary to accumulate the totals, since that will be efficient for lookups. Also, handle if a category is missing by treating it as ‘Unknown’.” The AI can follow instructions like this well. If it misunderstands, clarify or rephrase. I often find that saying “In the previous code, you did X. Instead, do Y.” is effective to correct a specific detail. Think of steering a very literal-minded colleague — no need for terse code-speak, just say it plainly.
Ask for explanations or alternatives: You can prompt the AI to produce code and explain it. For learning purposes or when something is complex, ask: “Can you explain how this function works?” or “Give me the same function but using a different approach (e.g., using a library function if available).” This educates you and sometimes reveals whether the AI “thinks” the solution is fragile or could be improved. Remember, the AI doesn’t think but can repeat common best practices if prompted.
Maintain control with system messages or rules: Many AI coding tools allow a system prompt — a place where you can set ground rules. Use this to your advantage. For instance, you could preface the chat: “You are an expert Python developer. Follow PEP8 style. Don’t use deprecated libraries. Always include a brief comment for clarity.” This sets a tone. I often include a rule like “If you are unsure of an answer, ask me for clarification rather than guessing.” It doesn’t always prevent hallucinations, but it can reduce overly confident, wrong answers. Essentially, you can encode the AI’s coding standards or desired personality in these initial instructions.
Keep an eye on quality: After a few rounds, the AI code might solve the problem but end up messy (maybe from iterative patches). Don’t hesitate to say: “Great, it works. Now, please refactor this for clarity and efficiency.” Often, the AI will simplify the code or make the naming more consistent. Likewise, ask for edge case tests: “Write a quick test for the scenario where the CSV is empty.” By prompting for these, you enforce good development practices even in the AI’s output.
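Putting a couple of these tips together, here is a bare-bones version of the generate → run → feed errors back loop, with a system message setting the ground rules. It assumes the OpenAI Python SDK; the model name and the CSV task are placeholders, and because it executes model-generated code, you’d want to sandbox it in real use.

```python
# Bare-bones generate -> run -> feed errors back loop. Sketch only: strip
# Markdown fences from the reply if the model adds them despite instructions.
import subprocess
import sys
from openai import OpenAI

client = OpenAI()
messages = [
    {"role": "system",
     "content": ("You are an expert Python developer. Follow PEP8. "
                 "Reply with a single runnable Python script and nothing else.")},
    {"role": "user",
     "content": ("Write a script that sums the `amount` column of data.csv "
                 "per `category` and prints the totals.")},
]

for attempt in range(3):  # cap the loop; don't iterate forever
    reply = client.chat.completions.create(model="gpt-4o", messages=messages)
    code = reply.choices[0].message.content
    result = subprocess.run([sys.executable, "-c", code],
                            capture_output=True, text=True, timeout=60)
    if result.returncode == 0:
        print(result.stdout)
        break
    # Feed the traceback back, exactly as you would in a chat session.
    messages += [
        {"role": "assistant", "content": code},
        {"role": "user",
         "content": "Running that gave this error, please fix it:\n" + result.stderr},
    ]
```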
The overarching vibe here (pun intended) is to treat coding as a conversation. You wouldn’t expect a new hire to produce a perfect module from a one-line request. You’d discuss requirements, review drafts, point out mistakes, and iterate. AI is similar, albeit much faster in producing that first draft. And infinitely patient in doing exactly what you ask — even if that is a mistake.
Before we move on, it’s worth noting that vibe coding, while powerful, is still an emerging practice. Some critics point out that this can lead to superficial understanding — you might get things working through trial and error with the AI without truly knowing what’s under the hood. There’s truth to that concern. I advise using vibe coding to accelerate the grunt work, but always take time to understand the code your AI co-pilot produces. Read the explanations, follow the logic, and ensure it aligns with your mental model. That way, you get the best of both worlds: rapid development and learning.
By now, you might be wondering which AI model you should use to talk to your codebase. As of mid-2025, three models dominate most benchmarks and IDE integrations: OpenAI’s GPT-4o (and successors such as GPT-4.1), Anthropic’s Claude 4, and Google’s Gemini (for example, Gemini 2.5 Pro). Each engine has its strengths and trade-offs, yet from my personal experience, one stands out for raw coding accuracy: Claude.
Claude has been the gold standard for coding assistance since its release. It’s widely regarded as the most accurate and reliable model for producing correct code and reasoning through problems. It’s great at producing code in one go, but also particularly good at fixing mistakes in its code when prompted to review. It’s excellent at understanding nuanced questions and following step-by-step instructions.
If I have a tricky algorithm or debugging task, Claude 4 Sonnet is my go-to because it’s more likely to get it right or at least break down the solution. Claude 4 Opus, an iterative improvement, brings some optimizations and minor quality boosts, but fundamentally, it’s an evolution of the same model with some additional quirks. While browsing my X feed, I read that a senior software engineer spent almost two weeks trying to fix a bug and could not resolve it even with the help of models like GPT and Gemini, until they used Claude, which resolved it in one go!
At the tool layer, you’ll likely encounter these models through editors such as GitHub Copilot (which now lets you toggle between GPT-4o, Claude 4, and Gemini 2.5) or Cursor (whose “Max Mode” can switch any of those models into a 200K token context). Think of Copilot or Cursor as cockpits; GPT, Claude, and Gemini are the engines you slot into them.
Before moving on, a quick reality check: Remember the earlier caveats, no matter which model you use. All of them can and will make mistakes. They will occasionally say things that sound confident but aren’t quite right. Always keep yourself in the loop and use tests, code reviews, and your knowledge to validate their outputs. The good news is that they are improving with each version. Rumors point to near-future models that combine huge context, great accuracy, and speed. We’re not quite there yet, but we’re on a fast train of progress.
On online forums, you will find developers sharing anecdotes like “I loaded my entire codebase into an AI and found a bug that would’ve taken me days to track down” or “I can onboard a new team member by literally having them chat with our codebase AI, it explains parts of the code and architecture on the fly.” There’s an undercurrent of giddiness surrounding what many of us dreamed about: the code explaining itself, saving hours of digging through unfamiliar repositories.
However, alongside the hype, there’s healthy skepticism and some memes. One popular theme is to avoid becoming too reliant on AI to code or understand for you. The vibe coder jokes we touched on reflect a worry that developers might cargo-cult solutions without truly grasping them. Experienced engineers still caution that you can’t completely vibe-code a complex system, at least not yet.
You still need to build a solid understanding.
Building something critical requires structured prompt engineering and testing, not just loosey-goosey prompting for everything. Essentially, it’s a reminder that software engineering principles still apply to this new context. Good system design, clear requirements, and testing are as important as ever. AI doesn’t eliminate them; it just changes how we arrive at them.
The future of software might well be written in a mix of code and conversations. And as someone deeply passionate about education and empowering developers, I find that very exciting. So grab your AI teammate and talk to your codebase! Who knows — your code might just have a few things to say.
If you’re ready for more on AI-assisted coding, check out our courses on co-pilots and prompt engineering. They’re packed with all sorts of practical examples to get you comfortable with these AI techniques.
Happy coding, and see you on the frontier of Software 3.0!