Zero-Shot vs. Few-Shot Prompting
Explore how to choose between zero-shot, one-shot, and few-shot prompting strategies to optimize large language model responses. Understand when to add examples to prompts based on task complexity and output requirements. Learn why the quality of examples, including correctness, representativeness, and consistency, critically impacts model performance.
In the previous lesson, you learned four instruction-clarity techniques that push zero-example prompts as far as they can go: positive commands, explicit formats, scope boundaries, and assumption elimination. Those techniques work remarkably well for straightforward tasks. But some tasks need the model to see what a correct output looks like before it can reliably produce one. Consider classifying customer-support tickets into categories like “billing,” “account access,” and “technical.” You could write a perfectly clear instruction, yet the model might still label tickets inconsistently because it has never seen your specific category definitions applied to real examples. This is where the number of examples you include in a prompt becomes a strategic decision.
Three prompting strategies sit along this spectrum. Zero-shot prompting sends only an instruction with no demonstration. One-shot prompting includes a single input–output pair alongside the instruction. Few-shot prompting provides two or more input–output pairs. Each strategy trades off token cost against output reliability, and choosing the right one depends on how familiar the task is to the model and how strict your formatting requirements are. By the end of this lesson, you will know which strategy to reach for in a given situation and why the quality of your examples matters as much as their quantity.
How zero-shot prompting works
When you send a zero-shot prompt, the model receives nothing but your task description. It generates a response by relying entirely on patterns absorbed during pretraining and instruction tuning. With no demonstration to imitate, the model's own defaults determine the output's structure, style, and level of detail.
When zero-shot is the right choice
Zero-shot prompting works best for tasks the model has encountered extensively during training. Translation, general summarization, simple question answering, and broad sentiment classification all fall into this category. Modern instruction-tuned models such as GPT-4 and Claude perform surprisingly well on these tasks without any examples at all.
The limitation surfaces when tasks become domain-specific, ambiguous, or format-sensitive. Without examples, the model fills gaps with its own defaults, which connects directly to the implicit-assumptions problem from the previous lesson. A zero-shot sentiment-classification prompt like “Classify the sentiment of this review as positive, negative, or neutral” usually works. But asking the model to classify legal clauses into specialized categories without examples often produces inconsistent labels.
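The sentiment prompt above can be sketched as a simple string builder. The `build_zero_shot` helper and the sample review are illustrative, not part of any particular API:

```python
# A zero-shot prompt contains only the instruction and the input:
# no demonstrations of what a correct output looks like.

def build_zero_shot(review: str) -> str:
    """Return a zero-shot sentiment-classification prompt."""
    return (
        "Classify the sentiment of this review as positive, "
        "negative, or neutral. Respond with the label only.\n\n"
        f"Review: {review}"
    )

prompt = build_zero_shot("The battery died after two days.")
print(prompt)
```

Because there is no example to anchor the response, everything rides on the instruction being unambiguous, which is why the clarity techniques from the previous lesson matter most in this setting.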
Practical tip: AWS documentation on SageMaker JumpStart foundation models recommends starting with zero-shot and escalating to few-shot only when accuracy metrics such as precision, recall, and F1 fall short. This saves tokens and keeps latency low.
The following table summarizes the three strategies, their definitions, ideal use cases, and the primary risk each one carries.
Comparison of Prompting Strategies
| Prompting Strategy | Definition | Best Suited For | Key Risk |
| --- | --- | --- | --- |
| Zero-shot | Providing the model with only instructions, without any examples | General tasks like translation, simple classification, and general summarization | Output format drift—responses may vary in structure or style |
| One-shot | Accompanying the instruction with a single input–output example | Tasks where the format or label set is unfamiliar but the pattern is simple | Model may overfit to the single example's style |
| Few-shot | Providing two to five input–output examples alongside the instruction | Domain-specific, nuanced, or format-critical tasks requiring consistency | Poor example quality can propagate errors; increased token costs |
With the zero-shot baseline established, the next step is understanding what changes when you add one or more examples to the prompt.
One-shot and few-shot mechanics
How one-shot prompting works
In one-shot prompting, you prepend a single input–output pair before your actual query. The model uses that single demonstration to infer the expected output format, label set, and style, then applies the same pattern to your query. One well-chosen example is often enough to lock in a format that a zero-shot instruction describes only abstractly.
How few-shot prompting works
Few-shot prompting extends this idea by providing two to five examples, giving the model a richer pattern to generalize from. This approach is especially powerful for tasks with subtle distinctions. Returning to the support-ticket scenario, distinguishing “billing” from “account access” from “technical” requires the model to see representative examples of each category.
There is a practical trade-off to keep in mind. Each example consumes tokens from the context window, which increases cost and latency and leaves less room for the actual input.
Diminishing returns typically set in after five to eight examples. Adding a tenth example rarely improves output quality and may even introduce noise if the additional example is lower quality than the rest. The next lesson will cover exactly how to select and format those examples for maximum impact.
Note: Few-shot does not mean many-shot. Keeping your example count between two and five strikes the best balance between pattern richness and token efficiency for most tasks.
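One-shot and few-shot prompts share the same structure: demonstrations prepended before the real query. The sketch below, with a hypothetical `build_prompt` helper and made-up support-ticket examples, shows how the same builder covers both cases, since one-shot is simply the single-example case:

```python
# Build a prompt with zero, one, or several (input, output) demonstrations.
# The ticket texts and category labels are hypothetical.

def build_prompt(instruction, examples, query):
    parts = [instruction]
    for text, label in examples:  # each pair is one demonstration
        parts.append(f"Ticket: {text}\nCategory: {label}")
    parts.append(f"Ticket: {query}\nCategory:")
    return "\n\n".join(parts)

examples = [
    ("I was charged twice this month.", "billing"),
    ("I can't log in after resetting my password.", "account access"),
    ("The app crashes when I upload a file.", "technical"),
]
prompt = build_prompt(
    "Classify each support ticket into billing, account access, or technical.",
    examples,
    "My invoice shows the wrong amount.",
)
print(prompt)
```

Ending the prompt with a bare `Category:` invites the model to complete the pattern the demonstrations establish, which is the core mechanic of in-context learning.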
The following diagram illustrates how prompt structure differs across the three strategies.
Understanding the structure is one thing. Knowing when to move from one strategy to the next is where the real skill lies.
When to escalate from zero to few-shot
A decision framework
The most efficient approach is to treat zero-shot as your default baseline. If the output already meets your accuracy and format requirements, stop there. It is the cheapest and fastest option. Escalation should be driven by measurable shortcomings, not by habit.
The following decision path covers the most common scenarios:
Zero-shot output has correct content, but the wrong format. Escalate to one-shot. A single example that demonstrates the exact output structure usually resolves formatting drift without significant token overhead.
One-shot output overfits to the single example's phrasing. Escalate to few-shot. Two or three diverse examples break the model's tendency to mimic one particular style.
The task involves domain-specific distinctions or edge cases. Start with few-shot. Tasks like classifying legal clauses as "liability," "indemnity," or "termination" require the model to see concrete boundaries between similar categories.
Output consistency across repeated calls is critical. Use few-shot. Batch inference pipelines, such as those running on SageMaker, benefit from the anchoring effect of multiple examples, which reduces variance between calls.
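The decision path above can be condensed into a small helper. The flags and their ordering are an illustrative sketch of the framework, not a fixed rule:

```python
# Sketch of the escalation framework: zero-shot by default,
# adding examples only when a measurable shortcoming demands it.

def choose_strategy(format_ok, content_ok, domain_specific, needs_consistency):
    if domain_specific or needs_consistency:
        return "few-shot"   # start with multiple diverse examples
    if content_ok and format_ok:
        return "zero-shot"  # baseline already meets requirements
    if content_ok and not format_ok:
        return "one-shot"   # one example fixes format drift
    return "few-shot"       # content issues need a richer pattern

print(choose_strategy(format_ok=True, content_ok=True,
                      domain_specific=False, needs_consistency=False))
# → zero-shot
```

In practice each flag would be set by measuring outputs against your requirements, which is exactly what the metrics in the next subsection are for.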
Metrics that guide the decision
Key metrics to monitor when evaluating whether to escalate include accuracy, precision, recall, and F1. Measure them on a held-out set of representative inputs so that each escalation step is justified by an observed improvement rather than a hunch.
Consider a concrete scenario. A team classifies legal clauses into three categories. Zero-shot confuses “liability” and “indemnity” because the two concepts overlap in general language. One-shot helps, but the model overfits to the single example’s phrasing and misclassifies clauses worded differently. Three-shot with diverse examples achieves a stable F1 above 0.90 across repeated runs.
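For concreteness, here is one way to compute per-class precision, recall, and F1 without external libraries. The legal-clause labels and predictions below are made-up illustrative data:

```python
# Per-class precision, recall, and F1 from true and predicted labels.

def prf1(y_true, y_pred, label):
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

y_true = ["liability", "indemnity", "liability", "termination"]
y_pred = ["liability", "liability", "liability", "termination"]
print(prf1(y_true, y_pred, "liability"))
```

Here the zero-shot-style confusion between "liability" and "indemnity" shows up as an inflated false-positive count for "liability", dragging precision down while recall stays perfect, which is exactly the signal that would justify escalating.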
Attention: Jumping straight to few-shot without testing zero-shot first wastes tokens and can introduce noise if examples are poorly chosen. Always validate the simpler strategy before escalating.
The following quiz checks your understanding of when each strategy is appropriate.
Lesson Quiz
A model already produces accurate two-sentence summaries of English news articles using only an instruction. What is the most token-efficient prompting strategy to continue using?
Zero-shot with a clear instruction
One-shot with a single example
Few-shot with five examples
Few-shot with ten examples
With the escalation framework in place, there is one more critical factor that determines whether few-shot prompting actually helps or backfires.
How example quality shapes output quality
The number of examples matters far less than their quality. A prompt with three excellent examples will outperform a prompt with eight mediocre ones. The model treats every example as ground truth and replicates whatever patterns it finds, including mistakes.
Example quality breaks down along three dimensions:
Correctness. The output in each example must be genuinely accurate. A single incorrect label in a few-shot prompt can poison the entire output because the model faithfully reproduces the error. This is the garbage-in-garbage-out principle applied to in-context learning.
Representativeness. Examples should cover the range of inputs the model will encounter in production. If all your examples feature short, simple inputs, the model may struggle with longer or more complex ones.
Consistency. Every example must follow the same format and style. If one example classifies a ticket as “billing issue” and another uses “billing_issue,” the model may oscillate between formats unpredictably.
Here is a concrete illustration. Suppose you provide three examples for support-ticket classification, but one example labels a password-reset request as “billing” instead of “account access.” The model now has conflicting signals. It may classify similar tickets as “billing” in some runs and “account access” in others, destroying the consistency you were trying to achieve by using few-shot in the first place.
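Format consistency and label validity can be checked mechanically before a prompt ships by validating every example's label against the allowed set; semantic correctness, like the mislabeled password-reset ticket above, still needs human review. A minimal sketch, with a hypothetical `lint_examples` helper and a deliberately flawed example set:

```python
# Lint few-shot examples for label problems before they reach the prompt.
# The allowed label set and the examples are hypothetical.

ALLOWED = {"billing", "account access", "technical"}

def lint_examples(examples):
    """Return a list of problems found in (text, label) example pairs."""
    problems = []
    for i, (text, label) in enumerate(examples):
        if label not in ALLOWED:
            problems.append(f"example {i}: unknown label {label!r}")
    return problems

examples = [
    ("I was double-charged.", "billing"),
    ("Reset my password, please.", "billing"),    # wrong, but passes the lint
    ("App crashes on upload.", "billing_issue"),  # format drift: caught
]
print(lint_examples(examples))
```

The lint catches "billing_issue" because it falls outside the allowed set, but example 1's incorrect-yet-valid label sails through, illustrating why automated checks complement rather than replace a manual correctness review.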
Practical tip: Before adding an example to your prompt, verify it against the same criteria you would use to evaluate the model’s output. If the example would not pass your quality bar as an output, it should not be in your prompt.
Even in few-shot prompts, the instruction portion must still use positive commands, explicit format specifications, and scope boundaries. Examples complement instructions. They do not replace them. The next lesson dives deep into selecting, formatting, and sequencing examples, so this section focuses on establishing why quality is the prerequisite.
The following mind map summarizes the full landscape of prompting strategies and the factors that influence your choice.
Conclusion
Zero-shot prompting leverages the model’s pretraining and is ideal for well-defined, common tasks. One-shot clarifies format or label expectations with minimal token overhead. Few-shot anchors the model to a reliable pattern for domain-specific or consistency-critical tasks. The escalation principle is straightforward: start with zero-shot and add examples only when measurable metrics justify the cost. Above all, example quality along the dimensions of correctness, representativeness, and consistency determines whether few-shot prompting helps or actively hurts your results. In the next lesson, you will learn actionable techniques for selecting examples that cover your input distribution, formatting them consistently, and sequencing them to guide the model toward the output style you need.