
Zero-Shot vs. Few-Shot Prompting

Explore how to choose between zero-shot, one-shot, and few-shot prompting strategies to optimize large language model responses. Understand when to add examples to prompts based on task complexity and output requirements. Learn why the quality of examples, including correctness, representativeness, and consistency, critically impacts model performance.

In the previous lesson, you learned four instruction-clarity techniques that push zero-example prompts as far as they can go: positive commands, explicit formats, scope boundaries, and assumption elimination. Those techniques work remarkably well for straightforward tasks. But some tasks need the model to see what a correct output looks like before it can reliably produce one. Consider classifying customer-support tickets into categories like “billing,” “account access,” and “technical.” You could write a perfectly clear instruction, yet the model might still label tickets inconsistently because it has never seen your specific category definitions applied to real examples. This is where the number of examples you include in a prompt becomes a strategic decision.

Three prompting strategies sit along this spectrum. Zero-shot prompting sends only an instruction with no demonstration. One-shot prompting includes a single input–output pair alongside the instruction. Few-shot prompting provides two or more input–output pairs. Each strategy trades off token cost against output reliability, and choosing the right one depends on how familiar the task is to the model and how strict your formatting requirements are. By the end of this lesson, you will know which strategy to reach for in a given situation and why the quality of your examples matters as much as their quantity.

How zero-shot prompting works

When you send a zero-shot prompt, the model receives nothing but your task description. It generates a response by relying entirely on patterns absorbed during pretraining and instruction tuning (a post-training process where a model is fine-tuned on instruction–response pairs so it can follow natural-language commands more reliably). Think of it like asking a well-read colleague to summarize an article. They already know what a summary looks like, so you do not need to show them one first.

When zero-shot is the right choice

Zero-shot prompting works best for tasks the model has encountered extensively during training. Translation, general summarization, simple question answering, and broad sentiment classification all fall into this category. Modern instruction-tuned models such as GPT-4 and Claude perform surprisingly well on these tasks without any examples at all.

The limitation surfaces when tasks become domain-specific, ambiguous, or format-sensitive. Without examples, the model fills gaps with its own defaults, which connects directly to the implicit-assumptions problem from the previous lesson. A zero-shot sentiment-classification prompt like “Classify the sentiment of this review as positive, negative, or neutral” usually works. But asking the model to classify legal clauses into specialized categories without examples often produces inconsistent labels.
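As a concrete sketch, the zero-shot sentiment prompt above can be assembled as a plain string before being sent to a model. The `build_zero_shot_prompt` helper and its exact wording are illustrative, not tied to any particular model API.

```python
def build_zero_shot_prompt(review: str) -> str:
    """Assemble a zero-shot prompt: instruction only, no examples."""
    return (
        "Classify the sentiment of this review as positive, negative, "
        "or neutral. Respond with only the label.\n\n"
        f"Review: {review}"
    )

prompt = build_zero_shot_prompt(
    "Checkout was fast and the refund arrived the same day."
)
print(prompt)
```

Note that the prompt carries no demonstration at all; everything the model needs must come from the instruction and its training.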

Practical tip: AWS documentation on SageMaker JumpStart foundation models recommends starting with zero-shot and escalating to few-shot only when accuracy metrics such as precision, recall, and F1 fall short. This saves tokens and keeps latency low.

The following table summarizes the three strategies, their definitions, ideal use cases, and the primary risk each one carries.

Comparison of Prompting Strategies

| Prompting Strategy | Definition | Best Suited For | Key Risk |
| --- | --- | --- | --- |
| Zero-Shot | Only instructions, without any examples | General tasks like translation, simple classification, and general summarization | Output format drift: responses may vary in structure or style |
| One-Shot | A single input-output example alongside the instruction | Tasks where the format or label set is unfamiliar but the pattern is simple | Model may overfit to the single example's style |
| Few-Shot | Two to five input-output examples alongside the instruction | Domain-specific, nuanced, or format-critical tasks requiring consistency | Poor example quality propagates errors; higher token cost |

With the zero-shot baseline established, the next step is understanding what changes when you add one or more examples to the prompt.

One-shot and few-shot mechanics

How one-shot prompting works

In one-shot prompting, you prepend a single input–output pair before your actual query. The model uses in-context learning (the ability of a large language model to infer a task pattern from examples provided within the prompt itself, without any weight updates or fine-tuning) to detect the pattern in that pair and apply it to the new input. One-shot is sufficient when the task pattern is straightforward but the desired output format or label vocabulary is non-obvious. For instance, if you need the model to output JSON with specific field names, a single example often anchors the format more effectively than a paragraph of format instructions.
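To make the JSON-anchoring idea concrete, here is a minimal sketch of a one-shot prompt. The `build_one_shot_prompt` helper and the field names `category` and `priority` are hypothetical choices for illustration.

```python
def build_one_shot_prompt(ticket: str) -> str:
    """Prepend one input-output pair so the model copies the JSON shape."""
    return (
        "Classify the support ticket. Respond as JSON with the fields "
        '"category" and "priority".\n\n'
        "Ticket: I was charged twice for my subscription this month.\n"
        'Output: {"category": "billing", "priority": "high"}\n\n'
        f"Ticket: {ticket}\n"
        "Output:"
    )

prompt = build_one_shot_prompt("The dashboard shows a blank page after login.")
print(prompt)
```

The single worked example shows the model exactly which keys to emit, which is usually more reliable than describing the schema in prose alone.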

How few-shot prompting works

Few-shot prompting extends this idea by providing two to five examples, giving the model a richer pattern to generalize from. This approach is especially powerful for tasks with subtle distinctions. Returning to the support-ticket scenario, distinguishing “billing” from “account access” from “technical” requires the model to see representative examples of each category.
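The ticket-classification setup above might be sketched like this, with one labeled example per category. The example tickets and the `build_few_shot_prompt` helper are illustrative assumptions, not a prescribed format.

```python
# One representative example per category (illustrative data).
TICKET_EXAMPLES = [
    ("I was charged twice for my subscription this month.", "billing"),
    ("I can't log in after resetting my password.", "account access"),
    ("The app crashes whenever I open the dashboard.", "technical"),
]

def build_few_shot_prompt(ticket: str, examples=TICKET_EXAMPLES) -> str:
    """Show each labeled example, then ask the model to label the new ticket."""
    header = ("Classify each support ticket as billing, account access, "
              "or technical.\n\n")
    shots = "".join(f"Ticket: {text}\nCategory: {label}\n\n"
                    for text, label in examples)
    return header + shots + f"Ticket: {ticket}\nCategory:"

prompt = build_few_shot_prompt("Please update the card on file for my invoice.")
print(prompt)
```

Because every example follows the identical `Ticket:`/`Category:` layout, the model sees one unambiguous pattern to continue.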

There is a practical trade-off to keep in mind. Each example consumes tokens from the context window (the maximum number of tokens a model can process in a single prompt-plus-response cycle, encompassing both input and output). More examples mean higher latency and a larger token budget. Amazon SageMaker JumpStart documentation advises including clear, representative examples in few-shot scenarios to enhance model performance while managing this cost.

Diminishing returns typically set in after five to eight examples. Adding a tenth example rarely improves output quality and may even introduce noise if the additional example is lower quality than the rest. The next lesson will cover exactly how to select and format those examples for maximum impact.

Note: Few-shot does not mean many-shot. Keeping your example count between two and five strikes the best balance between pattern richness and token efficiency for most tasks.

The following diagram illustrates how prompt structure differs across the three strategies.

Zero-shot, one-shot, and few-shot prompting strategies compared by example count and token cost

Understanding the structure is one thing. Knowing when to move from one strategy to the next is where the real skill lies.

When to escalate from zero to few-shot

A decision framework

The most efficient approach is to treat zero-shot as your default baseline. If the output already meets your accuracy and format requirements, stop there. It is the cheapest and fastest option. Escalation should be driven by measurable shortcomings, not by habit.

The following decision path covers the most common scenarios:

  • Zero-shot output has correct content but the wrong format. Escalate to one-shot. A single example that demonstrates the exact output structure usually resolves formatting drift without significant token overhead.

  • One-shot output overfits to the single example’s phrasing. Escalate to few-shot. Two or three diverse examples break the model’s tendency to mimic one particular style.

  • The task involves domain-specific distinctions or edge cases. Start with few-shot. Tasks like classifying legal clauses as “liability,” “indemnity,” or “termination” require the model to see concrete boundaries between similar categories.

  • Output consistency across repeated calls is critical. Use few-shot. Batch inference pipelines, such as those running on SageMaker, benefit from the anchoring effect of multiple examples, which reduces variance between calls.
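The decision path above can be condensed into a small helper. The 0.90 target, and the idea of feeding in a baseline F1 plus a format check, are assumptions for illustration rather than universal thresholds.

```python
def choose_strategy(baseline_f1: float, format_ok: bool,
                    overfits_to_style: bool = False,
                    target_f1: float = 0.90) -> str:
    """Pick the cheapest prompting strategy the measured results justify."""
    if baseline_f1 >= target_f1 and format_ok:
        return "zero-shot"   # baseline already meets the bar: stop here
    if baseline_f1 >= target_f1 and not format_ok:
        # Content is right and only the structure drifts: one example
        # anchors it, unless the model is already mimicking that example.
        return "few-shot" if overfits_to_style else "one-shot"
    return "few-shot"        # accuracy itself falls short: show the pattern

print(choose_strategy(0.93, format_ok=True))    # prints "zero-shot"
print(choose_strategy(0.93, format_ok=False))   # prints "one-shot"
print(choose_strategy(0.71, format_ok=True))    # prints "few-shot"
```

The point of encoding the rule is discipline: escalation happens only when a measured number or an observed failure mode demands it.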

Metrics that guide the decision

Key metrics to monitor when evaluating whether to escalate include accuracy, precision, recall, and F1 score (the harmonic mean of precision and recall, providing a single metric that balances both false positives and false negatives). AWS ML best practices recommend tracking these metrics systematically.

Consider a concrete scenario. A team classifies legal clauses into three categories. Zero-shot confuses “liability” and “indemnity” because the two concepts overlap in general language. One-shot helps, but the model overfits to the single example’s phrasing and misclassifies clauses worded differently. Three-shot with diverse examples achieves a stable F1 above 0.90 across repeated runs.
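For reference, F1 can be computed directly from precision and recall; this sketch uses plain Python rather than any particular metrics library, and the numbers in the usage line are hypothetical.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical run: precision 0.92 and recall 0.89 give F1 of roughly 0.905,
# which clears the 0.90 bar described above.
print(round(f1_score(0.92, 0.89), 3))  # prints 0.905
```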

Attention: Jumping straight to few-shot without testing zero-shot first wastes tokens and can introduce noise if examples are poorly chosen. Always validate the simpler strategy before escalating.

The following quiz checks your understanding of when each strategy is appropriate.

Lesson Quiz

1. A model already produces accurate two-sentence summaries of English news articles using only an instruction. What is the most token-efficient prompting strategy to continue using?

A. Zero-shot with a clear instruction
B. One-shot with a single example
C. Few-shot with five examples
D. Few-shot with ten examples



With the escalation framework in place, there is one more critical factor that determines whether few-shot prompting actually helps or backfires.

How example quality shapes output quality

The number of examples matters far less than their quality. A prompt with three excellent examples will outperform a prompt with eight mediocre ones. The model treats every example as ground truth and replicates whatever patterns it finds, including mistakes.

Example quality breaks down along three dimensions:

  • Correctness. The output in each example must be genuinely accurate. A single incorrect label in a few-shot prompt can poison the entire output because the model faithfully reproduces the error. This is the garbage-in-garbage-out principle applied to in-context learning.

  • Representativeness. Examples should cover the range of inputs the model will encounter in production. If all your examples feature short, simple inputs, the model may struggle with longer or more complex ones.

  • Consistency. Every example must follow the same format and style. If one example classifies a ticket as “billing issue” and another uses “billing_issue,” the model may oscillate between formats unpredictably.

Here is a concrete illustration. Suppose you provide three examples for support-ticket classification, but one example labels a password-reset request as “billing” instead of “account access.” The model now has conflicting signals. It may classify similar tickets as “billing” in some runs and “account access” in others, destroying the consistency you were trying to achieve by using few-shot in the first place.
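One cheap safeguard is to lint your example set against the allowed label list before it ever reaches a prompt. The `lint_examples` helper below is a hypothetical sketch: it catches the consistency failure described above (an "account_access" spelling that drifts from "account access") but cannot catch a semantically wrong yet validly spelled label, which still requires human review.

```python
def lint_examples(examples, allowed_labels):
    """Return a list of problems: labels that fall outside the allowed set."""
    problems = []
    for i, (text, label) in enumerate(examples):
        if label not in allowed_labels:
            problems.append(f"example {i}: unknown label {label!r}")
    return problems

issues = lint_examples(
    [("I was charged twice this month.", "billing"),
     ("I can't sign in anymore.", "account_access")],  # underscore variant
    allowed_labels={"billing", "account access", "technical"},
)
print(issues)  # flags example 1's inconsistent spelling
```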

Practical tip: Before adding an example to your prompt, verify it against the same criteria you would use to evaluate the model’s output. If the example would not pass your quality bar as an output, it should not be in your prompt.

Even in few-shot prompts, the instruction portion must still use positive commands, explicit format specifications, and scope boundaries. Examples complement instructions. They do not replace them. The next lesson dives deep into selecting, formatting, and sequencing examples, so this section focuses on establishing why quality is the prerequisite.

The following mind map summarizes the full landscape of prompting strategies and the factors that influence your choice.

This map connects each prompting strategy to its trade-offs and links example quality dimensions to the decision factors that determine when few-shot prompting improves results

Conclusion

Zero-shot prompting leverages the model’s pretraining and is ideal for well-defined, common tasks. One-shot clarifies format or label expectations with minimal token overhead. Few-shot anchors the model to a reliable pattern for domain-specific or consistency-critical tasks. The escalation principle is straightforward: start with zero-shot and add examples only when measurable metrics justify the cost. Above all, example quality along the dimensions of correctness, representativeness, and consistency determines whether few-shot prompting helps or actively hurts your results. In the next lesson, you will learn actionable techniques for selecting examples that cover your input distribution, formatting them consistently, and sequencing them to guide the model toward the output style you need.