
Types of Fine-Tuning: Supervised, Instruction, and RLHF

Explore the three main fine-tuning paradigms—supervised fine-tuning, instruction tuning, and reinforcement learning from human feedback (RLHF). Understand how each method shapes model behavior, their data requirements, strengths, limitations, and when to apply them to improve task performance and alignment with human preferences.

A pre-trained large language model arrives with an impressive ability to predict the next token in a sequence, but it has no idea how to answer your specific questions, follow your formatting rules, or avoid responses that users find unhelpful or unsafe. The gap between “understands language” and “behaves the way you need” is bridged by fine-tuning, but not all fine-tuning is the same. Three distinct paradigms have emerged, each targeting a different layer of model behavior. Supervised fine-tuning (SFT) teaches the model to perform a specific task using labeled examples. Instruction tuning broadens that capability so the model can follow arbitrary instructions it has never seen before. Reinforcement learning from human feedback (RLHF) goes further by optimizing the model’s outputs to match what humans actually prefer, not just what looks correct on paper.

Think of it like onboarding a new employee who already speaks the language fluently. SFT is job-specific training on your company’s processes. Instruction tuning is cross-functional training so the employee can handle requests from any department. RLHF is ongoing coaching from managers who rate the employee’s work and guide them toward responses that customers find genuinely helpful.

Consider a customer-support chatbot. SFT gives it domain knowledge about your product. Instruction tuning teaches it to follow formatting rules like “respond in bullet points” or “keep answers under 100 words.” RLHF ensures the responses are not just accurate but also perceived as helpful, safe, and appropriately toned by real users. Each paradigm adds a layer of alignment sophistication, and this lesson walks through the mechanics, data requirements, and trade-offs of all three.

Supervised fine-tuning on labeled data

Supervised fine-tuning (SFT) continues gradient-based training on a curated dataset where every example is an input-output pair created by domain experts. The data format is straightforward. Each example consists of a prompt paired with a ground-truth completion, such as a medical question paired with a verified answer, or a document paired with a gold-standard summary.
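To make the data format concrete, here is a minimal sketch of what such a dataset might look like in Python. The field names (`prompt`, `completion`) and the helper function are illustrative assumptions, not a fixed standard; real pipelines vary in how they delimit the prompt from the completion.

```python
# A minimal sketch of an SFT dataset: each example pairs an input prompt
# with a ground-truth completion written by a domain expert.
# Field names and the separator/EOS tokens below are illustrative choices.

sft_examples = [
    {
        "prompt": "Summarize the following support ticket in one sentence.\n"
                  "Ticket: The export button does nothing when I click it.",
        "completion": "The user reports that the export button is unresponsive.",
    },
    {
        "prompt": "Classify the sentiment of this review: 'Setup was painless.'",
        "completion": "positive",
    },
]

def to_training_text(example, separator="\n", eos_token="</s>"):
    """Concatenate prompt and completion into a single training sequence.

    During SFT the model is trained to predict the completion tokens;
    many implementations mask out the prompt tokens so the loss is
    computed only on the completion.
    """
    return example["prompt"] + separator + example["completion"] + eos_token

for ex in sft_examples:
    print(to_training_text(ex))
```

In practice these records are often stored as JSONL and tokenized before training, with a loss mask applied so that gradient updates target only the completion portion of each sequence.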

The training signal comes from ...