
Quantized Low-Rank Adaptation (QLoRA)

Explore how QLoRA combines low-rank adaptation with quantization to fine-tune large language models efficiently. Understand key components like 4-bit NormalFloat quantization, double quantization, and paged optimizers. This lesson helps you grasp how QLoRA enables memory-efficient fine-tuning of neural networks, making it feasible even on hardware with limited resources.

Quantized Low-Rank Adaptation (QLoRA), as the name suggests, combines two of the most widely used techniques for efficient fine-tuning: LoRA and quantization. Where LoRA uses low-rank matrices to reduce the number of trainable parameters, QLoRA extends it by quantizing the frozen base weights, further reducing the model's memory footprint.
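The combination can be sketched in a few lines: the frozen base weight is stored in low precision and dequantized on the fly, while only the small LoRA matrices are trained. The sketch below uses plain absmax quantization as a stand-in for simplicity (QLoRA itself uses 4-bit NormalFloat, covered later); all variable names and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

d, r = 8, 2                       # hidden size and LoRA rank (toy values)
W = rng.normal(size=(d, d))       # frozen pretrained weight

# --- Simulated 4-bit quantization of the frozen weight ---
# Plain absmax quantization is used here only to keep the sketch short;
# QLoRA actually uses the 4-bit NormalFloat data type.
scale = np.abs(W).max() / 7.0               # signed 4-bit range: -8..7
W_q = np.clip(np.round(W / scale), -8, 7).astype(np.int8)
W_dq = W_q.astype(np.float32) * scale       # dequantized for the forward pass

# --- Trainable low-rank adapter (LoRA) ---
A = rng.normal(size=(r, d)) * 0.01          # trainable, small random init
B = np.zeros((d, r))                        # trainable, zero init

x = rng.normal(size=(d,))
y = W_dq @ x + B @ (A @ x)                  # QLoRA-style forward pass

# Only A and B receive gradients; W_q stays frozen in 4 bits.
```

Because `B` is zero-initialized, the adapter contributes nothing at the start of training, so the quantized model's behavior is unchanged until the adapter is updated.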

Overview of a single layer in QLoRA

Components of QLoRA

The following are the three main components of QLoRA:

  • 4-bit NormalFloat quantization

  • Double quantization

  • Paged optimizers

Let’s dive into the details of each component.

4-bit NormalFloat quantization

The ...