
Configurations of BERT

Explore the different configurations of BERT, including BERT-base and BERT-large, focusing on their architecture details such as encoder layers, attention heads, and hidden units. Understand smaller variants designed for limited resources and their impact on NLP performance.

Standard configurations of BERT

The researchers behind BERT presented the model in two standard configurations:

  • BERT-base

  • BERT-large

Let's take a look at each of these in detail.

BERT-base

BERT-base consists of 12 encoder layers stacked one on top of the other. Each encoder uses 12 attention heads. The feedforward network in the ...
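The relationship between these hyperparameters can be sketched in a few lines of Python. This is a minimal illustration, not actual model code; the hidden size of 768 is the value reported for BERT-base in the original paper, and the class and field names here are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class BertConfig:
    num_encoder_layers: int   # number of stacked encoder blocks
    num_attention_heads: int  # heads per multi-head attention layer
    hidden_size: int          # dimensionality of each token representation

    @property
    def head_dim(self) -> int:
        # Each attention head works over an equal slice of the hidden size.
        return self.hidden_size // self.num_attention_heads

# BERT-base: 12 encoder layers, 12 attention heads, hidden size 768 (from the paper)
bert_base = BertConfig(num_encoder_layers=12,
                       num_attention_heads=12,
                       hidden_size=768)
print(bert_base.head_dim)  # 768 / 12 = 64 dimensions per head
```

Splitting the 768-dimensional hidden representation across 12 heads gives each head a 64-dimensional subspace to attend over, which is why hidden size is conventionally chosen as a multiple of the head count.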