
The Feedforward Network and Final Assembly

Learn about the inner workings of LLMs.

In our last lesson, we built the sophisticated multi-head attention mechanism. Its output is a matrix of deeply context-aware vectors, where each token has absorbed relevant information from its neighbors, viewed from multiple expert perspectives. The “communication” phase of our process is now complete.

However, after a productive meeting, you need to return to your desk to process what you’ve learned. Communication is not enough; you also need time to “think.” Our tokens are in the same position. They are rich with new context, but they haven’t had a chance to process it individually. How does the model perform this deep, individual processing to truly digest the information it just gathered?

This is the job of the final core component in our block: the feedforward network (FFN).

The “thinking” component

The FFN is a simple but powerful transformation. After the multi-head attention step, each vector in our matrix is passed independently through the same small, two-layer neural network (the weights are shared across positions, but each token is processed on its own). This is an important distinction: attention is an “all-to-all” communication step where tokens interact, while the FFN is a “one-to-one” processing step where each token reflects on its new context by itself.

This is the primary “thinking” or processing phase of the block. The FFN is where the model applies a significant portion of its learned knowledge (a large number of its parameters reside here) to the context-rich vectors it just received from the attention step. It allows the model to identify and transform complex patterns within each vector, adding a layer of computational depth. Without this individual processing step, the model would be proficient at gathering information but struggle to comprehend it deeply.

A primer on the FFN

So far, we have focused on the clever architecture of attention, which is all about communication. However, we’ve used the term feedforward network (FFN) to refer to the “thinking” part. What exactly does this mean, and how is it different from the matrix math we’ve already done?

What is a neural network layer?

At its heart, a single layer of a neural network is a simple two-step mathematical operation that you are already familiar with, plus one new ingredient. For any given input vector x:

  1. Linear transformation: We multiply the input by a learned weight matrix (W) and add a ...