Let’s go over the solution for building a multi-head attention sublayer step by step.

Step 1: Initialize the input

To start off, we initialize the input vectors given in the problem statement: x contains 4 inputs, each of dimension d_model = 3.
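A minimal sketch of this step, assuming NumPy is used to hold the inputs. The numbers below are hypothetical placeholders, because the problem statement's actual values are not reproduced here. Only the shape matters: 4 input vectors (rows), each with d_model = 3 components (columns).

```python
import numpy as np

# Dimension of each input vector, as given in the problem statement.
d_model = 3

# Hypothetical placeholder values: substitute the values from the problem statement.
# Each row is one input vector, giving 4 inputs of size d_model.
x = np.array([
    [1.0, 0.0, 1.0],
    [0.0, 2.0, 0.0],
    [1.0, 1.0, 1.0],
    [0.0, 0.0, 3.0],
])

# Sanity check: 4 inputs, each with d_model components.
assert x.shape == (4, d_model)
print(x.shape)  # (4, 3)
```

Keeping one input per row means later steps can project all inputs at once with a single matrix product such as x @ W, where W has d_model rows.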

