The generator is trained using the masked language modeling (MLM) task.

Selecting the masking position

For a given input, $X = [x_1, x_2, \ldots, x_n]$, we randomly select a few positions for masking. Let $M = [m_1, m_2, \ldots, m_n]$ denote the selected positions.
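The selection step can be sketched in Python as follows. This is only an illustration: it assumes positions are drawn uniformly at random without replacement and that about 15% of the tokens are masked, a rate the text above does not specify. The function name `select_mask_positions` and its parameters are hypothetical.

```python
import random

def select_mask_positions(tokens, mask_rate=0.15, seed=None):
    """Return a sorted list of 0-based positions chosen for masking.

    mask_rate is an assumed fraction of tokens to mask (not given in the text).
    """
    rng = random.Random(seed)
    n = len(tokens)
    # Mask at least one token for non-empty inputs, and never more than n.
    k = min(n, max(1, round(n * mask_rate)))
    # Sample k distinct positions uniformly at random.
    return sorted(rng.sample(range(n), k))

X = ["the", "chef", "cooked", "the", "meal"]
M = select_mask_positions(X, seed=0)
```

Here `M` holds the indices into `X` that will be masked; passing a `seed` makes the selection reproducible.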

Replacing the selected positions with the [MASK] token

Then, we replace the tokens at the selected positions with the [MASK] token, which can be expressed as follows:

$$X^{\text{masked}} = \text{Replace}(X, M, \text{[MASK]})$$
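The replacement step can be sketched in Python as well. The helper `replace_with_mask` is a hypothetical name used for illustration, and the positions in `M` are hand-picked and 0-based rather than randomly sampled.

```python
MASK_TOKEN = "[MASK]"

def replace_with_mask(tokens, positions, mask_token=MASK_TOKEN):
    """Return a copy of tokens with every selected position replaced by mask_token."""
    selected = set(positions)
    return [mask_token if i in selected else tok for i, tok in enumerate(tokens)]

X = ["the", "chef", "cooked", "the", "meal"]
M = [0, 2]  # hand-picked positions for illustration (0-based)
X_masked = replace_with_mask(X, M)
print(X_masked)  # ['[MASK]', 'chef', '[MASK]', 'the', 'meal']
```

The original list `X` is left unchanged, which is convenient because the unmasked tokens are still needed as prediction targets when training the generator.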
