Noising Techniques

Learn various noising techniques used to corrupt text for pre-training BART, including token masking, deletion, infilling, sentence shuffling, and document rotation. Understand how these methods affect BART's ability to model and generate natural language text for downstream tasks.

We've learned that we corrupt the text and feed it to the encoder of BART. But how exactly do we corrupt the text? Does corruption only involve masking a few tokens? Not necessarily.

The BART researchers have proposed several interesting noising techniques for corrupting the text:

  • Token masking
  • Token deletion
  • Token infilling
  • Sentence shuffling
  • Document rotation
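
Whichever technique is applied, the training objective stays the same: the model must reconstruct the original text from its corrupted version. As a minimal sketch of that setup using the Hugging Face `transformers` library (the checkpoint, example sentence, and corruption shown here are illustrative assumptions, not BART's actual pre-training pipeline):

```python
from transformers import BartTokenizer, BartForConditionalGeneration

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

# The corrupted text goes to the encoder; the original text is the
# decoder's reconstruction target.
corrupted = "We are <mask> about noising techniques."
original = "We are learning about noising techniques."

inputs = tokenizer(corrupted, return_tensors="pt")
labels = tokenizer(original, return_tensors="pt").input_ids

# Passing labels makes the model compute the reconstruction
# (cross-entropy) loss used during denoising pre-training.
loss = model(**inputs, labels=labels).loss
print(loss)
```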

Let's take a closer look at each of these methods.

Token masking

In token masking, we randomly sample a few tokens from the input and replace each of them with a [MASK] token. The model then has to predict the masked tokens while reconstructing the original text.
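
As a rough sketch of the idea over a whitespace-tokenized sentence (the helper name `token_mask` and the 15% masking rate are assumptions for illustration, not BART's exact settings):

```python
import random

def token_mask(tokens, mask_prob=0.15, mask_token="<mask>"):
    """Replace each token with the mask token with probability mask_prob.

    The 15% rate is an illustrative choice; Hugging Face's BART
    tokenizer spells the mask token as <mask>.
    """
    return [mask_token if random.random() < mask_prob else tok
            for tok in tokens]

tokens = "we are learning about noising techniques".split()
print(token_mask(tokens))
# Possible output: ['we', '<mask>', 'learning', 'about', 'noising', '<mask>']
```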