In this section, we look at spaCy's text classifier component, TextCategorizer. Previously, we saw that the spaCy NLP pipeline is made up of components, and we covered the essential ones: the tokenizer, POS tagger, dependency parser, and named entity recognizer (NER).

TextCategorizer is an optional, trainable pipeline component. To train it, we provide example texts together with their class labels. We first add TextCategorizer to the NLP pipeline and then run the training procedure. The following diagram shows where the component sits in the pipeline: it comes after the essential components and is labeled textcat.
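The two steps described above, adding the component and then training it on labeled examples, can be sketched roughly as follows. This is a minimal illustration assuming spaCy v3; the labels `POSITIVE` and `NEGATIVE`, the four training sentences, and the number of epochs are made up for the example and are far too small to produce a useful classifier. It starts from a blank pipeline, so textcat is the only component here; in a loaded pipeline, `add_pipe` appends by default, which places textcat after the existing components.

```python
import spacy
from spacy.training import Example

# Blank English pipeline: only the tokenizer, no trained components.
nlp = spacy.blank("en")

# Add TextCategorizer; add_pipe appends to the end of the pipeline by default.
textcat = nlp.add_pipe("textcat")

# Hypothetical class labels, chosen only for this sketch.
textcat.add_label("POSITIVE")
textcat.add_label("NEGATIVE")

# Toy training data: each text comes with its class labels under "cats".
train_data = [
    ("I loved this movie", {"cats": {"POSITIVE": 1.0, "NEGATIVE": 0.0}}),
    ("What a wonderful day", {"cats": {"POSITIVE": 1.0, "NEGATIVE": 0.0}}),
    ("I hated this movie", {"cats": {"POSITIVE": 0.0, "NEGATIVE": 1.0}}),
    ("This was a terrible idea", {"cats": {"POSITIVE": 0.0, "NEGATIVE": 1.0}}),
]

# Wrap each (text, annotations) pair in an Example object.
examples = [
    Example.from_dict(nlp.make_doc(text), annotations)
    for text, annotations in train_data
]

# Initialize the model weights and get an optimizer.
optimizer = nlp.initialize(lambda: examples)

# A few passes over the toy data (epoch count is arbitrary).
for epoch in range(10):
    losses = {}
    nlp.update(examples, sgd=optimizer, losses=losses)

# The predicted scores per label are stored in doc.cats.
doc = nlp("What a great film")
print(nlp.pipe_names)
print(doc.cats)
```

After training, `doc.cats` holds one score per label. With the default `textcat` component the labels are treated as mutually exclusive, so the scores sum to approximately 1.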
