Merging and Splitting Tokens

Explore how to merge multiword named entities into single tokens and how to split tokens to correct typos or customize tokenization in spaCy. Understand how to use doc.retokenize to adjust token spans while maintaining linguistic attributes and updating dependency trees, enhancing NLP accuracy and flexibility.

Overview

We extracted the named entities in the previous section, but what if we want to merge or split multiword named entities? And what if the tokenizer performed poorly on some exotic tokens and we want to split them by hand? In this lesson, we'll cover a very practical remedy for multiword expressions, multiword named entities, and typos.

doc.retokenize is the correct tool for merging and splitting spans. Let's see an example of retokenization by merging a multiword named entity:

Python 3.5
import spacy
nlp = spacy.load("en_core_web_md")
doc = nlp("She lived in New Hampshire.")
print(doc.ents)
print([(token.text, token.i) for token in doc])
print(len(doc))
with doc.retokenize() as retokenizer:
    retokenizer.merge(doc[3:5], attrs={"LEMMA": "new hampshire"})
print(doc.ents)
print([(token.text, token.i) for token in doc])

This is what we did in the preceding code:

  • Line 3: We created a doc object from the sample sentence.

  • Line 4: We printed its entities with doc.ents, and the result was New Hampshire, as expected.

  • Line ...
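
The preceding example handles merging; splitting goes through retokenizer.split. Below is a minimal sketch of the split direction (our own illustration with a made-up typo, NewHampshire, not part of the lesson's example): orths lists the surface forms of the new tokens, and heads tells spaCy where to attach each subtoken in the dependency tree.

Python 3.5
import spacy
nlp = spacy.load("en_core_web_md")
doc = nlp("She lived in NewHampshire.")
print([(token.text, token.i) for token in doc])
with doc.retokenize() as retokenizer:
    # "New" attaches to the new "Hampshire" subtoken (index 1),
    # while "Hampshire" takes over the original token's head, "in" (doc[2]).
    retokenizer.split(doc[3], ["New", "Hampshire"], heads=[(doc[3], 1), doc[2]])
print([(token.text, token.i) for token in doc])

After the split, the token indices shift just as they do after a merge, and the two new tokens sit in the dependency tree under the heads we supplied.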