PhraseMatcher and EntityRuler

Explore how to apply spaCy's PhraseMatcher to match long lists of phrases efficiently and use EntityRuler to add custom named entities to NLP pipelines. This lesson helps you handle domain-specific entities and improve entity recognition beyond statistical models.

We'll cover the following...

PhraseMatcher
EntityRuler

PhraseMatcher

When we process financial, medical, or legal text, we often have long lists or dictionaries of terms, and we want to find every occurrence of those terms in the text. As we saw in the previous section, Matcher patterns are handcrafted: we coded each token individually. That approach does not scale to a long list of phrases, because writing a token-by-token pattern for every term is impractical.
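To make this concrete, a Matcher pattern is a list with one dictionary per token, so every phrase in a list becomes its own list of token dictionaries. The minimal sketch below builds such patterns for three names. The helper `term_to_matcher_pattern` is hypothetical, and splitting on whitespace is only an approximation of spaCy's tokenizer, which may split a phrase differently (for example around punctuation or hyphens).

```python
# Sketch: what token-level Matcher patterns look like for a list of phrases.
# Assumption: str.split() stands in for spaCy's tokenizer; the two can disagree.

def term_to_matcher_pattern(term):
    """Hypothetical helper: one {"ORTH": token} dict per whitespace-separated token."""
    return [{"ORTH": token} for token in term.split()]

terms = ["Angela Merkel", "Donald Trump", "Alexis Tsipras"]
patterns = [term_to_matcher_pattern(term) for term in terms]

for pattern in patterns:
    print(pattern)
```

Even when generated automatically, these patterns only match if the hand-rolled splitting agrees with how spaCy tokenizes the text, which is one reason to let spaCy tokenize the phrases itself.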

spaCy offers a solution for matching text against long dictionaries: the PhraseMatcher class. Instead of token-level patterns, PhraseMatcher takes Doc objects as patterns, so we only need to tokenize each phrase in our list. Let's get started with an example:


import spacy
from spacy.matcher import PhraseMatcher
nlp = spacy.load("en_core_web_md")
matcher = PhraseMatcher(nlp.vocab)
terms = ["Angela Merkel", "Donald Trump", "Alexis Tsipras"]
# make_doc only runs the tokenizer, which is all PhraseMatcher needs for exact-text matching
patterns = [nlp.make_doc(term) for term in terms]
# newer signature (spaCy v2.2.2+ and v3): add(key, list_of_patterns)
matcher.add("politiciansList", patterns)
doc = nlp("3 EU leaders met in Berlin. German chancellor Angela Merkel first welcomed the US president Donald Trump. The following day Alexis Tsipras joined them in Brandenburg.")
matches = matcher(doc)
# each match is a (match_id, start, end) tuple of token indices
for mid, start, end in matches:
    print(start, end, doc[start:end])
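The example above compares the exact token text, so a lowercase "angela merkel" would not match. Below is a minimal sketch, assuming spaCy v3 (or v2.2.2+), of a case-insensitive variant: passing `attr="LOWER"` to `PhraseMatcher` makes it compare the lowercase form of each token. It uses `spacy.blank("en")`, a tokenizer-only pipeline, so no trained model is needed, and `nlp.tokenizer.pipe` to tokenize the whole term list in one batch. The sample sentence is made up for illustration.

```python
import spacy
from spacy.matcher import PhraseMatcher

# A blank English pipeline contains only the tokenizer, which is enough
# for matching on token text (no model download required).
nlp = spacy.blank("en")

# attr="LOWER" compares the lowercase form of each token, so case is ignored.
matcher = PhraseMatcher(nlp.vocab, attr="LOWER")

terms = ["Angela Merkel", "Donald Trump", "Alexis Tsipras"]
# tokenizer.pipe tokenizes the list in batches without running other components
patterns = list(nlp.tokenizer.pipe(terms))
matcher.add("politiciansList", patterns)

doc = nlp("ANGELA MERKEL met donald trump in Berlin.")
found = []
for match_id, start, end in matcher(doc):
    # match_id is a hash; the string store maps it back to the rule name
    rule = nlp.vocab.strings[match_id]
    found.append(doc[start:end].text)
    print(rule, start, end, doc[start:end].text)
```

The rule name `politiciansList` is reused from the example above; `nlp.vocab.strings[match_id]` converts the hashed match ID back to that name. Note that the matched spans keep their original casing; only the comparison uses the lowercase form.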
