Entity Extraction
Let's see how we will extract the entities that our chatbot will use.
We'll now implement the first step of our chatbot NLU pipeline and extract entities from the dataset utterances. The following are the entities marked in our dataset:
To extract the entities, we'll use the spaCy NER model and the spaCy Matcher class. Let's get started by extracting the city entities.
Extracting city entities
We'll begin by recalling two facts about the spaCy NER model and its entity labels:
First, recall that the spaCy named entity label for cities and countries is GPE. Let's ask spaCy to explain what the GPE label corresponds to once again:
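The lookup can be sketched as follows. It uses only spacy.explain, which reads spaCy's built-in glossary, so no trained pipeline needs to be loaded for this step.

```python
import spacy

# spacy.explain returns the glossary description for a label,
# or None if the label is unknown.
print(spacy.explain("GPE"))
# Countries, cities, states
```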
Second, recall that we can access the entities of a Doc object via the ents property. We can find all entities in an utterance that are labeled by the spaCy NER model as follows:
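A minimal sketch of this step is shown here. The pipeline name en_core_web_md and the example utterance are assumptions made for illustration (the utterance is written to contain the entities discussed in this lesson), and list_entities is a hypothetical helper name. The labels a model assigns depend on the model version, so no output is shown.

```python
import spacy


def list_entities(nlp, text):
    """Return (entity text, entity label) pairs found by the pipeline."""
    doc = nlp(text)
    # doc.ents holds the entity spans; ent.label_ is the label as a string.
    return [(ent.text, ent.label_) for ent in doc.ents]


if __name__ == "__main__":
    # Assumed example utterance, not taken verbatim from the dataset.
    utterance = (
        "Can you please confirm that you want to book a table for 2 "
        "at 11:30 am at the Bird restaurant in Palo Alto for today"
    )
    try:
        # Assumed model name; any English pipeline with an NER component works.
        nlp = spacy.load("en_core_web_md")
    except OSError:
        print("Model not found. Try: python -m spacy download en_core_web_md")
    else:
        for text, label in list_entities(nlp, utterance):
            print(text, label)
```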
In this code segment, we listed all named entities of this utterance by calling doc.ents. Then, we examined the entity labels by calling ent.label_. Examining the output, we see that this utterance contains five entities: one CARDINAL entity (2), one TIME entity (11:30 am), one PRODUCT entity (Bird, which is not an ideal label for a restaurant), one GPE entity (Palo Alto), and one DATE entity (today). The GPE entity is what we're looking for: Palo Alto is a city in the US and hence is labeled by the spaCy NER model as GPE.
The code below outputs all the utterances that include a city entity together with the city entities. From the output of this script, we can see that the spaCy NER model performs very well on this corpus ...
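One way such a script could look is sketched here. The function name utterances_with_cities, the sample utterances, and the model name en_core_web_md are all hypothetical stand-ins. Loading the actual dataset is left out, so the function simply takes a list of utterance strings.

```python
import spacy


def utterances_with_cities(nlp, utterances):
    """Return (utterance, [city texts]) for utterances with a GPE entity."""
    results = []
    # nlp.pipe processes the texts as a stream, which is faster than
    # calling nlp() on each utterance separately.
    for doc in nlp.pipe(utterances):
        cities = [ent.text for ent in doc.ents if ent.label_ == "GPE"]
        if cities:
            results.append((doc.text, cities))
    return results


if __name__ == "__main__":
    # Hypothetical sample; replace with the utterances from the dataset.
    sample = [
        "I am looking for a restaurant in San Jose.",
        "Yes, that works for me.",
    ]
    try:
        nlp = spacy.load("en_core_web_md")  # assumed model name
    except OSError:
        print("Model not found. Try: python -m spacy download en_core_web_md")
    else:
        for text, cities in utterances_with_cities(nlp, sample):
            print(text, cities)
```

Filtering on ent.label_ == "GPE" also keeps countries and states, since the GPE label covers all three.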