
spaCy Container Objects

Explore spaCy container objects including Doc, Token, and Span to understand how text is structured and processed. This lesson helps you access and manipulate linguistic features like sentences, tokens, and named entities, essential for building NLP applications with spaCy.

Overview

At the beginning of this chapter, we saw a list of container objects, including Doc, Token, Span, and Lexeme. We have already used Token and Doc in our code. In this lesson, we'll look at the properties of these container objects in detail.

Container objects give us access to the linguistic properties that spaCy assigns to the text. Each container object is a logical representation of a text unit, such as a document, a token, or a slice of the document.

Container objects in spaCy follow the natural structure of the text: a document is composed of sentences, and sentences are composed of tokens.

In practice, the most commonly used containers are Doc, Token, and Span, which represent a document, a single token, and a slice of the document (such as a phrase or a sentence), respectively. A container can contain other containers. For instance, a document contains tokens and spans.
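
The nesting described above can be sketched in a few lines. This is a minimal illustration rather than the lesson's own code: it assumes spaCy v3 and uses a blank English pipeline with the rule-based sentencizer component so that no trained model has to be downloaded. The example sentence and the variable names are made up for illustration.

```python
import spacy
from spacy.tokens import Doc, Span, Token

# Assumption: spaCy v3. A blank English pipeline only tokenizes, so we add
# the rule-based "sentencizer" component to get sentence boundaries.
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

doc = nlp("I like cats. Dogs are fine too.")

sentences = list(doc.sents)  # each sentence is a Span of the Doc
first_token = doc[0]         # an integer index returns a Token
phrase = doc[1:3]            # a slice returns a Span

# Show which container class each object belongs to.
print(type(doc).__name__, type(sentences[0]).__name__, type(first_token).__name__)

# A document is made of sentences, and each sentence is made of tokens.
for sent in sentences:
    print([token.text for token in sent])
```

Indexing a Doc with an integer gives a Token, while slicing it gives a Span, which is why a sentence and a phrase share the same container class.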

Let's explore each class and its useful properties one by one.

Doc

We have been creating Doc objects throughout our code, so you may have already figured out that a Doc represents a text.

We already know how to create a Doc object:

Python 3.5
# nlp is the language pipeline loaded earlier in the chapter
doc = nlp("I like cats.")

The doc.text attribute returns the document text as a Unicode string:

Python 3.5
print(doc.text)
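
For readers who want to run the two snippets above on their own, here is a self-contained sketch. It assumes spaCy v3 and substitutes a blank English pipeline for the nlp object loaded earlier in the chapter; that substitution is made only so the example needs no model download.

```python
import spacy

# Assumption: a blank English pipeline stands in for the chapter's nlp object.
# It tokenizes text but assigns no tags, parses, or entities.
nlp = spacy.blank("en")

doc = nlp("I like cats.")

# text is an attribute, so it is read without parentheses.
print(doc.text)

# The value is an ordinary Python string.
print(type(doc.text))
```

Because doc.text is a plain str, it can be passed to any function that expects a string.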

The building block of a Doc object is a Token. ...