
spaCy Part 1

Explore the core functionality of spaCy for natural language processing. Learn how to tokenize text, identify parts of speech, recognize named entities, and apply lemmatization to prepare text for analysis.

spaCy

spaCy is an open-source Python library used widely in industry for text processing. It implements state-of-the-art algorithms from the field of natural language processing (NLP).

Many natural language systems have been deployed using this library.

Without going into too much detail, we will look at the main features of this library and perform basic text processing tasks such as:

  • Tokenization
  • Part-of-speech tagging
  • Named entity recognition
  • Dependency parsing

Tokenization

Tokenization refers to splitting text into its tokens, which are typically words and punctuation marks. Splitting is usually done on white space, but other delimiters, such as punctuation, can be used as well. spaCy provides a built-in tokenizer that does this for a given piece of text.
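To make the difference between white space and other delimiters concrete, here is a minimal plain-Python sketch. It does not use the library's tokenizer. The function names and the regular expression are assumptions chosen for illustration, and the pattern is far simpler than the rules a real tokenizer applies.

```python
import re

# Illustrative pattern (an assumption for this sketch, not the library's rules):
# a token is either a run of word characters or a single punctuation mark.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def whitespace_tokenize(text):
    """Split on runs of white space only; punctuation stays attached to words."""
    return text.split()


def punctuation_tokenize(text):
    """Split on white space and also emit each punctuation mark as its own token."""
    return TOKEN_PATTERN.findall(text)


if __name__ == "__main__":
    sample = "Hello, world! It costs $5."
    print(whitespace_tokenize(sample))   # punctuation glued to neighbouring words
    print(punctuation_tokenize(sample))  # punctuation separated into its own tokens
```

With white space alone, "Hello," and "world!" remain single tokens, while the second function separates the comma and the exclamation mark. If the library is installed, its own tokenizer can be tried without downloading a model: create a blank English pipeline with `spacy.blank("en")`, call it on a string, and read the `.text` attribute of each token in the result.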

This is a very common ...