SpaCy: Part 1
Explore key SpaCy features that simplify natural language processing, such as tokenization, part-of-speech tagging, named entity recognition, and dependency parsing. Understand how to preprocess text effectively using SpaCy’s industry-standard functions.
SpaCy
SpaCy is an open-source Python library that is widely used in industry for text processing. It provides production-ready implementations of state-of-the-art natural language processing algorithms.
Many deployed natural language systems rely on its functionality.
Without going into too much detail, we’ll explore the key features of this library and perform basic text processing tasks such as:
- Tokenization
- Part-of-speech tagging
- Named entity recognition
- Dependency parsing
Tokenization
Tokenization refers to splitting text into its tokens: words, punctuation marks, and similar units. Splitting is usually done on white space, but other delimiters, such as punctuation, can also mark token boundaries. SpaCy provides a built-in tokenizer for processing a given piece of text. ...
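As a minimal sketch of the difference between whitespace splitting and SpaCy's tokenizer, the snippet below tokenizes one sentence both ways. It assumes SpaCy is installed and uses `spacy.blank("en")`, a pipeline that contains only the English tokenizer, so no trained model needs to be downloaded. The sample sentence and the variable names are illustrative and not part of the lesson.

```python
import spacy

# A blank English pipeline holds only the rule-based tokenizer,
# so this sketch does not depend on a downloaded model.
nlp = spacy.blank("en")

# Illustrative sample sentence (an assumption for this example).
text = "SpaCy splits text into tokens, including punctuation."

# Naive whitespace splitting keeps punctuation attached to the words.
whitespace_tokens = text.split()

# Calling the pipeline returns a Doc; iterating over it yields Token objects.
doc = nlp(text)
spacy_tokens = [token.text for token in doc]

print(whitespace_tokens)
print(spacy_tokens)
```

With whitespace splitting, the comma and the final period stay glued to `tokens,` and `punctuation.`, whereas the SpaCy tokenizer returns them as separate tokens. That is usually what later steps such as tagging and parsing expect.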