High-level Overview of the spaCy Library

Explore spaCy, an industrial-strength Python library designed for natural language processing tasks. Understand its features including efficient tokenization, named entity recognition, part-of-speech tagging, and how it compares to other NLP libraries. This lesson prepares you to use spaCy for practical NLP applications with pre-trained models and seamless integration.

What is spaCy?

spaCy is an open-source Python library for modern NLP, which its creators describe as industrial-strength NLP. It supports 60+ languages and ships with pre-trained language models and word vectors for a subset of them.

spaCy is focused on production use and shipping code, unlike its more academic predecessors. The best-known and most frequently used Python predecessor is NLTK, whose main focus was giving students and researchers a working introduction to language processing. NLTK never made any claims about efficiency, model accuracy, or being an industrial-strength library. spaCy, by contrast, has focused on production-ready code from the first day: you can expect its models to perform well on real-world data, its code to be efficient, and large amounts of text to be processed in a reasonable time. The following table is an efficiency comparison from the spaCy documentation.

The spaCy code is also maintained professionally: issues are sorted by labels, and new releases cover as many fixes as possible. Anyone can raise an issue on the spaCy GitHub repo.

Another predecessor is CoreNLP (also known as Stanford CoreNLP), which is implemented in Java. Although CoreNLP competes in terms of efficiency, Python wins on ease of prototyping, and spaCy is the more professional software package. Its code is well maintained, issues are tracked on GitHub, and every issue is marked with labels (such as bug, feature, or new project). Installing the library code and the models is also easy. Together with backward compatibility, this makes spaCy a professional software project. The table below has a comparison of spaCy and the other NLP libraries.

Throughout this course, we will use the spaCy release that was the latest at the time of writing for all our computational linguistics and ML purposes. That release offers the following features:

Tokenization that preserves the original data.
Statistical sentence segmentation.
Named entity recognition.
Part-of-speech (POS) tagging.
Dependency parsing.
Pre-trained word vectors.
Easy integration with popular deep learning libraries. spaCy's ML library, Thinc, provides thin wrappers around PyTorch, TensorFlow, and MXNet. spaCy also provides wrappers for HuggingFace Transformers through the spacy-transformers library.
Industrial-level speed.
A built-in visualizer, displaCy.
Support for 60+ languages.
46 state-of-the-art statistical models for 16 languages.
Space-efficient string data structures.
Efficient serialization.
Easy model packaging and usage.
Large community support.
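
To make the first feature in the list concrete, here is a minimal sketch of what "tokenization that preserves the original data" means. It is a toy, standard-library illustration and not spaCy's tokenizer: the names ToyToken and toy_tokenize are hypothetical, and the splitting rule (word characters or single punctuation marks) is an assumption chosen for brevity. In spaCy itself, the corresponding information is exposed through attributes such as Token.text_with_ws and Token.idx.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class ToyToken:
    """Hypothetical token type used only for this illustration."""
    text: str        # the token itself
    whitespace: str  # whitespace that followed the token in the original text
    idx: int         # character offset of the token in the original text


def toy_tokenize(text: str) -> List[ToyToken]:
    # Each match is a run of word characters or a single punctuation mark,
    # plus any whitespace after it, so no character is thrown away.
    # Assumption: the text does not start with whitespace.
    pattern = re.compile(r"(\w+|[^\w\s])(\s*)")
    return [
        ToyToken(m.group(1), m.group(2), m.start(1))
        for m in pattern.finditer(text)
    ]


text = "spaCy isn't  slow, is it?"
tokens = toy_tokenize(text)

# Joining every token with its trailing whitespace restores the input exactly,
# including the double space after "isn't".
reconstructed = "".join(t.text + t.whitespace for t in tokens)

print([t.text for t in tokens])
print(reconstructed == text)
```

Because each token keeps its trailing whitespace and its character offset, downstream annotations (entities, tags, parses) can always be mapped back to exact positions in the raw input.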

We have taken a quick look at spaCy as an NLP library and as a software package. We will see what spaCy offers in detail throughout the course.

SYSTEM	TOKENIZE	TAG	PARSE
spaCy	0.2ms	1ms	19ms
CoreNLP	0.18ms	10ms	49ms
ZPar	1ms	8ms	850ms
NLTK	4ms	443ms	n/a
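
The timings above are easier to compare as ratios. The following short sketch copies the numbers from the table and divides each one by spaCy's time for the same stage. The helper name relative_to is hypothetical, and rounding to one decimal place is an arbitrary choice for readability.

```python
# Timings in milliseconds, copied from the table above; None marks "n/a".
timings_ms = {
    "spaCy":   {"tokenize": 0.2,  "tag": 1.0,   "parse": 19.0},
    "CoreNLP": {"tokenize": 0.18, "tag": 10.0,  "parse": 49.0},
    "ZPar":    {"tokenize": 1.0,  "tag": 8.0,   "parse": 850.0},
    "NLTK":    {"tokenize": 4.0,  "tag": 443.0, "parse": None},
}


def relative_to(baseline, timings):
    """Express every timing as a multiple of the baseline system's timing."""
    base = timings[baseline]
    return {
        system: {
            stage: None if ms is None else round(ms / base[stage], 1)
            for stage, ms in stages.items()
        }
        for system, stages in timings.items()
    }


relative = relative_to("spaCy", timings_ms)
for system, stages in relative.items():
    print(system, stages)
```

A ratio above 1 means the system is slower than spaCy at that stage. With these figures, NLTK's tokenizer takes 20 times as long as spaCy's, and CoreNLP's parser takes about 2.6 times as long, while CoreNLP's tokenizer is marginally faster (ratio 0.9).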

Feature	spaCy	NLTK	CoreNLP
Programming Language	Python	Python	Java/Python
Neural network models	Yes	No	Yes
Integrated word vectors	Yes	No	No
Multi-language support	Yes	Yes	Yes
Tokenization	Yes	Yes	Yes
Part-of-speech tagging	Yes	Yes	Yes
Sentence segmentation	Yes	Yes	Yes
Dependency parsing	Yes	No	Yes
Entity recognition	Yes	Yes	Yes
Entity linking	Yes	No	No
Coreference resolution	No	No	Yes
