Trusted answers to developer questions

Related Tags

python

What is WhitespaceTokenizer in Python?

Sadia Zubair

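In Natural Language Processing (NLP), tokenization is the process of dividing a string into a list of tokens. Tokens are the basic units that later steps work on, for example when counting word frequencies or searching for patterns in text. (This is distinct from tokenization in data security, where sensitive data components are replaced with non-sensitive ones.)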

A token can be thought of as a word in a sentence or a sentence in a paragraph.

WhitespaceTokenizer, a class in the NLTK library, splits a string on whitespace, i.e., spaces, tabs, and newlines. Punctuation is not separated and stays attached to the neighboring word.

Python's built-in str.split() method, when called with no arguments, works similarly.
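As a quick illustration of that similarity, the sketch below uses only the built-in split method. The sample string is made up for this example. It also shows that passing an explicit " " separator behaves differently from the no-argument form.

```python
# Hypothetical sample text mixing spaces, a tab, a newline, and a double space
text = "Good muffins\tcost $3.88\nin New  York."

# With no arguments, split() breaks on any run of whitespace
tokens = text.split()
print(tokens)
# ['Good', 'muffins', 'cost', '$3.88', 'in', 'New', 'York.']

# With an explicit " ", only single spaces are separators: the tab and
# newline stay inside tokens, and the double space yields an empty string
space_only = text.split(" ")
print(space_only)
# ['Good', 'muffins\tcost', '$3.88\nin', 'New', '', 'York.']
```

For whitespace tokenization, the no-argument form is usually the one that matches the behavior described above.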

In Python, we can tokenize with the help of the Natural Language Toolkit (NLTK) library. The library needs to be installed and then imported in the code.

Installation of NLTK

NLTK can be installed with pip:

pip install nltk

Recent NLTK releases support only Python 3. On systems where pip still points to a Python 2 installation, use pip3 instead:

pip3 install nltk

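WhitespaceTokenizer works as soon as the package is installed. Many other NLTK components, however, depend on additional data such as corpora and trained models. To download that data, run the code below in a Python file: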

import nltk
nltk.download()

Upon executing the code, the NLTK downloader interface will pop up. Under the “Collections” tab, click on “all” and then click on “Download” to fetch all of the data packages.

Example

The code below shows how WhitespaceTokenizer is used.

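from nltk.tokenize import WhitespaceTokenizer
# Sample text containing spaces, single newlines, and a blank line
data = "Good muffins cost $3.88\nin New York. Please buy me\ntwo of them.\n\nThanks."
# Split on runs of whitespace and print the resulting list of tokens
print(WhitespaceTokenizer().tokenize(data))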
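
To see what kind of output to expect without installing NLTK, the sketch below approximates the tokenizer with the standard re module. This is a stand-in written for illustration, not NLTK's own code: the helper name whitespace_tokenize is hypothetical, and it assumes the tokenizer splits on runs of whitespace and discards empty strings.

```python
import re

def whitespace_tokenize(text):
    """Hypothetical stand-in: split on runs of whitespace, drop empty strings."""
    return [tok for tok in re.split(r"\s+", text) if tok]

# Same sample text as in the example above
data = "Good muffins cost $3.88\nin New York. Please buy me\ntwo of them.\n\nThanks."

tokens = whitespace_tokenize(data)
print(tokens)
# ['Good', 'muffins', 'cost', '$3.88', 'in', 'New', 'York.', 'Please',
#  'buy', 'me', 'two', 'of', 'them.', 'Thanks.']
```

Note that punctuation stays attached to words ("York.", "them.") and "$3.88" remains a single token, because only whitespace acts as a separator.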

CONTRIBUTOR

Sadia Zubair
Copyright ©2022 Educative, Inc. All rights reserved
