Python String Operations
Explore fundamental Python string operations that are crucial for NLP tasks, including indexing, slicing, and case manipulation. Learn to work with strings effectively as a foundation before moving on to spaCy's natural language processing tools.
Reviewing some useful string operations
In Python, text is represented by strings, which are objects of the str class. Strings are immutable sequences of characters. Creating a string object is easy: we enclose the text in quotation marks:
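A minimal sketch of creating the word variable used in the rest of this section. The comparison with a single-quoted literal is an illustrative addition showing that both quote styles produce the same str value:

```python
# Double and single quotes create the same kind of str object
word = "Hello World"
same = 'Hello World'

print(word == same)  # True
print(type(word))    # <class 'str'>
```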
Now the word variable contains the string Hello World. As we mentioned, strings are sequences of characters, so we can ask for the first item of the sequence:
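Indexing with square brackets retrieves a single character. This sketch assumes the same word = "Hello World" string; the name first is illustrative:

```python
word = "Hello World"

# Indexing starts at 0, so index 0 is the first character
first = word[0]
print(first)  # H
```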
Always call print with parentheses, since print is a function in Python 3.x. We can access other indices in the same way, as long as the index doesn't go out of bounds (an out-of-range index raises an IndexError):
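A short sketch, again assuming word = "Hello World", that reads a middle character and shows what happens past the end. The negative index is an extra illustration of counting from the end of the string:

```python
word = "Hello World"

print(word[4])   # o (the fifth character, since indexing starts at 0)
print(word[-1])  # d (negative indices count from the end)

# Valid indices run from 0 to len(word) - 1; anything beyond raises IndexError
try:
    print(word[11])
except IndexError as err:
    print("IndexError:", err)
```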
How about string length? We can use the built-in len() function, just as with lists and other sequence types:
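A minimal sketch comparing len() on a string and on a list; the list word_list is an illustrative addition:

```python
word = "Hello World"
length = len(word)
print(length)  # 11 (the space counts as a character)

# len() works the same way on other sequences, such as lists
word_list = ["Hello", "World"]
print(len(word_list))  # 2
```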
Because strings are sequences, we can also iterate over their characters with a for loop:
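A for loop visits the characters one at a time, in order. In this sketch, the chars list is a hypothetical helper that collects them so the result can be inspected:

```python
word = "Hello World"
chars = []

for char in word:
    print(char)         # one character per line, including the space
    chars.append(char)  # keep a copy of each character
```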
Now let's go over more string methods, such as counting characters, finding a substring, and changing letter case.
The count() method counts the non-overlapping occurrences of a substring, such as a single character, in the string. The letter l appears three times in Hello World, so counting it gives 3:
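Counting the letter l in the running example, plus a multi-character substring for comparison (the second call is an illustrative addition):

```python
word = "Hello World"

# "l" occurs in He-l-l-o and in Wor-l-d
l_count = word.count("l")
print(l_count)  # 3

# count() also accepts longer substrings
lo_count = word.count("lo")
print(lo_count)  # 1
```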
Often, we need to find the index of a character for a number of substring operations, such as cutting and slicing the string:
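The paragraph does not name a method; a common choice is index(), sketched here under that assumption. It returns the position of the first occurrence (and raises ValueError if there is none), and that position can then be used to cut the string:

```python
word = "Hello World"

# Position of the first space
space_pos = word.index(" ")
print(space_pos)  # 5

# Use the position to cut off everything from the space onward
first_word = word[:space_pos]
print(first_word)  # Hello
```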
Similarly, we can search for substrings in a string with the find() method:
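A sketch of find() on the running example; it returns the index at which the substring starts:

```python
word = "Hello World"

pos = word.find("World")
print(pos)  # 6
```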
The find() method returns -1 if the substring is not in the string:
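Searching for a substring that is absent; the search term "Python" is just an illustrative value. For contrast, index() signals the same situation by raising ValueError instead of returning -1:

```python
word = "Hello World"

missing = word.find("Python")
print(missing)  # -1

# index() raises an exception rather than returning -1
try:
    word.index("Python")
except ValueError:
    print("index() raised ValueError")
```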
Searching for the last occurrence of a substring is also easy:
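The text does not name the method; this sketch assumes rfind(), which searches from the right and, like find(), returns -1 when nothing matches:

```python
word = "Hello World"

first_o = word.find("o")   # first "o", in "Hello"
last_o = word.rfind("o")   # last "o", in "World"
print(first_o, last_o)  # 4 7
```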
We can change the letter case with the upper() and lower() methods. The upper() method changes all characters to uppercase, and the lower() method changes all characters to lowercase:
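A minimal sketch of both methods. Because strings are immutable, each call returns a new string and word itself is unchanged:

```python
word = "Hello World"

upper_word = word.upper()
lower_word = word.lower()

print(upper_word)  # HELLO WORLD
print(lower_word)  # hello world
print(word)        # Hello World (the original string is unchanged)
```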
The capitalize() method capitalizes the first character of the string and lowercases the rest:
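An illustrative mixed-case input (the string is an assumption, not from the original) shows that capitalize() also lowercases everything after the first character:

```python
text = "hello WORLD"

capitalized = text.capitalize()
print(capitalized)  # Hello world
```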
The title() method converts the string to title case, in which the first character of each word is capitalized and the remaining characters are lowercased:
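A sketch with a hypothetical phrase. Note that title() treats every word the same way, so a name with internal capitals such as spaCy comes out as Spacy:

```python
text = "natural language processing with spaCy"

titled = text.title()
print(titled)  # Natural Language Processing With Spacy
```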
Forming new strings from other strings can be done in several ways. We can concatenate two strings by adding them:
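Concatenation with the + operator; the variable names are illustrative. Note that the space between the words has to be added explicitly:

```python
first = "Hello"
second = "World"

greeting = first + " " + second
print(greeting)  # Hello World
```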
We can also multiply a string by an integer. The result is the string repeated that many times:
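Repetition with the * operator; the values are illustrative. Multiplying a single character is a common way to build a separator line:

```python
laugh = "ha" * 3
print(laugh)  # hahaha

# A 20-character divider line
divider = "-" * 20
print(divider)
```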
The join() method is used frequently. We call it on a separator string and pass it an iterable of strings, such as a list; it joins them into one string, placing the separator between each pair:
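A sketch joining an illustrative list of words. The separator is the string that join() is called on, so changing it changes the output:

```python
words = ["Python", "string", "operations"]

sentence = " ".join(words)
print(sentence)  # Python string operations

csv_row = ",".join(words)
print(csv_row)   # Python,string,operations
```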
Python offers a variety of substring methods. The replace() method replaces all occurrences of a substring with another string:
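A sketch that replaces every l in the running example. The optional third argument (count), shown as an extra illustration, limits how many occurrences are replaced:

```python
word = "Hello World"

replaced = word.replace("l", "L")
print(replaced)  # HeLLo WorLd

# Replace only the first occurrence
replaced_once = word.replace("l", "L", 1)
print(replaced_once)  # HeLlo World
```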
Getting a substring by its indices is called slicing. We slice a string by specifying a start index and an end index; the end index is exclusive, so the character at that position is not included. If we want only the second word, we can do the following:
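Assuming the running example word = "Hello World", the second word starts at index 6, and the end index 11 is one past its last character:

```python
word = "Hello World"

# Characters from index 6 up to, but not including, index 11
second_word = word[6:11]
print(second_word)  # World
```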
Getting the first word is similar. Leaving the first index blank means the slice starts from index zero:
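Assuming word = "Hello World", the first word occupies indices 0 through 4, so the end index is 5. Omitting the start index is equivalent to writing 0:

```python
word = "Hello World"

first_word = word[:5]  # same as word[0:5]
print(first_word)  # Hello
```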
Leaving the second index blank also has a special meaning: the slice continues to the end of the string.
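Assuming word = "Hello World", omitting the end index returns everything from the start index onward. Combining the slice with find() (an illustrative addition) avoids hard-coding the position of the space:

```python
word = "Hello World"

rest = word[6:]  # from index 6 to the end
print(rest)  # World

# Compute the start index from the position of the space instead
after_space = word[word.find(" ") + 1:]
print(after_space)  # World
```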
We now know some of the Python string operations commonly used in NLP, so we can dive deeper into spaCy.