Python String Operations
Let's review some useful Python string methods.
Reviewing some useful string operations
In Python, text is represented by strings, which are objects of the str class. Strings are immutable sequences of characters. To create a string object, we enclose the text in quotation marks:
word = 'Hello World'
Now the word variable contains the string Hello World. As we mentioned, strings are sequences of characters, so we can ask for the first item of the sequence:
word = 'Hello World'
print(word[0])
Always remember to use parentheses with print, since we are coding in Python 3.x. We can similarly access other indices, as long as the index doesn't go out of bounds:
word = 'Hello World'
print(word[4])
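As a small sketch of how indexing behaves at the edges (the helper variable names are only illustrative), negative indices count from the end of the string, an index past the last character raises IndexError, and assigning to an index fails because strings are immutable:

```python
word = 'Hello World'

# Negative indices count backward from the end of the string.
print(word[-1])                # d

# This 11-character string has valid indices 0 through 10.
try:
    word[11]
    out_of_bounds = False
except IndexError:
    out_of_bounds = True
print(out_of_bounds)           # True

# Strings are immutable, so item assignment raises TypeError.
try:
    word[0] = 'J'
    assignment_failed = False
except TypeError:
    assignment_failed = True
print(assignment_failed)       # True
```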
How about string length? We can use the built-in len() function, just like with list and other sequence types:
word = 'Hello World'
print(len(word))
We can also iterate over the characters of a string with a for loop, as with any other sequence:
word = 'Hello World'
for ch in word:
    print(ch)
Now let's go over more string methods, such as counting characters, finding a substring, and changing letter case.
The count() method counts the number of occurrences of a character or substring in the string, so the output is 3 here:
word = 'Hello World'
print(word.count('l'))
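A few more calls illustrate how count() behaves with longer substrings. This is only a brief sketch: the match is case-sensitive, and occurrences are counted without overlap.

```python
word = 'Hello World'

print(word.count('l'))      # 3
print(word.count('lo'))     # 1, 'lo' appears only in "Hello"
print(word.count('L'))      # 0, the match is case-sensitive

# Occurrences are counted without overlap.
print('aaaa'.count('aa'))   # 2
```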
Often, we need the index of a character for substring operations such as cutting and slicing the string. The index() method returns the position of its first occurrence:
word = 'Hello World'
print(word.index('e'))
Similarly, we can search for substrings in a string with the find() method:
word = 'Hello World'
print(word.find('World'))
The find() method returns -1 if the substring is not in the string:
word = 'Hello World'
print(word.find('Bonjour'))
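The two lookup methods differ in how they report a missing substring: find() returns -1, whereas index() raises ValueError. The sketch below shows one way to handle both; the position variable is just an illustrative name.

```python
word = 'Hello World'

print(word.find('Bonjour'))   # -1

# index() signals a missing substring with an exception instead of -1.
try:
    position = word.index('Bonjour')
except ValueError:
    position = None
print(position)               # None

# A membership test is often enough when the position is not needed.
print('World' in word)        # True
```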
Searching for the last occurrence of a substring is also easy with the rfind() method:
word = 'Hello World'
print(word.rfind('l'))
We can change the letter case with the upper() and lower() methods:
word = 'Hello World'
print(word.upper())
The upper() method changes all characters to uppercase. Similarly, the lower() method changes all characters to lowercase:
word = 'Hello World'
print(word.lower())
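One common use of these methods, sketched here with two made-up strings, is comparing text without regard to letter case. Both methods return new strings and leave the original untouched.

```python
first = 'Hello World'
second = 'HELLO world'

print(first == second)                    # False
print(first.lower() == second.lower())    # True

# upper() and lower() return new strings; the original is left as it was.
print(first)                              # Hello World
```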
The capitalize() method capitalizes the first character of the string and lowercases the rest:
print('hello madam'.capitalize())
The title() method converts the string to title case, in which the first character of each word is capitalized:
print('hello madam'.title())
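The difference between the two methods is easier to see on mixed-case input. The following sketch uses made-up strings and also shows a known quirk of title(): it treats any run of letters as a word, so an apostrophe starts a new one.

```python
text = 'hello MADAM'

print(text.capitalize())    # Hello madam
print(text.title())         # Hello Madam

# title() treats any run of letters as a word, so an apostrophe starts a new one.
print("they're here".title())   # They'Re Here
```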
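Forming new strings from other strings can be done in several ways. We can concatenate two strings with the + operator. No separator is inserted, so the space between the sentences must be part of one of the strings:
print('Hello Madam! ' + 'Have a nice day.')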
We can also multiply a string by an integer. The output is the string repeated the number of times specified by the integer:
print('sweet ' * 5)
The join() method is frequently used; it takes a list of strings and joins them into one string, using the string it is called on as the separator:
print(' '.join(['hello', 'madam']))
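Because the separator is the string that join() is called on, changing it changes the output. The sketch below uses illustrative lists; note that every item must already be a string, so other types need converting first.

```python
words = ['hello', 'madam']

print(' '.join(words))     # hello madam
print(', '.join(words))    # hello, madam
print(''.join(words))      # hellomadam

# Every item must be a string; convert other types first.
numbers = [1, 2, 3]
print('-'.join(str(n) for n in numbers))   # 1-2-3
```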
There is a variety of substring methods. The replace() method replaces all occurrences of a substring with another string:
print('hello madam'.replace('hello', 'good morning'))
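As a short sketch with a made-up sentence, replace() changes every occurrence by default, accepts an optional third argument that limits the number of replacements, and always returns a new string:

```python
text = 'hello madam, hello sir'

print(text.replace('hello', 'good morning'))
# good morning madam, good morning sir

# An optional third argument limits how many occurrences are replaced.
print(text.replace('hello', 'good morning', 1))
# good morning madam, hello sir

# The original string is unchanged because strings are immutable.
print(text)
# hello madam, hello sir
```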
Getting a substring by index is called slicing. You can slice a string by specifying a start index and an end index; the character at the end index is not included. If we want only the second word, we can do the following:
word = 'Hello Madam Flower'
print(word[6:11])
Getting the first word is similar. Leaving the first index blank means the slice starts from index zero:
word = 'Hello Madam Flower'
print(word[:5])
Leaving the second index blank also has a special meaning, namely that the slice runs to the end of the string:
word = 'Hello Madam Flower'
print(word[12:])
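The slice boundaries above are hard-coded. One possible alternative, sketched here with illustrative variable names, is to combine slicing with find() and rfind() so that the boundaries are computed from the positions of the spaces:

```python
word = 'Hello Madam Flower'

# Locate the spaces instead of hard-coding the slice boundaries.
first_space = word.find(' ')       # 5
last_space = word.rfind(' ')       # 11

print(word[:first_space])                 # Hello
print(word[first_space + 1:last_space])   # Madam
print(word[last_space + 1:])              # Flower

# Negative indices count from the end of the string.
print(word[-6:])                          # Flower
```

This approach assumes the string contains exactly three space-separated words; other inputs would need different handling.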
We now know some of the Pythonic NLP operations. Now we can dive into more of spaCy.