Trusted answers to developer questions

Related Tags

pandas
communitycreator

How to filter valid email addresses from a series in pandas

Arslan Bajwa

The pandas library, combined with Python's re (regular expression) module, lets us filter a series so that only valid email addresses remain.

For details about regex syntax, see the documentation for Python's re module.

Method 1

In this method, we use the map() function to apply regex match() to each entry of the input_data series. match() returns a match object if the valid email pattern matches at the start of the entry and None otherwise, so wrapping the call in bool() gives True or False. The resulting Boolean series is then used as a mask to select the valid entries from input_data.

#importing pandas and regex libraries
import pandas as pd
import re as regex

#initializing pandas series
input_data = pd.Series(['educative.io', 'jobs@educative.io', 'edpresso@educative.io'])

#initializing valid email pattern (may vary)
pattern = r'[0-9a-zA-Z._%+-]+@[0-9a-zA-Z.-]+\.[A-Za-z]{2,4}'

#mapping valid emails
mapped_result = input_data.map(lambda i: bool(regex.match(pattern, i)))

print("Valid Emails are: ")
print(input_data[mapped_result])
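
One caveat with match() is that it only anchors at the start of the string, so an entry with trailing text after a valid address still passes. The following is a minimal sketch, assuming whole-string validation is wanted, that compares match() with re.fullmatch(). The is_valid_email helper and the candidates list are hypothetical names introduced only for illustration.

```python
import re

# Same pattern as in Method 1 (may vary)
pattern = r'[0-9a-zA-Z._%+-]+@[0-9a-zA-Z.-]+\.[A-Za-z]{2,4}'

def is_valid_email(text):
    """Hypothetical helper: True only if the whole string matches the pattern."""
    return bool(re.fullmatch(pattern, text))

# Illustrative inputs (assumed, not from the original data)
candidates = ['jobs@educative.io', 'jobs@educative.io and more', 'educative.io']

# match() accepts the second entry because it only checks the start of the string
prefix_results = [bool(re.match(pattern, c)) for c in candidates]

# fullmatch() rejects the second entry because of the trailing text
full_results = [is_valid_email(c) for c in candidates]

print(prefix_results)
print(full_results)
```

If this stricter check suits the data, is_valid_email could be passed to input_data.map() in place of the lambda.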

Method 2

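In this method, we use the series' str.findall() method to find all occurrences of the valid email pattern in each entry of input_data. It returns a series of lists, one per entry. Entries without a match yield an empty list, which we filter out before printing.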

#importing pandas library
import pandas as pd

#initializing pandas series
input_data = pd.Series(['educative.io', 'jobs@educative.io', 'edpresso@educative.io'])

#initializing valid email pattern (may vary)
pattern = r'[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}'

#finding all matching occurrences
result = input_data.str.findall(pattern)

print("Valid Emails are: ")
print([i for i in result if len(i) > 0])
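
The code above prints a list of lists, one inner list per matching entry. As a rough sketch, assuming a flat list of addresses is preferred, the result of str.findall() can be flattened with the series' explode() method. The flat_emails name is hypothetical.

```python
import pandas as pd

input_data = pd.Series(['educative.io', 'jobs@educative.io', 'edpresso@educative.io'])

# Same pattern as in Method 2 (may vary)
pattern = r'[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}'

# One list of matches per entry; entries without a match give an empty list
matches = input_data.str.findall(pattern)

# explode() puts each match on its own row; empty lists become NaN and are dropped
flat_emails = matches.explode().dropna().tolist()

print(flat_emails)
```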

Method 3

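In this method, we call regex findall() on each entry of input_data inside a list comprehension. Each call returns a list of all occurrences of the valid email pattern in that entry, and entries without a match yield an empty list, which we filter out before printing.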

#importing pandas and regex libraries
import pandas as pd
import re as regex

#initializing pandas series
input_data = pd.Series(['educative.io', 'jobs@educative.io', 'edpresso@educative.io'])

#initializing valid email pattern (may vary)
pattern = r'[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}'

#finding all matching occurrences
result = [regex.findall(pattern, email) for email in input_data]

print("Valid Emails are: ")
print([i for i in result if len(i) > 0])
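
When the same pattern is applied to many entries, it may be convenient to compile it once and flatten the per-entry results in a single pass. The following is a minimal sketch of that variation, using a plain Python list in place of the series to stay self-contained. The names entries, email_regex, and found are hypothetical.

```python
import re
from itertools import chain

# Plain list standing in for the input_data series (assumption for illustration)
entries = ['educative.io', 'jobs@educative.io', 'edpresso@educative.io']

# Compile the pattern once so it can be reused for every entry
email_regex = re.compile(r'[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}')

# findall() returns a list per entry; chain.from_iterable joins them into one flat list
found = list(chain.from_iterable(email_regex.findall(e) for e in entries))

print(found)
```

Because the generator only iterates over its input, the same expression should also work when entries is a pandas series.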
