Simplifying type checking and data validation using Pydantic
Type checking verifies that the data types used in a computer program are correct. It’s a critical part of software development, as it helps prevent errors and improve code quality. Data validation is the process of ensuring that data is accurate, complete, and consistent. Validating data before using it in any application is important, as invalid data can lead to errors and incorrect results.
Pydantic is a Python library that provides a powerful and intuitive way to perform type checking and data validation. It leverages Python’s type annotations to define and validate data structures at runtime, making it easy to ensure that data is consistent and correct. The examples in this article use the Pydantic v2 API (for example, model_validate() and field_validator()). Pydantic can be installed with the following terminal command:
pip install pydantic
Type checking
Python’s type annotations are a way to hint to the type checker what type of data is expected for a particular variable or function parameter. Pydantic takes this one step further by allowing us to define custom constraints on the given data structures. For example, we can specify that a field must be a non-empty string, a positive integer, or a list of unique values.
To use Pydantic for type checking, we simply create a Pydantic model class and define the fields we need. The type annotations for the fields will specify the expected types. For example, the following code defines a model class for a user:
from pydantic import BaseModel

class User(BaseModel):
    name: str
    age: int
    email: str
In the code above, we import BaseModel from the pydantic library. We then define a class named User that inherits from BaseModel. The class contains three fields:
name: string
age: integer
email: string
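As a small illustration of the model described above, the following sketch builds a valid User and inspects it. The sample values ("Ada", 36, and the example.com address) are made up for illustration; model_fields and model_dump() are assumed to be available, as they are in Pydantic v2.

```python
from pydantic import BaseModel


class User(BaseModel):
    name: str
    age: int
    email: str


# Field names are collected from the class annotations, in declaration order.
field_names = list(User.model_fields)

# Keyword arguments are validated against the annotations on construction.
user = User(name="Ada", age=36, email="ada@example.com")

# model_dump() converts the validated instance back into a plain dict.
user_dict = user.model_dump()

print(field_names)
print(user_dict)
```

The dictionary returned by model_dump() contains only validated values, which makes it a convenient hand-off point to the rest of an application.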
If we try to create a new User instance with invalid data, Pydantic will raise a ValidationError exception. For example, the following code will raise an exception because the value given for the age field, "Eighteen", can’t be converted to an integer:
from pydantic import BaseModel

class User(BaseModel):
    name: str
    age: int
    email: str

user_data = {
    "name": "Tester1",
    "age": "Eighteen",
    "email": "Tester1@example.com",
}

user = User.model_validate(user_data)

print(user.name)
print(user.age)
print(user.email)
Here’s a breakdown of the code:
Line 1: We import the BaseModel class from the pydantic library.
Lines 3–6: We define a User class that inherits from the BaseModel class. This class defines the structure of a user object, which contains name, age, and email with their respective data types.
Lines 8–12: We create a user_data dictionary containing the data for a new user.
Line 14: We validate the user_data dictionary using the User.model_validate() class method. This method returns a User object, which we assign to user, if the data is valid, or raises a ValidationError exception if the data is invalid.
Lines 16–18: We print the values of the user object’s name, age, and email properties.
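In practice, the exception is usually caught rather than left to stop the program. The following is a minimal sketch, assuming Pydantic v2, of catching the ValidationError raised for the same invalid input and reading the details it carries. The variable name errors is chosen only for illustration.

```python
from pydantic import BaseModel, ValidationError


class User(BaseModel):
    name: str
    age: int
    email: str


user_data = {
    "name": "Tester1",
    "age": "Eighteen",
    "email": "Tester1@example.com",
}

errors = []
try:
    user = User.model_validate(user_data)
except ValidationError as exc:
    # errors() returns one dict per failed field.
    errors = exc.errors()
    for error in errors:
        # "loc" is a tuple that points at the offending field.
        print(error["loc"], error["type"], error["msg"])
```

Each entry names the field that failed and why, so an application can turn the details into a user-facing message instead of showing a traceback.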
Now, when we correct the input value for age and use an integer, the code should work. Here’s the updated code:
from pydantic import BaseModel

class User(BaseModel):
    name: str
    age: int
    email: str

user_data = {
    "name": "Tester1",
    "age": 18,  # updated value
    "email": "Tester1@example.com",
}

user = User.model_validate(user_data)

print(user.name)
print(user.age)
print(user.email)
Pydantic makes type checking easier and less error-prone than writing manual type checks for every field.
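One detail worth knowing is that Pydantic validates in a lax mode by default, so a numeric string such as "18" is converted to the integer 18 rather than rejected, whereas "Eighteen" fails because it can’t be parsed. The sketch below contrasts this with strict mode, assuming Pydantic v2, where model_validate() accepts a strict argument. The names lax_user and strict_rejected are illustrative.

```python
from pydantic import BaseModel, ValidationError


class User(BaseModel):
    name: str
    age: int
    email: str


user_data = {"name": "Tester1", "age": "18", "email": "Tester1@example.com"}

# Lax mode (the default): the string "18" is coerced to the integer 18.
lax_user = User.model_validate(user_data)
print(type(lax_user.age), lax_user.age)

# Strict mode: the same input is rejected because "18" is not an int.
strict_rejected = False
try:
    User.model_validate(user_data, strict=True)
except ValidationError:
    strict_rejected = True

print(strict_rejected)
```

Lax mode is convenient for input that arrives as text, such as form data, while strict mode suits cases where silent conversion would hide a bug.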
Data validation
Pydantic can validate data in a number of ways, including range checking, regular expression matching, uniqueness checking, and custom validation.
Range checking
Range checking lets us validate data against a specified range of values. We can use the constr() function to constrain the length of a string and the Field() function to constrain the value of an integer.
The min_length and max_length keyword arguments of constr() define the allowed range for the string length. The ge (greater than or equal to) and le (less than or equal to) keyword arguments of Field() define the allowed range for the integer value. Here’s an example of using Pydantic range checking to validate the name and age fields of a class:
from pydantic import BaseModel, constr, Field

class User(BaseModel):
    name: constr(min_length=3, max_length=20)
    age: int = Field(ge=18, le=68)
    email: str

user_data = {
    "name": "Te",
    "age": 16,
    "email": "Tester1@example.com",
}

user = User.model_validate(user_data)

print(user.name)
print(user.age)
print(user.email)
Here’s an explanation:
Line 1: We import BaseModel, constr, and Field from the pydantic library.
Line 3: We define a User class that inherits from the BaseModel class.
Line 4: We define a name field for the User class. The name field is a string with a minimum length of 3 characters and a maximum length of 20 characters.
Line 5: We define an age field for the User class. The age field is an integer with a minimum value of 18 and a maximum value of 68.
Line 6: We define an email field for the User class. The email field is a string.
Lines 8–12: We create a user_data dictionary containing the data for a new user.
Line 14: We validate the user_data dictionary against the User model using the model_validate() method of the class, which returns a validated User object that we assign to user.
Lines 16–18: We print the data of the user object.
When we run the code above, we see two errors:
The name string is of length 2, but according to the constraints, the length should be between 3 and 20.
The age integer has a value of 16, while the range defined for it is 18–68.
When we correct the values according to the constraints, the code runs without errors:
from pydantic import BaseModel, constr, Field

class User(BaseModel):
    name: constr(min_length=3, max_length=20)
    age: int = Field(ge=18, le=68)
    email: str

user_data = {
    "name": "Tester1",  # updated value
    "age": 20,  # updated value
    "email": "Tester1@example.com",
}

user = User.model_validate(user_data)

print(user.name)
print(user.age)
print(user.email)
With these checks in place, out-of-range values are rejected at the point of entry, so only clean data reaches the rest of the application.
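Because ge and le are inclusive bounds, the boundary values themselves are accepted. The following sketch checks this at both ends of the age range. It also writes the constraints with typing.Annotated and Field(), an alternative spelling to constr() that is assumed to be available in Pydantic v2. The helper name is_valid is hypothetical.

```python
from typing import Annotated

from pydantic import BaseModel, Field, ValidationError


class User(BaseModel):
    # Same constraints as before, attached to the types through Annotated.
    name: Annotated[str, Field(min_length=3, max_length=20)]
    age: Annotated[int, Field(ge=18, le=68)]


def is_valid(name: str, age: int) -> bool:
    """Hypothetical helper: True if the values satisfy the User constraints."""
    try:
        User(name=name, age=age)
    except ValidationError:
        return False
    return True


# Probe just outside and exactly on each boundary of the age range.
results = {age: is_valid("Tester1", age) for age in (17, 18, 68, 69)}
print(results)
```

For exclusive bounds, Field() also accepts gt and lt in place of ge and le.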
Regular expression matching
To use regular expression matching in Pydantic, we can use the constr() function, which lets us specify a regular expression pattern that the field value must match. For example, the following code shows how to use constr() with the pattern keyword argument to validate an email address. The pattern is anchored with ^ and $ so that the entire value, not just part of it, has to match:
from pydantic import BaseModel, constr

class CheckEmail(BaseModel):
    email: constr(pattern=r'^[a-zA-Z0-9._]+@([\w-]+\.)+[\w-]{2,4}$')

user_data = {
    "email": "Tester1@example"
}

user = CheckEmail.model_validate(user_data)

print(user.email)
Here’s an explanation of the code above:
Line 1: We import BaseModel and constr from the Pydantic library.
Lines 3–4: We use the BaseModel class to create the Pydantic model class CheckEmail, and we use constr() to define a string type that must match the regular expression given in pattern.
Lines 6–8: We create a dictionary called user_data with a single key-value pair, email.
Line 10: We call the CheckEmail.model_validate() method to validate the user_data dictionary.
Line 12: We print the email attribute of the user variable.
The code above will raise a ValidationError because user_data doesn’t contain a valid email address: the domain has no top-level part such as .com. If we correct the email address, the code works fine:
from pydantic import BaseModel, constr

class CheckEmail(BaseModel):
    # ^ and $ anchor the pattern so that the whole string must match.
    email: constr(pattern=r'^[a-zA-Z0-9._]+@([\w-]+\.)+[\w-]{2,4}$')

user_data = {
    "email": "Tester1@example.com"  # updated value
}

user = CheckEmail.model_validate(user_data)

print(user.email)
Using regular expressions in Pydantic is a great way to ensure that the models contain valid data.
Uniqueness checking
To check for uniqueness in Pydantic, we can use the field_validator() decorator. The field_validator() decorator lets us attach custom validation logic to one or more individual fields (to validate the entire model instance at once, Pydantic provides model_validator() instead).
Here’s a simple example of how to use the field_validator() decorator to check for uniqueness in a Pydantic model:
from pydantic import BaseModel, Field, field_validator

class User(BaseModel):
    name: str = Field(unique=True)

    __values__ = {}

    def __init__(self, **data):
        super().__init__(**data)
        self.__values__[self.name] = self

    @field_validator("name")
    def validate_unique_name(cls, value, **kwargs):
        if value in cls.__values__:
            raise ValueError("Duplicate names are not allowed")
        return value

def check_for_duplicates(user_data):
    duplicates = []
    for name in user_data:
        try:
            User(name=name)
        except ValueError:
            duplicates.append(name)
    return duplicates

user_data = ["Tester1", "Tester1", "Tester2", "Tester2"]

duplicates = check_for_duplicates(user_data)
if duplicates:
    print("Duplicate names:")
    for name in duplicates:
        print(f"* {name}")
else:
    print("There are no duplicate names.")
Here’s an explanation of the code above:
Line 1: We import BaseModel, Field, and field_validator from the Pydantic library.
Line 3: We define a Pydantic model called User.
Line 4: The User model has a single field name, which is defined as a str field with the unique keyword. Note that unique isn’t a constraint that Pydantic enforces by itself, and Pydantic v2 treats unrecognized Field() keywords such as this one as deprecated. The uniqueness check is actually performed by the validator described below.
Line 6: The User model also has a __values__ attribute, which is a class-level dictionary that stores all of the existing User instances, keyed by name.
Lines 8–10: The __init__() method of the User model adds the new User instance to the __values__ dictionary.
Lines 12–16: The validate_unique_name() field validator checks if the name field value is already present in the __values__ dictionary. If it is, the field validator raises a ValueError exception.
Lines 18–25: check_for_duplicates() checks for duplicate names in a list of names. It does this by trying to create a new User instance for each name in the list. If the User() constructor raises a ValueError exception (ValidationError is a subclass of ValueError), then the name is already present in the __values__ dictionary and is therefore a duplicate.
Line 27: We create a list of names called user_data.
Line 29: We call the check_for_duplicates() function to check for duplicate names in the list.
Lines 30–35: We print a list of duplicate names, if any.
When we run the above code, it displays the duplicates in the list. Now, when we remove the duplicates and run the code again, it shows that there are no duplicates in the list:
from pydantic import BaseModel, Field, field_validator

class User(BaseModel):
    name: str = Field(unique=True)

    __values__ = {}

    def __init__(self, **data):
        super().__init__(**data)
        self.__values__[self.name] = self

    @field_validator("name")
    def validate_unique_name(cls, value, **kwargs):
        if value in cls.__values__:
            raise ValueError("Duplicate names are not allowed")
        return value

def check_for_duplicates(user_data):
    duplicates = []
    for name in user_data:
        try:
            User(name=name)
        except ValueError:
            duplicates.append(name)
    return duplicates

user_data = ["Tester1", "Tester2"]

duplicates = check_for_duplicates(user_data)
if duplicates:
    print("Duplicate names:")
    for name in duplicates:
        print(f"* {name}")
else:
    print("There are no duplicate names.")
Using uniqueness checking in Pydantic is a great way to ensure that our data is consistent and free of duplicates.
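The approach above keeps a class-level registry, so names stay registered for as long as the program runs. When all the values to compare arrive in a single payload, a validator on a list field may be simpler. The following is a minimal sketch of that alternative, not the method used above. The Team model, its members field, and the has_unique_members helper are hypothetical names.

```python
from pydantic import BaseModel, ValidationError, field_validator


class Team(BaseModel):
    members: list[str]

    @field_validator("members")
    @classmethod
    def members_must_be_unique(cls, value):
        # Collect every name that appears more than once, in first-seen order.
        seen, repeated = set(), []
        for name in value:
            if name in seen and name not in repeated:
                repeated.append(name)
            seen.add(name)
        if repeated:
            raise ValueError(f"Duplicate names are not allowed: {repeated}")
        return value


def has_unique_members(names):
    """Hypothetical helper: True if the list passes the uniqueness check."""
    try:
        Team(members=names)
    except ValidationError:
        return False
    return True


with_duplicates = has_unique_members(["Tester1", "Tester1", "Tester2", "Tester2"])
without_duplicates = has_unique_members(["Tester1", "Tester2"])
print(with_duplicates)
print(without_duplicates)
```

Because this version holds no shared state, validating one list never affects the result for another.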
Benefits of Pydantic
Using Pydantic for type checking and data validation has a number of benefits, including:
Pydantic helps in writing more robust and maintainable code by ensuring that data is consistent and correct.
Pydantic can catch data validation errors early on before they cause problems in the application.
Pydantic makes it easy to define and validate data structures, which can save time and effort.
Limitations of Pydantic
Pydantic isn’t included in the Python standard library, so it requires a separate installation.
Pydantic can be more complex than necessary for simple programs, where plain classes or dataclasses may be enough.
Pydantic can be slower than plain classes or dataclasses because it does extra validation and processing of the data at runtime.
Pydantic is a powerful and intuitive library for type checking and data validation in Python. It’s easy to use and can provide significant benefits for our code quality, error reduction, and productivity.