How to sanitize user input in Python

Sanitizing user input is a critical step in software development. Because input can reach our service from any client, anywhere in the world, we cannot assume that it is well-formed or well-intentioned. It's the developer's responsibility to keep malicious input out of the program so that the service continues to function properly.

We must take the following two precautions to ensure that the input is valid and secure for our system:

Input validation: Ensuring that the input is well-formed and in the expected structure.
Input sanitization: Ensuring data is semantically and logically correct and safe to use in the system's workflow.
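The difference between the two precautions can be sketched in a few lines. This is only an illustration: the username rule, the function names, and the sample strings are assumptions made for this example, not requirements of any particular system.

```python
import html
import re

# Hypothetical rule: 3 to 20 letters, digits, or underscores
USERNAME_PATTERN = re.compile(r"[A-Za-z0-9_]{3,20}")

def is_valid_username(value):
    """Validation: accept the input only if it has the expected structure."""
    return USERNAME_PATTERN.fullmatch(value) is not None

def sanitize_comment(value):
    """Sanitization: make free-form text safe to embed in an HTML page."""
    return html.escape(value.strip())

print(is_valid_username("alice_01"))         # prints: True
print(is_valid_username("<script>"))         # prints: False
print(sanitize_comment("  5 < 7 & 7 > 5 "))  # prints: 5 &lt; 7 &amp; 7 &gt; 5
```

Validation rejects input that does not fit the expected shape, while sanitization transforms input so that it can be used safely.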

Note: The implementation of the above measures differs based on the technology and context of the service. For example, input sanitization for web applications may involve stripping HTML tags and embedded JavaScript from user input, while IoT devices may need to sanitize input data from sensors or other sources.

Let's see how to sanitize user input using different techniques available in Python.

Sanitizing user input in Python

We can remove unnecessary or malformed data from our input using several techniques, which are summarized in the table at the end of this section.

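Sanitizing using `html.escape()`

We use the html.escape() function from Python's built-in html module to convert characters that have a special meaning in HTML into their escaped equivalents.

Explanation

Lines 3–6: We define sanitize_input, which processes the input and returns the sanitized data.
Line 5: We call the html.escape() function to replace each character that has a special meaning in HTML (such as <, >, and &) with its corresponding HTML entity.
Line 8: We store the input data from the user in the user_input variable.
Line 9: We use the sanitize_input function to process the input data.
Line 10: We display the sanitized output. The output shows the tags around Hi! as plain text instead of interpreting them as markup, which means the special characters were replaced successfully.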
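The walkthrough above can be sketched as follows. This is a minimal reconstruction rather than the original listing: the sample input string is an assumption, and the layout is chosen so that the line numbers match the explanation.

```python
import html

def sanitize_input(user_input):
    # Replace &, <, >, and quote characters with HTML entities
    sanitized = html.escape(user_input)
    return sanitized

user_input = "<b>Hi!</b>"  # assumed sample input
sanitized = sanitize_input(user_input)
print(sanitized)  # prints: &lt;b&gt;Hi!&lt;/b&gt;
```

By default, html.escape() also escapes single and double quotes, which matters when the value is placed inside an HTML attribute.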

Sanitizing using `bleach`

We use the third-party bleach library to allow only allowlisted HTML tags in the input. Allowlisted HTML tags are the set of HTML tags that are permitted for use in a web application.

Explanation

Line 1: We import the bleach library into our code.
Line 4: We define the list of tags that are allowed in the user input.
Lines 6–9: We define sanitize_input, which processes the input and returns the sanitized data.
Line 8: We call the bleach.clean() function to keep only the allowlisted tags and replace the rest with their escaped equivalents.
Line 11: We store the input data from the user in the user_input variable.
Line 12: We use the sanitize_input function to process the input data.
Line 13: We display the sanitized output. The color of the output is unchanged, which means the browser interpreted only the allowlisted tags.
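A sketch consistent with the explanation above might look like this. It assumes that bleach is installed (for example, with pip install bleach); the allowlist and the sample input are assumptions chosen for illustration, and the layout matches the line numbers cited above.

```python
import bleach

# Tags that may remain in the input (an assumed allowlist)
allowed_tags = ["b", "i", "u"]

def sanitize_input(user_input):
    # Keep allowlisted tags and escape every other tag
    sanitized = bleach.clean(user_input, tags=allowed_tags)
    return sanitized

user_input = '<b>Hello</b> <span style="color:red">world</span>'  # assumed sample
sanitized = sanitize_input(user_input)
print(sanitized)
```

With this input, the b tag is kept as markup, while the span tag that sets the color is escaped and therefore shown as text instead of being applied.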

Sanitizing using `re`

We use Python's re module to remove blocklisted script tags from the user input with a regular expression.

Explanation

Line 1: We import the re module into our code.
Lines 3–6: We define sanitize_input, which processes the input and returns the sanitized data.
Line 5: We call the re.sub() function to remove any data that matches the regular expression, which is written as a raw string (the r prefix). The flags=re.IGNORECASE argument tells the re module to ignore the case of characters when matching the regular expression.
Line 8: We store the input data from the user in the user_input variable.
Line 9: We use the sanitize_input function to process the input data.
Line 10: We display the sanitized output. The output only contains Hi!, which means the script tag was removed successfully.
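The steps above can be sketched as follows. The regular expression and the sample input are assumptions, since the original pattern is not shown; the layout is chosen to match the line numbers in the explanation.

```python
import re

def sanitize_input(user_input):
    # Remove every <script>...</script> element, regardless of letter case
    sanitized = re.sub(r"<script.*?>.*?</script>", "", user_input, flags=re.IGNORECASE)
    return sanitized

user_input = "Hi!<script>alert('Hello')</script>"  # assumed sample input
sanitized = sanitize_input(user_input)
print(sanitized)  # prints: Hi!
```

A blocklist pattern like this only removes what it explicitly matches, so it is best treated as one layer of defense rather than a complete HTML sanitizer.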

Technique	Description
Escape characters	Escape special characters in the input with functions such as `html.escape()`, or check the input with string methods such as `isalnum()`, to prevent accidental code execution.
Third-party libraries	Sanitize inputs using third-party libraries and frameworks, such as `bleach` and `validators`.
Regular expressions	Allow only expected data by blocklisting or allowlisting inputs, for example, with Python's `re` module.
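The table mentions `isalnum()`, which none of the walkthroughs above uses. As a rough illustration, it can serve as a simple character allowlist; the function name and the sample input below are hypothetical.

```python
def keep_alphanumeric(user_input):
    # str.isalnum() is True for letters and digits, including non-ASCII ones
    return "".join(ch for ch in user_input if ch.isalnum())

print(keep_alphanumeric("alice_01!"))  # prints: alice01
```

This drops every character that is not a letter or a digit, which suits identifiers but is too aggressive for free-form text.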
