Anonymizing and Encrypting Using Python
Learn about anonymizing and encrypting sensitive data as a part of the transform stage in an ETL pipeline.
When dealing with sensitive data such as passwords, financial data, medical records, or confidential business information, we usually need to protect it before it is stored or shared. The transform stage of the ETL pipeline is a natural place to do this, using data anonymization or data encryption methods.
Data anonymization
During data anonymization, we remove or obscure Personally Identifiable Information (PII) from a dataset to protect the privacy of users and clients.
There are several methods of anonymizing data, including:
Masking: Replacing sensitive information with characters such as asterisks.
Perturbation: Adding random noise or error to the data to obscure specific values. For example, a dataset of GPS locations of users used for a statistical analysis might be perturbed by adding some random, normally distributed noise to keep the exact coordinates hidden while still allowing the analysts to perform statistical analysis on the overall distribution of the ...
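The masking and perturbation methods described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a production-ready implementation: the function names `mask_value` and `perturb_coordinates`, the choice to keep the last four characters visible, and the default noise level `sigma` are all assumptions made for the example.

```python
import random


def mask_value(value, visible=4, mask_char="*"):
    """Mask a string, keeping only the last `visible` characters readable.

    Values no longer than `visible` are masked completely so that short
    strings are never returned in the clear.
    """
    if visible <= 0 or len(value) <= visible:
        return mask_char * len(value)
    hidden = len(value) - visible
    return mask_char * hidden + value[hidden:]


def perturb_coordinates(coords, sigma=0.01, seed=None):
    """Add zero-mean, normally distributed noise to (lat, lon) pairs.

    `sigma` is the standard deviation of the noise in degrees (an
    illustrative default). `seed` only makes the example reproducible.
    """
    rng = random.Random(seed)
    return [
        (lat + rng.gauss(0, sigma), lon + rng.gauss(0, sigma))
        for lat, lon in coords
    ]


# Hypothetical sample data for illustration only.
card_number = "4111111111111111"
locations = [(52.5200, 13.4050), (48.8566, 2.3522)]

masked_card = mask_value(card_number)
noisy_locations = perturb_coordinates(locations, sigma=0.01, seed=42)

print(masked_card)
print(noisy_locations)
```

Because the noise has a mean of zero, aggregate statistics over many points stay close to those of the original data, while individual coordinates no longer match the true locations. How large `sigma` should be depends on how much accuracy the analysis can give up in exchange for privacy.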