Data Encryption, Masking, and PII
Data protection in AWS takes a layered approach built on encryption, masking, and PII detection. Encryption at rest and in transit is the foundation, with AWS KMS managing keys across several encryption models, most notably SSE-KMS. Data masking and anonymization safeguard sensitive values during analysis, while Amazon Macie automates the discovery of PII and pairs with Lake Formation for governance. Together, these layers keep data secure throughout the pipeline.
With identity governance and Lake Formation permissions in place from earlier lessons, the next critical layer of defense for any AWS data pipeline is protecting the data itself. Encryption, masking, and PII detection form the cryptographic and privacy backbone that the AWS Certified Data Engineer – Associate exam tests extensively.
This lesson walks through:
How data is encrypted at rest and in transit across AWS analytics services,
How AWS KMS orchestrates key management, including cross-account scenarios,
How masking and anonymization protect sensitive values at the consumption layer.
These controls are essential because nearly every exam scenario involving S3, Redshift, or Glue touches encryption, and confusing key types or encryption modes is one of the most common reasons candidates select incorrect answers.
AWS encryption
AWS encryption operates along two fundamental dimensions. Encryption at rest protects data stored on disk in services like S3, Redshift, and DynamoDB. Encryption in transit protects data as it moves between clients and services or between services themselves, typically via TLS 1.2 or higher.
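One common way to enforce encryption in transit for S3 is a bucket policy that denies any request not sent over TLS, using the `aws:SecureTransport` condition key. The sketch below only builds the policy document; the function name and the bucket name are hypothetical and chosen for illustration.

```python
import json


def deny_insecure_transport_policy(bucket_name: str) -> dict:
    """Build an S3 bucket policy that rejects any request not sent over TLS."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                # Cover both the bucket itself and every object in it.
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
                # aws:SecureTransport is "false" for plain-HTTP requests.
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }


if __name__ == "__main__":
    # "example-analytics-bucket" is a placeholder, not a real bucket.
    policy = deny_insecure_transport_policy("example-analytics-bucket")
    print(json.dumps(policy, indent=2))
```

To apply it, the JSON string could be passed as the `Policy` argument of boto3's `put_bucket_policy` call, assuming the caller has permission to change the bucket policy.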
Within encryption at rest, you must distinguish between two models. Client-side encryption is a model where data is encrypted before it leaves the caller's environment, giving the client full control over keys and plaintext. In contrast, server-side encryption is a model where the AWS service encrypts data after receiving it, managing cryptographic operations transparently on the server side.
For S3 specifically, three main server-side encryption options exist. SSE-S3 uses keys that S3 owns and manages entirely, giving the customer no visibility into or control over them. SSE-KMS uses either AWS-managed or customer-managed keys stored in AWS KMS, providing an audit trail of key usage and, with customer-managed keys, control through key policies. SSE-C requires the customer to supply the encryption key with every request, placing the full key management burden on the caller.
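The three options differ mainly in which encryption parameters accompany the upload request. As a minimal sketch, the helper below returns the extra keyword arguments one might pass to boto3's `put_object` for each mode. The helper name `sse_put_object_args` and the key ARN in the usage example are hypothetical; the parameter names (`ServerSideEncryption`, `SSEKMSKeyId`, `SSECustomerAlgorithm`, `SSECustomerKey`) are the ones `put_object` accepts.

```python
from typing import Optional


def sse_put_object_args(
    mode: str,
    kms_key_id: Optional[str] = None,
    customer_key: Optional[bytes] = None,
) -> dict:
    """Return the encryption-related put_object arguments for one SSE mode."""
    if mode == "SSE-S3":
        # S3 owns and manages the key; nothing else to supply.
        return {"ServerSideEncryption": "AES256"}
    if mode == "SSE-KMS":
        args = {"ServerSideEncryption": "aws:kms"}
        # Without a key ID, S3 falls back to the AWS-managed key (aws/s3).
        if kms_key_id is not None:
            args["SSEKMSKeyId"] = kms_key_id
        return args
    if mode == "SSE-C":
        # The caller must send a 256-bit key with every request.
        if customer_key is None or len(customer_key) != 32:
            raise ValueError("SSE-C requires a 32-byte customer key")
        return {"SSECustomerAlgorithm": "AES256", "SSECustomerKey": customer_key}
    raise ValueError(f"unknown SSE mode: {mode}")


if __name__ == "__main__":
    # Hypothetical customer-managed key ARN, for illustration only.
    key_arn = "arn:aws:kms:us-east-1:111122223333:key/example-key-id"
    print(sse_put_object_args("SSE-KMS", kms_key_id=key_arn))
```

A call such as `s3.put_object(Bucket=..., Key=..., Body=..., **sse_put_object_args("SSE-KMS", kms_key_id=key_arn))` would then request SSE-KMS with a customer-managed key, assuming the caller is allowed to use that key.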
Attention: The exam strongly favors SSE-KMS as the default...