Data Security and Governance
Explore how to implement data security and governance for machine learning on AWS. Understand encryption with AWS KMS, centralized access controls using Lake Formation, and automated sensitive data detection with Amazon Macie. Learn to protect ML datasets from unauthorized access while maintaining compliance and operational efficiency.
ML systems routinely ingest large volumes of sensitive data, from personally identifiable information and financial records to protected health data, which is typically centralized in Amazon S3-based data lakes. A single misconfigured bucket policy or an unencrypted training dataset can expose an organization to compliance violations under
Together, AWS KMS, Lake Formation, and Amazon Macie implement a governance strategy that protects ML datasets across ingestion, storage, and training.
Encrypting ML datasets with AWS KMS
As a best practice, encrypt ML datasets at rest, whether they are stored in Amazon S3, held on Amazon EBS volumes, or consumed by SageMaker jobs.
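As an illustration of encryption at rest for the S3 case, the sketch below uploads a training file with server-side encryption under a KMS key (SSE-KMS) using boto3. It is a minimal example under stated assumptions, not a prescribed implementation: the bucket name, object key, and KMS key ARN in the usage note are hypothetical placeholders, and the helper names `build_sse_kms_put_args` and `upload_encrypted` are invented here for illustration.

```python
from typing import Any, Dict


def build_sse_kms_put_args(
    bucket: str, key: str, body: bytes, kms_key_id: str
) -> Dict[str, Any]:
    """Build keyword arguments for S3 put_object that request SSE-KMS."""
    return {
        "Bucket": bucket,
        "Key": key,
        "Body": body,
        # Ask S3 to encrypt the object at rest with a KMS key.
        "ServerSideEncryption": "aws:kms",
        # KMS key to use (key ID or key ARN); assumed to be a customer managed key.
        "SSEKMSKeyId": kms_key_id,
        # Use an S3 Bucket Key for this object to reduce calls to KMS.
        "BucketKeyEnabled": True,
    }


def upload_encrypted(bucket: str, key: str, body: bytes, kms_key_id: str) -> Dict[str, Any]:
    """Upload one object to S3 encrypted with the given KMS key."""
    # Imported lazily so build_sse_kms_put_args can be used without boto3 installed.
    import boto3

    s3 = boto3.client("s3")
    return s3.put_object(**build_sse_kms_put_args(bucket, key, body, kms_key_id))
```

A call might look like `upload_encrypted("example-ml-datasets", "training/train.csv", data, "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID")`, where every value is a placeholder. The caller needs permission to use the KMS key as well as permission to write to the bucket, which is what makes the key policy an additional layer of access control over the data.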