Amazon Data Firehose, formerly Amazon Kinesis Data Firehose, is used for real-time data streaming and supports dynamic partitioning. With dynamic partitioning, Firehose automatically routes incoming records to different S3 prefixes based on keys within the record data, improving data organization and significantly reducing the cost and time of downstream analytics.
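As an illustration, the following is a minimal sketch of how a dynamically partitioned Firehose stream might be created with boto3. The stream name, bucket, IAM role, and the `device_id` partition key are hypothetical placeholders, not the exact resources used in this lab.

```python
import boto3

firehose = boto3.client("firehose")

# Hypothetical example: partition incoming JSON records by their "device_id" field.
firehose.create_delivery_stream(
    DeliveryStreamName="sensor-data-stream",  # placeholder name
    DeliveryStreamType="DirectPut",
    ExtendedS3DestinationConfiguration={
        "RoleARN": "arn:aws:iam::123456789012:role/firehose-role",  # placeholder role
        "BucketARN": "arn:aws:s3:::sensor-data-bucket",             # placeholder bucket
        # S3 prefix built from the extracted partition key and the ingestion timestamp
        "Prefix": "sensors/device_id=!{partitionKeyFromQuery:device_id}/!{timestamp:yyyy/MM/dd}/",
        "ErrorOutputPrefix": "errors/!{firehose:error-output-type}/",
        "BufferingHints": {"SizeInMBs": 64, "IntervalInSeconds": 60},
        "DynamicPartitioningConfiguration": {"Enabled": True},
        "ProcessingConfiguration": {
            "Enabled": True,
            "Processors": [
                {
                    "Type": "MetadataExtraction",
                    "Parameters": [
                        # JQ expression that pulls the partition key out of each record
                        {"ParameterName": "MetadataExtractionQuery",
                         "ParameterValue": "{device_id: .device_id}"},
                        {"ParameterName": "JsonParsingEngine",
                         "ParameterValue": "JQ-1.6"},
                    ],
                },
            ],
        },
    },
)
```

With this configuration, a record such as `{"device_id": "sensor-42", "temperature": 71.3}` would be delivered under the `sensors/device_id=sensor-42/...` prefix rather than into a single flat location.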
In this Cloud Lab, you’ll build a real-time anomaly detection system using Data Firehose. You’ll start by training an anomaly detection model with SageMaker. Then, you’ll set up a Data Firehose stream to ingest sensor data and store it in an S3 bucket. Moving on, you’ll create an SNS topic that sends email alert notifications when an anomaly is detected. Finally, you’ll tie the application together by creating a Lambda function that is triggered whenever new data is added to the S3 bucket.
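To make that last step concrete, here is a minimal sketch of what such a Lambda handler could look like, assuming an already-deployed SageMaker endpoint and an SNS topic ARN supplied through environment variables. The endpoint name, response format, and anomaly-score threshold are illustrative assumptions, not the lab's exact implementation.

```python
import json
import os

import boto3

s3 = boto3.client("s3")
runtime = boto3.client("sagemaker-runtime")
sns = boto3.client("sns")

# Hypothetical configuration; the lab's actual names may differ.
ENDPOINT_NAME = os.environ.get("ENDPOINT_NAME", "anomaly-endpoint")
TOPIC_ARN = os.environ["SNS_TOPIC_ARN"]
THRESHOLD = float(os.environ.get("ANOMALY_THRESHOLD", "3.0"))


def handler(event, context):
    """Triggered by S3 ObjectCreated events; scores each reading and alerts on anomalies."""
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]

        # Read the newly delivered sensor data (assumed CSV, one reading per line).
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")

        for line in body.splitlines():
            if not line.strip():
                continue

            # Ask the SageMaker endpoint for an anomaly score.
            response = runtime.invoke_endpoint(
                EndpointName=ENDPOINT_NAME,
                ContentType="text/csv",
                Body=line,
            )
            result = json.loads(response["Body"].read())
            score = result["scores"][0]["score"]  # assumes a Random Cut Forest-style response

            # Publish an email alert through SNS if the score crosses the threshold.
            if score > THRESHOLD:
                sns.publish(
                    TopicArn=TOPIC_ARN,
                    Subject="Sensor anomaly detected",
                    Message=f"Anomalous reading in s3://{bucket}/{key}: {line} (score={score:.2f})",
                )
    return {"statusCode": 200}
```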
As you complete this lab, you’ll be well-equipped to implement S3 dynamic partitioning in your own applications, effectively organizing data in S3 and reducing the cost of analytical queries.
The following is a high-level architecture diagram of the system you will build in this Cloud Lab: