Amazon Bedrock Data Automation (BDA) enables developers to extract structured insights from unstructured documents such as invoices, receipts, and contracts without building complex ML pipelines. With built-in support for GenAI and custom blueprints, BDA automates document understanding and transforms raw data into structured, usable output.
You’ll build a complete intelligent document processing (IDP) workflow in this Cloud Lab using Amazon Bedrock Data Automation. The goal is to extract, process, and manage invoice data in a fully serverless and automated pipeline.
You’ll start by creating an Amazon S3 bucket, the central location for uploading incoming invoices and storing the structured outputs generated by BDA. Next, you’ll create an Amazon DynamoDB table to persist the cleaned and structured invoice data. After that, you’ll define a custom blueprint in BDA to extract only the relevant fields from the input files. You’ll then create a Lambda function that runs the automation workflow using the blueprint; BDA writes the structured output back to S3 as a JSON file. Then, you’ll configure Amazon SQS to receive a notification whenever a new result file appears in the S3 output folder. This queue serves as a decoupling layer, so results can be processed asynchronously.
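As a rough illustration of the S3-to-SQS step, the sketch below builds the notification configuration that asks S3 to send a message to a queue whenever a JSON object is created under an output prefix. The queue ARN, the `output/` prefix, and the helper name `build_notification_config` are assumptions made for this example, not values from the lab.

```python
def build_notification_config(queue_arn, prefix="output/", suffix=".json"):
    """Return an S3 NotificationConfiguration that targets an SQS queue.

    The prefix and suffix defaults are assumptions for illustration only.
    """
    return {
        "QueueConfigurations": [
            {
                "QueueArn": queue_arn,
                # Fire on every kind of object creation (PUT, POST, copy, multipart).
                "Events": ["s3:ObjectCreated:*"],
                # Only notify for result files under the output prefix.
                "Filter": {
                    "Key": {
                        "FilterRules": [
                            {"Name": "prefix", "Value": prefix},
                            {"Name": "suffix", "Value": suffix},
                        ]
                    }
                },
            }
        ]
    }


# Hypothetical queue ARN, used only to show the resulting structure.
config = build_notification_config(
    "arn:aws:sqs:us-east-1:123456789012:invoice-results"
)
```

With boto3, a dictionary of this shape can be passed as the `NotificationConfiguration` argument of the S3 client's `put_bucket_notification_configuration` call. The queue's access policy must also allow S3 to send messages to it.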
Finally, you’ll create another Lambda function that fetches the JSON from S3, parses the extracted invoice data, and inserts or updates the record in DynamoDB. To make the workflow more capable, this function also handles payment proof documents: when one is uploaded, it updates the payment status of the corresponding invoice.
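The core logic of this second function might look something like the following sketch. It shows two pieces: pulling the bucket and key out of an SQS-delivered S3 notification, and building the arguments for a DynamoDB `update_item` call, which creates the item if it does not exist and updates it otherwise. The field names (`invoice_number`, `vendor`, `total`, `payment_status`), the bucket name, and both helper names are hypothetical; the lab's actual blueprint fields and table key may differ.

```python
import json
from urllib.parse import unquote_plus


def s3_objects_from_sqs_event(event):
    """Yield (bucket, key) for each S3 object referenced in an SQS-triggered event."""
    for record in event.get("Records", []):
        body = json.loads(record["body"])  # the S3 notification is a JSON string
        # S3 test messages carry no "Records" list, so they are skipped here.
        for s3_record in body.get("Records", []):
            bucket = s3_record["s3"]["bucket"]["name"]
            # Object keys arrive URL-encoded in S3 notifications.
            key = unquote_plus(s3_record["s3"]["object"]["key"])
            yield bucket, key


def build_invoice_update(fields):
    """Build keyword arguments for a DynamoDB update_item call (an upsert).

    Assumes a table whose partition key is the hypothetical "invoice_number".
    """
    attrs = {
        k: v for k, v in fields.items()
        if k != "invoice_number" and v is not None
    }
    if not attrs:
        raise ValueError("nothing to update")
    ordered = sorted(attrs)
    # Placeholders avoid clashes with DynamoDB reserved words.
    names = {f"#f{i}": k for i, k in enumerate(ordered)}
    values = {f":v{i}": attrs[k] for i, k in enumerate(ordered)}
    assignments = ", ".join(f"#f{i} = :v{i}" for i in range(len(ordered)))
    return {
        "Key": {"invoice_number": fields["invoice_number"]},
        "UpdateExpression": "SET " + assignments,
        "ExpressionAttributeNames": names,
        "ExpressionAttributeValues": values,
    }


# Hypothetical event and extracted fields, for illustration only.
sample_event = {
    "Records": [
        {
            "body": json.dumps({
                "Records": [
                    {"s3": {"bucket": {"name": "invoice-bucket"},
                            "object": {"key": "output/inv+001/result.json"}}}
                ]
            })
        }
    ]
}
objects = list(s3_objects_from_sqs_event(sample_event))

invoice_update = build_invoice_update(
    {"invoice_number": "INV-001", "vendor": "Acme", "total": "120.50"}
)
# A payment proof only needs to touch the status attribute.
payment_update = build_invoice_update(
    {"invoice_number": "INV-001", "payment_status": "PAID"}
)
```

In a real handler, each dictionary could be unpacked into a boto3 `Table.update_item(**kwargs)` call. Amounts are kept as strings here because boto3 rejects Python floats; `Decimal` is the usual alternative for numeric attributes.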
By the end of this Cloud Lab, you will have created a scalable, event-driven architecture for automated document processing using Amazon Bedrock, Lambda, S3, SQS, and DynamoDB.
Below is the high-level architecture diagram of the infrastructure you’ll create in this Cloud Lab: