Building a Document Processing Pipeline with AWS Services

Learn how to use Amazon’s ML services for document processing by combining multiple AWS services to automate the document processing cycle.

9 Tasks

Beginner

1 hr

Certificate of Completion

Desktop only
No Setup Required
Amazon Web Services

Learning Objectives

Familiarity with Amazon S3 and the ability to store and retrieve data using S3
The ability to use the IAM service to grant permissions to other services through IAM roles
Hands-on experience creating a Lambda function to execute a piece of code
The ability to create a sender identity in SES and send emails with it
Hands-on experience automating data analysis using S3, Amazon Textract, and Amazon Comprehend

Technologies
Lambda
CloudWatch
IAM
Textract
Comprehend
S3
Skills Covered
Using AWS Cloud Services
Natural Language Processing
Data Pipeline Engineering
Cloud Lab Overview

Traditionally, documents were analyzed and insights extracted manually, a time-consuming process with a high probability of errors. AI lets us automate this process, making it much faster and less error-prone. Amazon provides AI services for exactly this: Amazon Textract extracts text and data from scanned documents and images, and Amazon Comprehend, a natural language processing (NLP) service, analyzes that text and returns insights such as its sentiment, key phrases, and named entities.
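The Textract-to-Comprehend hand-off described above can be sketched in Python with boto3-style clients. This is a minimal sketch, not the lab’s own code: it assumes synchronous `detect_document_text` on a single-page document stored in S3, keeps only the `LINE` blocks of the response, and assumes sentiment and entities are the insights wanted. The helper names `lines_from_textract` and `extract_and_analyze` and the `MAX_SENTIMENT_BYTES` cap are hypothetical. The clients are passed in, so the functions work with `boto3.client("textract")` and `boto3.client("comprehend")` as well as with test doubles.

```python
from typing import Any, Dict, List

# Hypothetical cap chosen to stay below DetectSentiment's maximum input size;
# check the current Comprehend quotas for the exact limit.
MAX_SENTIMENT_BYTES = 4900


def lines_from_textract(response: Dict[str, Any]) -> str:
    """Join the LINE blocks of a Textract DetectDocumentText response into plain text."""
    lines: List[str] = [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE" and "Text" in block
    ]
    return "\n".join(lines)


def extract_and_analyze(textract: Any, comprehend: Any, bucket: str, key: str) -> Dict[str, Any]:
    """Extract text from an S3 document with Textract, then analyze it with Comprehend.

    `textract` and `comprehend` are boto3 clients or test doubles exposing the same methods.
    """
    # Synchronous text detection on a document that already lives in S3.
    response = textract.detect_document_text(
        Document={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    text = lines_from_textract(response)
    if not text:
        # Comprehend rejects empty input, so return early for blank documents.
        return {"text": "", "sentiment": None, "entities": []}

    # Truncate on a UTF-8 byte budget; errors="ignore" drops a split trailing character.
    sample = text.encode("utf-8")[:MAX_SENTIMENT_BYTES].decode("utf-8", errors="ignore")
    sentiment = comprehend.detect_sentiment(Text=sample, LanguageCode="en")
    entities = comprehend.detect_entities(Text=sample, LanguageCode="en")
    return {
        "text": text,
        "sentiment": sentiment["Sentiment"],
        "entities": [entity["Text"] for entity in entities.get("Entities", [])],
    }
```

Textract also returns `PAGE` and `WORD` blocks; filtering on `LINE` keeps the reading order of lines without duplicating every word.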

In this Cloud Lab, you’ll learn to automate document processing using multiple Amazon services.

To do that, you’ll first create an S3 bucket to store the input documents and the output data. Next, you’ll create an IAM role that grants the necessary permissions to the other AWS services. You’ll then create a Lambda function whose code passes the documents stored in the bucket to Textract, which extracts their text. Comprehend then analyzes this text, and Comprehend’s output is stored in the bucket’s output folder. Finally, you’ll add email notifications to the pipeline using Amazon Simple Email Service (SES).
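The permissions the execution role needs follow from the steps above. Below is a hypothetical least-privilege version expressed as Python dictionaries, plus a helper that creates the role with boto3’s IAM client. It assumes one bucket holds both input and output and that the function calls only Textract `DetectDocumentText`, Comprehend `DetectSentiment`/`DetectEntities`, and SES `SendEmail`, and logs to CloudWatch; the lab’s console steps may attach AWS managed policies instead. `permissions_policy` and `create_execution_role` are illustrative names.

```python
import json
from typing import Any, Dict

# Trust policy: allows the Lambda service to assume this execution role.
TRUST_POLICY: Dict[str, Any] = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "lambda.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}


def permissions_policy(bucket: str) -> Dict[str, Any]:
    """Build a hypothetical least-privilege inline policy for the pipeline's Lambda role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Read input documents and write results in the same bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
            {
                "Effect": "Allow",
                "Action": [
                    "textract:DetectDocumentText",
                    "comprehend:DetectSentiment",
                    "comprehend:DetectEntities",
                    "ses:SendEmail",
                ],
                # Granted on "*" for simplicity; SES can be scoped to a verified identity ARN.
                "Resource": "*",
            },
            {
                # Let the function write its logs to CloudWatch.
                "Effect": "Allow",
                "Action": [
                    "logs:CreateLogGroup",
                    "logs:CreateLogStream",
                    "logs:PutLogEvents",
                ],
                "Resource": "*",
            },
        ],
    }


def create_execution_role(role_name: str, bucket: str) -> str:
    """Create the role, attach the inline policy, and return the role ARN."""
    import boto3  # imported lazily so the policy builders work without boto3 installed

    iam = boto3.client("iam")
    role = iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(TRUST_POLICY),
    )
    iam.put_role_policy(
        RoleName=role_name,
        PolicyName=f"{role_name}-pipeline-access",
        PolicyDocument=json.dumps(permissions_policy(bucket)),
    )
    return role["Role"]["Arn"]
```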

By the end of this Cloud Lab, you’ll have a working pipeline that extracts and analyzes text from documents using AWS services, along with practical experience automating document processing tasks with them.
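Putting the pieces together, the Lambda function reacts to S3 upload events, analyzes each new document, writes the result to the output folder, and sends an email through SES. This sketch assumes `input/` and `output/` folder names (hypothetical; the lab’s own names may differ) and takes the analysis step as a callable `analyze(bucket, key)`, so any Textract/Comprehend routine can be plugged in. In a deployed function, `lambda_handler(event, context)` would create `boto3.client("s3")` and `boto3.client("ses")` and pass them to `process_event`; that wiring is omitted here.

```python
import json
import urllib.parse
from typing import Any, Callable, Dict, List

# Hypothetical folder names; the lab stores results in an output folder.
INPUT_PREFIX = "input/"
OUTPUT_PREFIX = "output/"


def output_key_for(input_key: str) -> str:
    """Map e.g. 'input/report.png' to 'output/report.png.json'."""
    name = input_key[len(INPUT_PREFIX):] if input_key.startswith(INPUT_PREFIX) else input_key
    return f"{OUTPUT_PREFIX}{name}.json"


def process_event(
    event: Dict[str, Any],
    analyze: Callable[[str, str], Dict[str, Any]],
    s3: Any,
    ses: Any,
    sender: str,
    recipient: str,
) -> List[str]:
    """Analyze each newly uploaded object, store the result under OUTPUT_PREFIX,
    and email a notification via SES. Returns the output keys written."""
    written: List[str] = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        if key.startswith(OUTPUT_PREFIX):
            # Skip our own results so writing them cannot retrigger processing.
            continue

        result = analyze(bucket, key)
        out_key = output_key_for(key)
        s3.put_object(
            Bucket=bucket,
            Key=out_key,
            Body=json.dumps(result).encode("utf-8"),
            ContentType="application/json",
        )
        ses.send_email(
            Source=sender,  # must be an SES-verified sender identity
            Destination={"ToAddresses": [recipient]},
            Message={
                "Subject": {"Data": f"Document processed: {key}"},
                "Body": {"Text": {"Data": f"Results saved to s3://{bucket}/{out_key}"}},
            },
        )
        written.append(out_key)
    return written
```

Skipping keys under `output/` is only a safety net; configuring the S3 trigger with a prefix filter on the input folder is the cleaner way to keep the function’s own writes from invoking it again.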

Architecture diagram
Cloud Lab Tasks
1. Introduction
Getting Started
2. Create the Required Resources
Create an S3 Bucket
Create an Execution Role
Create a Lambda Function
Configure the Lambda Function
3. Text Extraction and Analysis
Test the Document Processing Pipeline
Integrate Amazon Simple Email Service (SES)
4. Conclusion
Clean Up
Wrap Up
Lab Rules Apply
Stay within resource usage requirements.
Do not engage in cryptocurrency mining.
Do not engage in or encourage activity that is illegal.