Create a Data Lake with Lake Formation and Analyze It with Athena

AWS Lake Formation is a service for building and managing secure data lakes. It lets you create databases backed by different data sources and import data from multiple sources into a centralized data catalog. It works with AWS Glue crawlers, which extract metadata from those sources and store it as table definitions, making the data available for different types of analytics. It also manages access to datasets and can grant fine-grained permissions to the users who need to access data in the data lake.

With Amazon Athena, authorized users can analyze large amounts of data in a data lake without setting up or managing any infrastructure. Athena is serverless and scales automatically to query the data stored in the data lake, and you pay only for the queries you run.
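As a rough illustration of what querying through Athena could look like outside the console, the sketch below builds the arguments for Athena's `StartQueryExecution` call and polls until the query finishes. It assumes a boto3-style Athena client is passed in. The helper names `build_query_request` and `run_query` are hypothetical and are not part of this lab.

```python
import time


def build_query_request(database, table, output_location, limit=10):
    """Build keyword arguments for Athena's StartQueryExecution call."""
    if not output_location.startswith("s3://"):
        raise ValueError("Athena writes query results to an S3 location")
    return {
        # The table name is assumed to come from a trusted source.
        "QueryString": f'SELECT * FROM "{table}" LIMIT {int(limit)}',
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_query(athena, request, poll_seconds=1.0):
    """Start a query, wait for a terminal state, and return rows of strings."""
    query_id = athena.start_query_execution(**request)["QueryExecutionId"]
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)
    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query ended in state {state}")
    result = athena.get_query_results(QueryExecutionId=query_id)
    # Only the first page of results is read in this sketch.
    return [
        [cell.get("VarCharValue") for cell in row["Data"]]
        for row in result["ResultSet"]["Rows"]
    ]
```

With boto3 installed and credentials configured, this could be called as `run_query(boto3.client("athena"), build_query_request("lab_db", "sales", "s3://example-results/athena/"))`, where the database, table, and bucket names are placeholders. For a `SELECT` statement, the first returned row typically holds the column headers.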

In this Cloud Lab, you’ll create an S3 bucket and store your data in it. Then, you’ll configure your data lake with AWS Lake Formation and set up a database in Lake Formation with the S3 bucket as its source. After that, you’ll use an AWS Glue crawler to catalog the data as tables in the data lake. You’ll also grant a user permission to access the data lake. Finally, you’ll query the data in the data lake through Amazon Athena.
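The lab's steps map onto a short sequence of AWS API calls. The sketch below is one possible scripted equivalent, not the lab's own procedure: it lists the steps as boto3-style operations in order and applies them to whatever clients are supplied. The helper names `build_lab_plan` and `apply_plan` and all resource names and ARNs are hypothetical. It also assumes the bucket is created in us-east-1 (other Regions need a `CreateBucketConfiguration`) and omits the permissions the crawler role itself requires.

```python
def build_lab_plan(bucket, database, crawler, crawler_role_arn, analyst_arn):
    """Return the lab's steps as ordered (service, operation, kwargs) tuples."""
    data_path = f"s3://{bucket}/"
    return [
        # 1. Create the S3 bucket that will hold the data.
        ("s3", "create_bucket", {"Bucket": bucket}),
        # 2. Register the bucket as a Lake Formation data lake location.
        ("lakeformation", "register_resource",
         {"ResourceArn": f"arn:aws:s3:::{bucket}", "UseServiceLinkedRole": True}),
        # 3. Create a catalog database that points at the bucket.
        ("glue", "create_database",
         {"DatabaseInput": {"Name": database, "LocationUri": data_path}}),
        # 4. Create and start a crawler that catalogs the data as tables.
        ("glue", "create_crawler",
         {"Name": crawler, "Role": crawler_role_arn, "DatabaseName": database,
          "Targets": {"S3Targets": [{"Path": data_path}]}}),
        ("glue", "start_crawler", {"Name": crawler}),
        # 5. Grant the analyst SELECT on all tables in the database.
        ("lakeformation", "grant_permissions",
         {"Principal": {"DataLakePrincipalIdentifier": analyst_arn},
          "Resource": {"Table": {"DatabaseName": database, "TableWildcard": {}}},
          "Permissions": ["SELECT"]}),
    ]


def apply_plan(clients, plan):
    """Call each planned operation on the matching client, in order."""
    for service, operation, kwargs in plan:
        getattr(clients[service], operation)(**kwargs)
```

With boto3 available, `clients` could be built as `{name: boto3.client(name) for name in ("s3", "glue", "lakeformation")}`. The crawler runs asynchronously, so a real script would wait for it to finish before querying the tables with Athena.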

Once you complete this Cloud Lab, the provisioned infrastructure will resemble the architecture shown below:

The architecture diagram