Trusted answers to developer questions

Related Tags

aws
amazon
emr
communitycreator

What is Amazon Elastic MapReduce?

Tarun Telang

Amazon Elastic MapReduce (EMR) is a service provided by Amazon Web Services (AWS) that lets you run Hadoop jobs securely in the cloud. The service is based on Apache Hadoop, an open-source framework for processing large datasets.

AWS EMR logo

How does it work?

EMR uses a master-slave architecture to distribute data and processing across a cluster, running applications with the MapReduce programming model. You can process petabytes of data using hundreds of compute nodes running as Amazon Elastic Compute Cloud (EC2) instances. When automatic scaling is configured, EMR adds or removes EC2 instances depending on your processing needs.
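To make the MapReduce programming model concrete, here is a minimal single-process sketch of a word count in Python. It only illustrates the map, shuffle, and reduce phases; it is not how EMR or Hadoop is implemented, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for this example. On a real cluster, the map and reduce calls would run in parallel on different nodes.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group all values by key, as the framework does between map and reduce."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Run the three phases in sequence on an in-memory list of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data big clusters", "data everywhere"]))
```

Because each line is mapped independently and each key is reduced independently, the work can be split across many nodes, which is the property the cluster exploits.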

The service also provides a resource management system to scale and optimize processing power. You can quickly scale your cluster up or down as needed with just a few clicks in the AWS Management Console.
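Resizing can also be scripted instead of done in the console. The sketch below assumes the `boto3` SDK and its EMR client's `modify_instance_groups` call; the cluster ID, instance group ID, and region are placeholders, and the helper names are hypothetical. Building the request is kept separate from sending it so that the request can be inspected without AWS credentials.

```python
def build_resize_request(cluster_id: str, instance_group_id: str, target_count: int) -> dict:
    """Build keyword arguments for the EMR client's modify_instance_groups call."""
    if target_count < 0:
        raise ValueError("target_count must not be negative")
    return {
        "ClusterId": cluster_id,
        "InstanceGroups": [
            {"InstanceGroupId": instance_group_id, "InstanceCount": target_count}
        ],
    }


def resize_cluster(request: dict, region_name: str = "us-east-1") -> None:
    """Send the resize request to EMR. Requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the builder above works without boto3 installed

    emr = boto3.client("emr", region_name=region_name)
    emr.modify_instance_groups(**request)


if __name__ == "__main__":
    # Placeholder IDs: substitute the values of your own cluster.
    request = build_resize_request("j-XXXXXXXXXXXXX", "ig-XXXXXXXXXXXX", 10)
    print(request)
```

Calling `resize_cluster(request)` would apply the change; it is left out of the demo because it modifies a live, billable cluster.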

Benefits

  1. Amazon Elastic MapReduce ensures that jobs are distributed in a fault-tolerant manner so that you can run interactive and batch processing jobs in the cloud.

  2. Amazon Elastic MapReduce offers a pay-as-you-go billing model, so you only pay for what you use. You can launch clusters within minutes and manage them from your web browser through a simple point-and-click interface. You can also use an Elastic IP address to access the cluster even if the node is in the process of being terminated or removed. Rates are quoted per instance-hour (current EMR pricing bills by the second with a one-minute minimum), and you can monitor cluster usage through CloudWatch and keep track of your incurred costs.

  3. Amazon Elastic MapReduce integrates with many tools and frameworks, such as Amazon S3, Apache Hive, Apache Mahout, and Apache Pig, so you can build data processing jobs that run on high-availability clusters in the cloud. EMR gives you direct access to all of your data stored in S3 without requiring any intermediate services or additional software.
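The pay-as-you-go model from the second benefit can be turned into a rough cost estimate. The sketch below assumes that each node incurs an EC2 charge plus a separate EMR charge for the time it runs. The rates used are invented for illustration, so check the current AWS pricing pages for real figures.

```python
def estimate_cluster_cost(
    node_count: int,
    hours: float,
    ec2_rate_per_hour: float,
    emr_rate_per_hour: float,
) -> float:
    """Estimate the cost of a cluster where every node runs for the same duration."""
    if node_count < 0 or hours < 0:
        raise ValueError("node_count and hours must not be negative")
    return node_count * hours * (ec2_rate_per_hour + emr_rate_per_hour)


if __name__ == "__main__":
    # Hypothetical rates in USD per instance-hour, not real AWS prices.
    cost = estimate_cluster_cost(node_count=10, hours=3, ec2_rate_per_hour=0.20, emr_rate_per_hour=0.05)
    print(f"Estimated cost: ${cost:.2f}")
```

The estimate ignores storage, data transfer, and mixed instance types, so treat it as a lower bound rather than a bill.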

Limitations

Any data stored in the cluster's underlying Hadoop Distributed File System (HDFS) is lost when you terminate the cluster, so results you want to keep should be written to durable storage such as Amazon S3.
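One common way to work around this limitation is to copy results from HDFS to S3 before the cluster is terminated. The sketch below builds an EMR step that runs the `s3-dist-cp` tool through `command-runner.jar`, and submits it with the `boto3` EMR client's `add_job_flow_steps` call. It assumes an EMR release where both are available; the paths, bucket name, cluster ID, and helper names are placeholders.

```python
def build_copy_to_s3_step(hdfs_path: str, s3_uri: str) -> dict:
    """Build an EMR step definition that copies an HDFS path to S3 with s3-dist-cp."""
    if not s3_uri.startswith("s3://"):
        raise ValueError("s3_uri must start with 's3://'")
    return {
        "Name": "Copy HDFS output to S3",
        "ActionOnFailure": "CONTINUE",  # keep the cluster running if the copy fails
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["s3-dist-cp", "--src", hdfs_path, "--dest", s3_uri],
        },
    }


def submit_step(cluster_id: str, step: dict, region_name: str = "us-east-1") -> list:
    """Submit the step to a running cluster. Requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the builder above works without boto3 installed

    emr = boto3.client("emr", region_name=region_name)
    response = emr.add_job_flow_steps(JobFlowId=cluster_id, Steps=[step])
    return response["StepIds"]


if __name__ == "__main__":
    # Placeholder paths: substitute your own HDFS directory and bucket.
    step = build_copy_to_s3_step("hdfs:///output", "s3://example-bucket/output")
    print(step)
```

Running the copy as the last step of a job means the data is already in S3 by the time the cluster shuts down.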
