Hadoop Ecosystem

Explore the Hadoop ecosystem and learn about its core components such as HDFS for distributed storage and YARN for resource management. Understand how tools like Apache Pig, Hive, Mahout, and others support big data processing, querying, and machine learning in large-scale environments.

We'll cover the following...

What is Hadoop?
Components of Hadoop
- Hadoop Distributed File System (HDFS)
  - Name node
  - Data nodes
- YARN (Yet Another Resource Negotiator)
  - Resource manager
  - Node manager
Hadoop Ecosystem

What is Hadoop?

Hadoop is an open-source software framework for solving big data problems on large clusters of hardware. It stores and processes large datasets by distributing both the data and the computation across the machines in the cluster. The idea for Hadoop came from the MapReduce paper published by Google. Hadoop is written in the Java programming language.
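
To make the MapReduce idea concrete, here is a minimal in-memory sketch of a word count in plain Python. It is not Hadoop's API: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical and chosen only for illustration, and everything runs on a single machine rather than on a cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in order over a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data needs big clusters", "Hadoop stores big data"]
    print(word_count(sample))
```

On a real cluster, the map and reduce steps would run in parallel on different machines, and the shuffle would move data between them over the network. The sketch only shows how the three phases fit together.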

Components of Hadoop

When setting up a Hadoop cluster for big data processing, two services are mandatory:

- HDFS (Hadoop Distributed File System) for storing the data.
- YARN (Yet Another Resource Negotiator) for managing cluster resources and scheduling the jobs that process the data stored in HDFS.
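
As a rough illustration of the storage side, the sketch below shows how a file might be split into fixed-size blocks and how each block could be assigned to several data nodes, which is the kind of mapping the name node keeps track of. The helper `plan_blocks` and the node names are hypothetical. The block size of 128 MB and the replication factor of 3 are common HDFS defaults, used here as assumptions. The round-robin placement is a deliberate simplification and not the placement policy HDFS actually uses.

```python
import math

BLOCK_SIZE_MB = 128  # assumed: a common HDFS default block size
REPLICATION = 3      # assumed: a common HDFS default replication factor


def plan_blocks(file_size_mb, data_nodes,
                block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Return a toy mapping of block id -> data nodes holding a replica.

    Placement is simple round-robin, for illustration only.
    """
    if replication > len(data_nodes):
        raise ValueError("not enough data nodes for the requested replication")
    num_blocks = math.ceil(file_size_mb / block_size_mb)
    plan = {}
    for block_id in range(num_blocks):
        # Pick `replication` distinct nodes, starting at a rotating offset.
        plan[block_id] = [
            data_nodes[(block_id + r) % len(data_nodes)]
            for r in range(replication)
        ]
    return plan


if __name__ == "__main__":
    nodes = ["dn1", "dn2", "dn3", "dn4"]  # hypothetical data node names
    for block_id, replicas in plan_blocks(300, nodes).items():
        print(f"block {block_id}: {replicas}")
```

With these assumptions, a 300 MB file is split into three blocks, and every block is stored on three different data nodes, so the loss of a single node does not lose any data.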
