This course offers a rich, interactive way to learn the fundamentals of Big Data. Throughout the course, you will have plenty of opportunities to get hands-on practice with functioning Hadoop clusters.

You will start by learning about the rise of Big Data and the different types of data: structured, unstructured, and semi-structured. You will then dive into the core technologies of Big Data: YARN (Yet Another Resource Negotiator), MapReduce, HDFS (Hadoop Distributed File System), and Spark.

By the end of this course, you will have the foundations in place to start working with Big Data, a rapidly growing field.

Introduction to Big Data and Hadoop

## Sequence File: Intro

In addition to text formats, Hadoop supports binary formats, and the sequence file is one of them. A binary encoding is generally more compact on disk than the equivalent text. Hadoop also stores the temporary output of map tasks as sequence files.

One of the problems Hadoop faces is storing lots of small files. The NameNode keeps the metadata for every file in memory, so it runs short on memory if the system has too many small files. Similarly, if the input to a MapReduce job consists of numerous small files, the number of mapper tasks (at least one per file) will be significantly higher than if the same data were stored in fewer, larger files. Sequence files were created to overcome these issues. They serve the following two purposes:

+ They can be used as a persistent data structure to store binary key-value pairs.
+ They can be used as a container for smaller files. When small files are stored in a sequence file, each file's name becomes the key and its content becomes the value.
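
The container idea in the list above can be sketched in plain Java. This is only an illustration under stated assumptions: the class `SmallFilePacker` and its `pack` and `unpack` methods are hypothetical names, and the length-prefixed layout used here is a simplification, not Hadoop's actual sequence file format or API. It shows several small files being stored in a single binary container as key-value records, with the file name as the key and the file content as the value.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only: NOT the Hadoop SequenceFile format or API.
public class SmallFilePacker {

    // Append each (file name, content) pair as one length-prefixed binary record.
    static byte[] pack(Map<String, byte[]> files) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            for (Map.Entry<String, byte[]> file : files.entrySet()) {
                byte[] key = file.getKey().getBytes(StandardCharsets.UTF_8);
                byte[] value = file.getValue();
                out.writeInt(key.length);   // key length
                out.write(key);             // key: the file name
                out.writeInt(value.length); // value length
                out.write(value);           // value: the file content
            }
        }
        return buffer.toByteArray();
    }

    // Read the records back in order, rebuilding the name-to-content mapping.
    static Map<String, byte[]> unpack(byte[] container) throws IOException {
        Map<String, byte[]> files = new LinkedHashMap<>();
        try (DataInputStream in =
                 new DataInputStream(new ByteArrayInputStream(container))) {
            while (in.available() > 0) {
                byte[] key = new byte[in.readInt()];
                in.readFully(key);
                byte[] value = new byte[in.readInt()];
                in.readFully(value);
                files.put(new String(key, StandardCharsets.UTF_8), value);
            }
        }
        return files;
    }

    public static void main(String[] args) throws IOException {
        Map<String, byte[]> smallFiles = new LinkedHashMap<>();
        smallFiles.put("log-001.txt", "first small file".getBytes(StandardCharsets.UTF_8));
        smallFiles.put("log-002.txt", "second small file".getBytes(StandardCharsets.UTF_8));

        // Two small files end up in one container instead of two separate files.
        byte[] container = pack(smallFiles);
        for (Map.Entry<String, byte[]> e : unpack(container).entrySet()) {
            System.out.println(e.getKey() + " -> "
                + new String(e.getValue(), StandardCharsets.UTF_8));
        }
    }
}
```

In Hadoop itself, the same pattern would typically go through the `SequenceFile` writer and reader classes, for example with `Text` keys and `BytesWritable` values, instead of hand-written streams. The benefit is the same: HDFS tracks one large file rather than many small ones.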

Sequence files are well-supported within the Hadoop ecosystem but have little support outside of Java.

## Composition of a record

A sequence file is internally arranged as a header followed by a series of records, with sync markers inserted periodically between the records.


This lesson explains the structure of a sequence file.
