Apache Pig is a tool that reduces the complexity of writing a MapReduce program. It is used to analyze large data sets and represent them as data flows. Pig includes a high-level language for expressing data analysis programs, and all data manipulation operations are carried out on Hadoop.
Pig Latin is the high-level language that Apache Pig provides for writing data analysis programs. It includes operators for reading, writing, and processing data.
Pig Latin scripts are converted into MapReduce tasks by a component of Pig called the Pig Engine.
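A grouped count is the kind of data flow that breaks down naturally into a map step and a reduce step. The Python sketch below illustrates that split on a small in-memory list. It is not code that Pig generates; the record layout, the sample data, and every function name are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical input: (user, page) records standing in for a loaded data set.
RECORDS = [
    ("alice", "home"),
    ("bob", "home"),
    ("alice", "cart"),
    ("carol", "home"),
    ("bob", "search"),
]


def map_phase(records):
    """Emit one (key, 1) pair per record, keyed by page."""
    for _user, page in records:
        yield page, 1


def shuffle(pairs):
    """Collect mapped values under their key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def count_by_page(records):
    """Run the three steps in order: map, shuffle, reduce."""
    return reduce_phase(shuffle(map_phase(records)))


if __name__ == "__main__":
    print(count_by_page(RECORDS))
```

In a high-level data flow language, the same result would be expressed as a short group-and-count statement, leaving the map, shuffle, and reduce plumbing to the engine.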
The components of Apache Pig that process the Pig Latin language through multiple layers are:
Parser: The parser accepts a program submitted by the user and performs a syntax check and type check. The output of this operation is a DAG (directed acyclic graph) that contains Pig Latin statements and logical operators.
Optimizer: This step passes the DAG to the logical optimizer, which produces an optimized logical plan.
Compiler: This is the compilation step, where the optimized logical plan is compiled into a series of MapReduce jobs.
Execution Engine: In this final step, the MapReduce jobs are submitted to Hadoop for execution. The results are returned to the user on completion.
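The plan handed from one stage to the next is a directed acyclic graph of operators. As a rough illustration of why that structure is useful, the Python sketch below models a small plan as a dependency graph and derives a valid execution order from it using the standard library's graphlib module (Python 3.9 or later). This is not Pig's internal representation; the operator names and the plan itself are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical logical plan: each operator maps to the operators it reads from.
LOGICAL_PLAN = {
    "load_users": set(),
    "load_visits": set(),
    "filter_visits": {"load_visits"},
    "join": {"load_users", "filter_visits"},
    "store": {"join"},
}


def execution_order(plan):
    """Return the operators in an order that respects every dependency.

    TopologicalSorter raises graphlib.CycleError if the plan is not acyclic.
    """
    return list(TopologicalSorter(plan).static_order())


if __name__ == "__main__":
    for step in execution_order(LOGICAL_PLAN):
        print(step)
```

Because the two load operators do not depend on each other, more than one order is valid. The graph only guarantees that each operator runs after its inputs.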
Common operations such as ordering can be carried out easily.
Apache Pig has the following features:
It is extensible. Users can create their own functions for special-purpose processing like reading and writing data.
It supports a large range of data types and analyzes all kinds of data, both structured and unstructured.
It supports user-defined functions, which users can write in other programming languages such as Java.
It supports automatic optimization, so users need to focus only on the semantics of the language.