Apache Pig offers a high-level scripting language, popularly used for the writing of data pipelines in the world of Hadoop. The scripting language is known as Pig Latin. It appears as a blend of SQL and many other scripting languages, it is reasonably simple to read and write, so you tend to move up and running with the use of Pig Latin pretty rapidly. It is explanatory and concise just like SQL. You also enjoy more explicit control on various datasets at each phase in the pipeline.

Pig’s language layer currently consists of a textual language called Pig Latin, which has the following key properties:

  • Ease of programming. It is trivial to achieve parallel execution of simple, “embarrassingly parallel” data analysis tasks. Complex tasks comprised of multiple interrelated data transformations are explicitly encoded as data flow sequences, making them easy to write, understand, and maintain.

  • Optimization opportunities. The way in which tasks are encoded permits the system to optimize their execution automatically, allowing the user to focus on semantics rather than efficiency.

  • Extensibility. Users can create their own functions to do special-purpose processing.


Get hands-on with 1200+ tech skills courses.