
Ingestion Job: Part II

Explore how to process nested JSON files in Apache Spark using Java by applying flatMap transformations to flatten complex data structures. Understand the implementation of a JsonFlatMapper to convert nested sales data into rows suitable for database insertion. Learn how to efficiently persist these rows using mapPartitions with batch updates in JDBC, completing the ingestion job workflow in a Spark batch application.

Processing the input

The project and codebase for this lesson are the same as in the previous lesson:

mvn install -DskipTests
java -jar /usercode/target/batch-app-0.0.1-SNAPSHOT.jar jobName=ingesterJob clientId=client1 readFormat=json fileName=sales
Project with the IngesterJob code implementation
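
As a quick refresher on the previous lesson’s pre-processing step, the job first reads the JSON file into a DataFrame before handing it over to the processor. The exact read code lives in the shared batch framework; the snippet below is only a minimal sketch, where the class name, the multiLine option, and the file path (derived here from the clientId and fileName arguments) are assumptions for illustration:

Java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ReadStepSketch {
    public static void main(String[] args) {
        // Hypothetical sketch of the read step; not the actual framework code.
        SparkSession spark = SparkSession.builder()
                .appName("ingesterJob")
                .master("local[*]")
                .getOrCreate();

        // multiLine is typically needed when each JSON document spans several lines;
        // the path below is assumed from the clientId/fileName job arguments.
        Dataset<Row> preProcessOutput = spark.read()
                .option("multiLine", true)
                .json("/usercode/data/client1/sales.json");

        // Prints the nested schema discussed later in this lesson.
        preProcessOutput.printSchema();
    }
}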

Once Spark is instructed to read the input (the JSON file contents) into our favorite logical abstraction, the DataFrame, this object is then “passed” to the job’s process method in the following manner:

Java
@Override
protected Dataset<Row> process(Dataset<Row> preProcessOutput) {
    return (Dataset<Row>) ingesterProcessor.process(preProcessOutput);
}
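
The @Override annotation hints that IngesterJob extends a template-method style base class that drives the whole batch workflow. That base class is not shown here; the following is a purely hypothetical sketch, where the class name SparkJob and the preProcess/postProcess method names are assumptions, not the framework’s actual API:

Java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Hypothetical sketch of a template-method base class; names and signatures
// are assumptions, not the actual framework code.
public abstract class SparkJob {

    // Reads the input (e.g., the JSON file) into a DataFrame.
    protected abstract Dataset<Row> preProcess();

    // Transforms the pre-processed DataFrame; IngesterJob overrides this step.
    protected abstract Dataset<Row> process(Dataset<Row> preProcessOutput);

    // Persists the processed rows (e.g., via JDBC batch writes).
    protected abstract void postProcess(Dataset<Row> processOutput);

    // Orchestrates the three steps of the batch workflow.
    public final void run() {
        Dataset<Row> input = preProcess();
        Dataset<Row> output = process(input);
        postProcess(output);
    }
}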

The ingesterProcessor object receives this DataFrame as the argument to its process(preProcessOutput) method. What goes on inside it? Let’s inspect the following code snippet:

Java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.catalyst.encoders.RowEncoder;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Component;

@Component
public class IngesterProcessor implements Processor<Dataset<Row>> {

    private static final Logger LOGGER = LoggerFactory.getLogger(IngesterProcessor.class);

    @Override
    public Dataset<Row> process(Dataset<Row> inputDf) {
        LOGGER.info("Flattening JSON records...");
        // Apply a Spark transformation that flattens each nested JSON record
        Dataset<Row> parsedResults = inputDf.flatMap(new IngesterJsonFlatMapper(),
                RowEncoder.apply(SalesSchema.getSparkSchema()));
        return parsedResults;
    }
}
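
The transformation relies on SalesSchema.getSparkSchema() to tell the encoder what the flattened rows look like. Its actual definition is not shown in this snippet; the following is only a minimal sketch, where the flat column names and types (Seller_Id, Date, Product, Quantity) are assumptions derived from the fields present in the input JSON:

Java
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

// Hypothetical sketch of SalesSchema; the exact column names and types are
// assumptions based on the fields observed in the input JSON.
public final class SalesSchema {

    public static StructType getSparkSchema() {
        return new StructType(new StructField[] {
                DataTypes.createStructField("Seller_Id", DataTypes.StringType, true),
                DataTypes.createStructField("Date", DataTypes.StringType, true),
                DataTypes.createStructField("Product", DataTypes.StringType, true),
                DataTypes.createStructField("Quantity", DataTypes.LongType, true)
        });
    }
}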

The line applying the flatMap transformation inside IngesterProcessor.process packs a lot of interesting things:

  • On the inputDf, a flatMap transformation is applied to parse each JSON record into one or more flat rows, each represented by a Row object. This “translation” step is needed because, when Spark reads the JSON file, its contents are internally structured in the following way:
Java
root
 |-- Sales: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- Date: string (nullable = true)
 |    |    |-- Items: array (nullable = true)
 |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |-- Product: string (nullable = true)
 |    |    |    |    |-- Quantity: long (nullable = true)
 |-- Seller_Id: string (nullable = true)

Interestingly enough, Spark’s internal representation of a JSON record it has read (the schema) is quite similar to the JSON structure itself.

It contains a Sales array and a Seller_Id string field as root properties. The Sales array is a collection of elements, each named element, of type struct.
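
To make the flattening concrete, here is a minimal sketch of how a flat-mapper could walk this nested structure and emit one flat Row per sold item. It assumes the flat column order sketched above (Seller_Id, Date, Product, Quantity); the actual IngesterJsonFlatMapper implementation may well differ:

Java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;

// Hypothetical sketch of a flat-mapper over the nested schema shown above;
// the real IngesterJsonFlatMapper may differ in naming and details.
public class JsonFlatMapperSketch implements FlatMapFunction<Row, Row> {

    @Override
    public Iterator<Row> call(Row record) {
        List<Row> flatRows = new ArrayList<>();
        String sellerId = record.getAs("Seller_Id");
        // Each input record holds an array of sale structs...
        List<Row> sales = record.getList(record.fieldIndex("Sales"));
        for (Row sale : sales) {
            String date = sale.getAs("Date");
            // ...and each sale holds an array of item structs.
            List<Row> items = sale.getList(sale.fieldIndex("Items"));
            for (Row item : items) {
                flatRows.add(RowFactory.create(
                        sellerId,
                        date,
                        item.getAs("Product"),
                        item.getAs("Quantity")));
            }
        }
        return flatRows.iterator();
    }
}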

Struct ...