What is spark.driver.memoryOverhead in Spark 3?

Spark configuration

Spark allows you to configure the system according to your needs. One of the ways to configure an application is through Spark properties.

Spark properties control most application parameters and can be set using a SparkConf object. One of these properties is spark.driver.memoryOverhead.

The spark.driver.memoryOverhead property sets the amount of non-heap memory allocated to each driver process in cluster mode. This memory accounts for things like VM overheads, interned strings, and other native overheads, and it tends to grow with the container size (typically 6–10%). The spark.driver.memoryOverhead option is currently supported on YARN and Kubernetes, and it has been available since Spark 2.3.0.

The memory assigned using spark.driver.memoryOverhead is non-heap memory on top of the memory assigned using spark.driver.memory. If not set explicitly, it defaults to driverMemory * 0.10, with a minimum of 384 MiB.
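
To make the default rule concrete, here is a minimal Python sketch of it; the helper name is hypothetical and not part of Spark's API:

def default_driver_memory_overhead(driver_memory_mib: int) -> int:
    # Documented default: max(driverMemory * 0.10, 384 MiB)
    MIN_OVERHEAD_MIB = 384
    return max(int(driver_memory_mib * 0.10), MIN_OVERHEAD_MIB)

print(default_driver_memory_overhead(4096))  # 409 -> 10% of a 4 GiB driver
print(default_driver_memory_overhead(1024))  # 384 -> the minimum applies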

Memory input format

Use one of the following formats to specify memory sizes (a small conversion sketch follows the table):

Format       Description
1b           bytes
1k or 1kb    kibibytes = 1024 bytes
1m or 1mb    mebibytes = 1024 kibibytes
1g or 1gb    gibibytes = 1024 mebibytes
1t or 1tb    tebibytes = 1024 gibibytes
1p or 1pb    pebibytes = 1024 tebibytes
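
As an illustration of how these strings map to byte counts, here is a small Python sketch (not Spark's actual parser) that converts them, assuming the binary units from the table above:

import re

UNITS = {"b": 1, "k": 1024, "m": 1024**2, "g": 1024**3,
         "t": 1024**4, "p": 1024**5}

def size_to_bytes(size: str) -> int:
    # Accepts strings like "1b", "1k", "1kb", or "15g", per the table above
    match = re.fullmatch(r"(\d+)([bkmgtp])b?", size.lower())
    if not match:
        raise ValueError(f"invalid size string: {size!r}")
    value, unit = match.groups()
    return int(value) * UNITS[unit]

print(size_to_bytes("1kb"))  # 1024
print(size_to_bytes("15g"))  # 16106127360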

How to set driver memory overhead

You can set the driver memory overhead using a SparkConf object. Here is a code sample:

from pyspark import SparkConf

conf = SparkConf()
conf.set("spark.driver.memoryOverhead", "15g")
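
For context, here is a minimal PySpark sketch that passes this conf into a session and reads the value back. Note that driver memory settings influence how the driver process is launched, so in practice they are more reliably set at submit time than from inside an already-running application:

from pyspark import SparkConf
from pyspark.sql import SparkSession

conf = SparkConf()
conf.set("spark.driver.memoryOverhead", "15g")

# Build a session with the conf and read the value back
spark = SparkSession.builder.config(conf=conf).getOrCreate()
print(spark.sparkContext.getConf().get("spark.driver.memoryOverhead"))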

Or you can load the property dynamically at submit time:

./bin/spark-submit --name "My app" --master yarn --deploy-mode cluster --conf spark.driver.memoryOverhead=15g

You can use spark-submit to specify any Spark property with the --conf/-c flag, passing the value after the = sign. Running ./bin/spark-submit --help shows the entire list of these options.
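
As an alternative to repeating --conf flags on every submission, properties can also be placed in conf/spark-defaults.conf, which spark-submit reads at launch. A minimal sketch, with illustrative values:

# conf/spark-defaults.conf
spark.driver.memory          4g
spark.driver.memoryOverhead  512m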
