What is spark.driver.memoryOverhead in Spark 3?
Spark configuration
Spark lets you configure the system according to your needs. One place to configure an application is through Spark properties. Spark properties control most application parameters and can be set using a SparkConf object. One of these properties is spark.driver.memoryOverhead.
The spark.driver.memoryOverhead property sets the amount of additional memory allocated for each driver process in cluster mode. This memory accounts for things like VM overheads, interned strings, and other native overheads, and it tends to grow with the container size (typically 6-10%). The spark.driver.memoryOverhead option is currently supported on YARN and Kubernetes; it has been available since Spark version 2.3.0.
The memory assigned using spark.driver.memoryOverhead is non-heap memory, allocated on top of the memory assigned using spark.driver.memory. If not set explicitly, it defaults to driverMemory * 0.10, with a minimum of 384 MiB.
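The default rule above can be sketched as a small helper. This is a minimal illustration of the documented formula (the function name and MiB units are our own, not Spark's internals):

```python
# Sketch of the documented default for spark.driver.memoryOverhead:
# 10% of driver memory, but never less than 384 MiB.
MIN_OVERHEAD_MIB = 384

def default_driver_overhead_mib(driver_memory_mib: int) -> int:
    """Return the default driver memory overhead in MiB."""
    return max(int(driver_memory_mib * 0.10), MIN_OVERHEAD_MIB)

# With a 2 GiB driver, 10% is only ~204 MiB, so the 384 MiB floor applies.
print(default_driver_overhead_mib(2048))
# With an 8 GiB driver, 10% (~819 MiB) exceeds the floor.
print(default_driver_overhead_mib(8192))
```

Note that for small drivers the 384 MiB floor, not the 10% factor, determines the overhead.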
Memory input format
Use the following format to set memory:
| Format | Description |
|---|---|
| 1b | bytes |
| 1k or 1kb | kibibytes = 1024 bytes |
| 1m or 1mb | mebibytes = 1024 kibibytes |
| 1g or 1gb | gibibytes = 1024 mebibytes |
| 1t or 1tb | tebibytes = 1024 gibibytes |
| 1p or 1pb | pebibytes = 1024 tebibytes |
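To make the table concrete, here is a hypothetical helper (not part of Spark's API) that converts size strings in the format above into bytes, using the binary units Spark documents:

```python
import re

# Binary units per the table: 1k = 1024 bytes, 1m = 1024 kibibytes, etc.
UNITS = {"b": 1, "k": 1024, "m": 1024**2, "g": 1024**3,
         "t": 1024**4, "p": 1024**5}

def size_to_bytes(size: str) -> int:
    """Parse a Spark-style size string such as '512m' or '1gb' into bytes."""
    m = re.fullmatch(r"(\d+)([bkmgtp])b?", size.strip().lower())
    if not m:
        raise ValueError(f"invalid size string: {size!r}")
    num, unit = m.groups()
    return int(num) * UNITS[unit]

print(size_to_bytes("512m"))  # 512 mebibytes in bytes
print(size_to_bytes("1gb"))   # 1 gibibyte in bytes
```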
How to set driver memory overhead
You can set the driver memory overhead using a SparkConf object. Here is a sample in PySpark:
from pyspark import SparkConf

conf = SparkConf()
conf.set("spark.driver.memoryOverhead", "15g")
Or you can load the property dynamically at submit time. Since the setting applies in cluster mode, submit against a cluster manager such as YARN:
./bin/spark-submit --name "My app" --master yarn --deploy-mode cluster --conf spark.driver.memoryOverhead=15g
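Equivalently, spark-submit reads defaults from conf/spark-defaults.conf, so the property can be set there once so every submission picks it up (the 15g value here just mirrors the example above):

```
# conf/spark-defaults.conf
spark.driver.memoryOverhead  15g
```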
You can use spark-submit to set any Spark property with the --conf/-c flag, passing the value after the = sign. Running spark-submit --help shows the full list of these options.