Spark allows you to configure the system according to your needs. One of the places to do this is Spark Properties. Spark Properties control most application parameters and can be set using a `SparkConf` object. One of these properties is `spark.driver.memoryOverhead`.
The `spark.driver.memoryOverhead` property lets you set the memory allocated to each Spark driver process in cluster mode. This is the memory that accounts for things like VM overheads, interned strings, and other native overheads; it tends to grow with the container size (typically 6-10%). The `spark.driver.memoryOverhead` option is currently supported on YARN and Kubernetes, and it has been available since Spark 2.3.0.
The memory assigned with `spark.driver.memoryOverhead` is non-heap memory, on top of the memory assigned with `spark.driver.memory`. If not set explicitly, it defaults to driverMemory * 0.10, with a minimum of 384 MiB.
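As a quick illustration of that default (this is just the formula above expressed in code, not Spark's internal implementation), with `spark.driver.memory` set to `10g` the overhead works out to max(10240 MiB * 0.10, 384 MiB) = 1024 MiB:

```scala
// Sketch of the default calculation: max(driverMemory * 0.10, 384 MiB).
// The 10 GiB driver memory here is just an example value.
val driverMemoryMiB = 10 * 1024
val defaultOverheadMiB = math.max((driverMemoryMiB * 0.10).toLong, 384L)
println(s"default spark.driver.memoryOverhead = $defaultOverheadMiB MiB") // 1024 MiB
```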
Use the following formats to specify the memory size:
| Format | Description |
|---|---|
| 1b | bytes |
| 1k or 1kb | kibibytes = 1024 bytes |
| 1m or 1mb | mebibytes = 1024 kibibytes |
| 1g or 1gb | gibibytes = 1024 mebibytes |
| 1t or 1tb | tebibytes = 1024 gibibytes |
| 1p or 1pb | pebibytes = 1024 tebibytes |
You can set the driver memory overhead using the `SparkConf` object. Here is a sample in Scala:
```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
conf.set("spark.driver.memoryOverhead", "15g")
```
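For context, here is a minimal sketch of passing that configuration on to an application; the application name and the `15g` value are purely illustrative:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Build a SparkConf with the driver memory overhead and hand it to a SparkSession.
// "15g" could equally be written as "15360m" (see the format table above).
val conf = new SparkConf()
  .setAppName("My app")
  .set("spark.driver.memoryOverhead", "15g")

val spark = SparkSession.builder()
  .config(conf)
  .getOrCreate()

// Read back the value that was set on this conf.
println(spark.sparkContext.getConf.get("spark.driver.memoryOverhead"))
```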
Or you can load the property dynamically at submit time:

```sh
./bin/spark-submit --name "My app" --master yarn --deploy-mode cluster --conf spark.driver.memoryOverhead=15g
```
You can use `spark-submit` to specify any Spark property with the `--conf`/`-c` flag, providing the value after the `=` sign. Running `spark-submit --help` will show you the entire list of these options.
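If you want to confirm which properties actually reached the application after submission, one option (a sketch, assuming a running SparkSession named `spark`) is to list the driver-related settings from the SparkContext's configuration:

```scala
// List all driver-related properties that the application sees,
// including anything passed via --conf on spark-submit.
spark.sparkContext.getConf.getAll
  .filter { case (key, _) => key.startsWith("spark.driver") }
  .foreach { case (key, value) => println(s"$key = $value") }
```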