Memory Management in Spark 3
Spark configuration
Spark lets you configure a system to suit your needs. One place to do this is Spark Properties, which control most application parameters and can be set using a SparkConf object. One group of these properties governs memory management.
Memory management
Since version 1.6, Spark has used the Unified Memory Manager, which lets Storage Memory and Execution Memory coexist in a shared region and borrow each other's free space. Spark runs on the JVM, and this model distinguishes two types of memory:
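The borrowing rule can be illustrated with a toy model. This is a minimal sketch, not Spark's implementation. UnifiedPool and its methods are hypothetical, sizes are in arbitrary units, and the protected storage_region stands in for the share set by spark.memory.storageFraction (a property not covered above). It follows the behavior described in Spark's tuning guide: storage can use any free space but never evicts execution memory, while execution can evict cached blocks until storage shrinks back to its protected region.

```python
from dataclasses import dataclass


@dataclass
class UnifiedPool:
    """Toy model of the unified region (hypothetical class, not a Spark API).

    total: size of the region shared by storage and execution.
    storage_region: the part of it (assumed: total * spark.memory.storageFraction)
    whose cached blocks execution is not allowed to evict.
    """
    total: int
    storage_region: int
    storage_used: int = 0
    execution_used: int = 0

    def free(self) -> int:
        return self.total - self.storage_used - self.execution_used

    def cache_block(self, size: int) -> bool:
        # Storage may borrow any free space, but it never evicts execution memory.
        if size <= self.free():
            self.storage_used += size
            return True
        return False

    def acquire_execution(self, size: int) -> int:
        # Execution first takes free space...
        granted = min(size, self.free())
        shortfall = size - granted
        # ...then evicts cached blocks, but only those above the protected region.
        evictable = max(0, self.storage_used - self.storage_region)
        evicted = min(shortfall, evictable)
        self.storage_used -= evicted
        granted += evicted
        self.execution_used += granted
        return granted


pool = UnifiedPool(total=1000, storage_region=500)
pool.cache_block(800)               # storage borrows beyond its 500-unit region
print(pool.acquire_execution(400))  # 400: 200 free + 200 evicted from storage
print(pool.storage_used)            # 600
```

Because storage cannot evict execution, a later cache_block(100) on this pool fails until execution releases memory, while execution can still reclaim the remaining 100 units that storage holds above its region.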
- On-Heap Memory
- Off-Heap Memory
On-Heap Memory
On-Heap Memory has four components, as illustrated on the right:
- Storage Memory
- Execution Memory
- User Memory
- Reserved Memory
Storage Memory
Storage Memory stores Spark's cached data, broadcast variables, and unroll data.
Execution Memory
Execution Memory stores temporary objects created while Spark tasks run operations such as shuffles, joins, sorts, and aggregations.
User Memory
User Memory stores user-defined data structures and the data needed for RDD transformations (e.g., RDD dependency information).
Reserved Memory
Reserved Memory is set aside for the system and stores Spark's internal objects. Its size is hardcoded at 300 MB.
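To see how the four on-heap regions relate in size, here is a rough calculation. It is a sketch that assumes Spark 3's default values for two properties not covered above, spark.memory.fraction (0.6) and spark.memory.storageFraction (0.5), plus the 300 MB reserved size; on_heap_regions is a hypothetical helper. Real figures come out somewhat smaller, because the JVM reports a usable heap slightly below the configured executor memory.

```python
RESERVED_MB = 300  # hardcoded reserved system memory


def on_heap_regions(heap_mb: float, memory_fraction: float = 0.6,
                    storage_fraction: float = 0.5) -> dict:
    """Approximate on-heap split under the Unified Memory Manager (hypothetical helper).

    memory_fraction mirrors spark.memory.fraction and storage_fraction mirrors
    spark.memory.storageFraction; both defaults are Spark 3's documented values.
    """
    # Spark refuses to start when the heap is below 1.5x the reserved size.
    if heap_mb < RESERVED_MB * 1.5:
        raise ValueError(f"heap must be at least {RESERVED_MB * 1.5} MB")
    usable = heap_mb - RESERVED_MB
    unified = usable * memory_fraction    # shared by storage and execution
    storage = unified * storage_fraction  # soft boundary, can shift at runtime
    return {
        "reserved": RESERVED_MB,
        "user": usable - unified,
        "storage": storage,
        "execution": unified - storage,
    }


# For a 4 GB executor heap: storage and execution start at about 1138.8 MB each,
# and User Memory gets about 1518.4 MB.
print(on_heap_regions(4096))
```

The storage/execution figures are only the starting split: as described above, either side can grow into the other's free space at runtime.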
Off-Heap Memory
Off-Heap Memory has two components, as illustrated on the right:
- Storage Memory
- Execution Memory
They serve the same purposes described above. Off-heap memory is disabled by default, but we can enable it by setting spark.memory.offHeap.enabled to true and set its size (in bytes unless a unit is given) with spark.memory.offHeap.size, which must be greater than zero when off-heap memory is enabled.
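The off-heap settings can be sketched the same way. parse_size and off_heap_regions are hypothetical helpers: the size parser handles only simple suffixes such as 512m or 2g, and the split assumes the off-heap pool is divided by spark.memory.storageFraction (default 0.5), with no reserved or user region off-heap.

```python
_UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3, "t": 1024 ** 4}


def parse_size(value: str) -> int:
    """Simplified size parser: '2g' -> bytes; a bare number is taken as bytes."""
    v = value.strip().lower()
    if v.endswith("b"):
        v = v[:-1]
    if v and v[-1] in _UNITS:
        return int(v[:-1]) * _UNITS[v[-1]]
    return int(v)


def off_heap_regions(conf: dict, storage_fraction: float = 0.5) -> dict:
    """Split the off-heap pool into storage and execution (hypothetical helper)."""
    enabled = conf.get("spark.memory.offHeap.enabled", "false").lower() == "true"
    size = parse_size(conf.get("spark.memory.offHeap.size", "0"))
    if not enabled:
        return {"storage": 0, "execution": 0}  # disabled by default
    if size <= 0:
        raise ValueError("spark.memory.offHeap.size must be > 0 when "
                         "spark.memory.offHeap.enabled is true")
    storage = int(size * storage_fraction)
    return {"storage": storage, "execution": size - storage}


print(off_heap_regions({"spark.memory.offHeap.enabled": "true",
                        "spark.memory.offHeap.size": "2g"}))
```

Keeping the enabled flag and the size check together reflects the rule above: turning off-heap memory on without a positive size is a configuration error rather than a silent no-op.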
Here is a list of different properties that can be used to configure Spark.