Spark allows you to configure a system according to your needs. One place to do so is through Spark Properties, which control most application parameters and can be set using a SparkConf object. One sub-domain of these Spark Properties is thread configuration.
Generally, running threads in a program enables you to perform work in parallel and complete tasks more efficiently. Apache Spark processes large amounts of data efficiently, and one way it does this is by threading its processes: threading lets Spark make systematic use of the available resources for better performance.

Prior to Spark 3.0, these thread configurations applied to all roles of Spark (such as driver, executor, worker, and master), meaning you could only specify a single number of threads shared by all of these processes. Since Spark 3.0, threads can be configured at a finer granularity, with different numbers of threads assigned to the driver and executor.
Here is a table of the thread configuration properties available for the driver and executor:
Property Name | Default | Meaning |
---|---|---|
spark.{driver or executor}.rpc.io.serverThreads | Fall back on spark.rpc.io.serverThreads | Number of threads used in the server thread pool. |
spark.{driver or executor}.rpc.io.clientThreads | Fall back on spark.rpc.io.clientThreads | Number of threads used in the client thread pool. |
spark.{driver or executor}.rpc.netty.dispatcher.numThreads | Fall back on spark.rpc.netty.dispatcher.numThreads | Number of threads used in RPC message dispatcher thread pool. |
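As a minimal sketch, the snippet below shows how these properties could be set on a SparkConf before the session starts. The application name and the thread counts (8 for the driver, 16 for the executors) are arbitrary placeholders chosen for illustration, not recommended values.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Assign the driver's RPC thread pools 8 threads and the executors' 16.
// Property names come from the table above; the counts are placeholders.
val conf = new SparkConf()
  .setAppName("thread-config-demo") // hypothetical app name
  .set("spark.driver.rpc.io.serverThreads", "8")
  .set("spark.driver.rpc.io.clientThreads", "8")
  .set("spark.driver.rpc.netty.dispatcher.numThreads", "8")
  .set("spark.executor.rpc.io.serverThreads", "16")
  .set("spark.executor.rpc.io.clientThreads", "16")
  .set("spark.executor.rpc.netty.dispatcher.numThreads", "16")

val spark = SparkSession.builder().config(conf).getOrCreate()
```

Like other Spark properties, these can also be supplied at submit time with `--conf` flags on spark-submit, or set in spark-defaults.conf, in which case no code changes are needed.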