Trusted answers to developer questions


How to perform thread configuration in Spark 3

Shahpar Khan

Spark Configuration

Spark allows you to configure the system according to your needs. One place to do this is through Spark Properties.

Spark Properties control most application parameters and can be set using a SparkConf object. One sub-domain of these properties is thread configuration.

Generally, running threads in a program lets you execute work in parallel and complete tasks more efficiently. Apache Spark processes large amounts of data efficiently, and one way it does this is by threading its processes.

Threading enables Spark to systematically utilize available resources for better performance. Prior to Spark 3.0, these thread configurations applied to all roles of Spark (such as driver, executor, worker, and master), which meant you could only specify a single number of threads shared by all of these processes. Spark 3.0 lets you configure threads at a finer granularity and assign different numbers of threads to the driver and the executor.

Here is a table with the thread configuration for the RPC module. Remote Procedure Call (RPC) is a protocol that one program can use to request a service from a program located on another computer on a network, without having to understand the network's details.

Property Name                                           | Default                                            | Meaning
spark.{driver|executor}.rpc.io.serverThreads            | Fall back on spark.rpc.io.serverThreads            | Number of threads used in the server thread pool.
spark.{driver|executor}.rpc.io.clientThreads            | Fall back on spark.rpc.io.clientThreads            | Number of threads used in the client thread pool.
spark.{driver|executor}.rpc.netty.dispatcher.numThreads | Fall back on spark.rpc.netty.dispatcher.numThreads | Number of threads used in the RPC message dispatcher thread pool.
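For example, to give the driver larger RPC thread pools than the executors, these properties could be set in spark-defaults.conf or passed with --conf at submit time. A sketch of such a configuration (all thread counts here are illustrative, not tuned recommendations):

```
# spark-defaults.conf (illustrative values)
spark.driver.rpc.io.serverThreads              16
spark.driver.rpc.io.clientThreads              16
spark.executor.rpc.io.serverThreads            8
spark.executor.rpc.io.clientThreads            8
spark.executor.rpc.netty.dispatcher.numThreads 4
```

Any property left unset falls back on the corresponding role-agnostic setting (for example, spark.rpc.io.serverThreads), as shown in the Default column above.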




Copyright ©2022 Educative, Inc. All rights reserved
