Trusted answers to developer questions

Related Tags

apache spark

What are dynamically loaded properties in Spark?

Shahpar Khan

Spark configuration

Spark lets you configure a system according to your needs. One place to do so is Spark properties, which control most application parameters. They can be set on a SparkConf object or loaded dynamically.

Dynamically loading Spark properties

Dynamically loading properties lets us decide how to start our application at launch time instead of hardcoding the configuration. For instance, if you’d like to run the same application with different masters or different amounts of memory, Spark allows you to simply create an empty SparkConf:

import org.apache.spark.{SparkConf, SparkContext}
val sc = new SparkContext(new SparkConf())

Then, you can supply configuration values at runtime:

./bin/spark-submit --name "My app" --master "local[4]" --conf spark.eventLog.enabled=false \
  --conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps" myApp.jar

With spark-submit, you can set any Spark property using the --conf/-c flag, writing the property as key=value. Running ./bin/spark-submit --help shows the entire list of these options.
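To make the idea of choosing settings at launch time concrete, here is a minimal sketch of a wrapper that assembles the spark-submit command from environment variables. The function name build_submit_cmd and the variables APP_MASTER and APP_EXECUTOR_MEMORY are hypothetical names chosen for illustration; Spark itself does not read them. The function only prints the command (a dry run), so nothing is submitted.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a spark-submit command line from environment
# variables so the same jar can be launched with different settings.
# APP_MASTER and APP_EXECUTOR_MEMORY are illustrative names, not Spark settings.
build_submit_cmd() {
  local master="${APP_MASTER:-local[4]}"
  local memory="${APP_EXECUTOR_MEMORY:-1g}"
  # Print the command instead of running it.
  printf '%s\n' "./bin/spark-submit --master \"${master}\" --conf spark.executor.memory=${memory} myApp.jar"
}

# Default settings:
build_submit_cmd
# Same application, different master and memory:
APP_MASTER="spark://5.6.7.8:7077" APP_EXECUTOR_MEMORY=4g build_submit_cmd
```

Because the application creates an empty SparkConf, whatever values end up on the command line are the ones it picks up, with no change to the code.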

You can also keep properties in a file. bin/spark-submit reads configuration options from conf/spark-defaults.conf, in which each line consists of a key and a value separated by whitespace. The spark-defaults.conf will look something like this:

spark.master            spark://5.6.7.8:7077
spark.executor.memory   4g
spark.eventLog.enabled  true
spark.serializer        org.apache.spark.serializer.KryoSerializer
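The key-whitespace-value layout is simple enough to read with a few lines of shell. The sketch below is not Spark's own parser; it only illustrates the layout, and the function name print_spark_defaults is hypothetical. It assumes that comment lines start with #.

```shell
#!/usr/bin/env bash
# Illustrative reader for a spark-defaults.conf-style file (not Spark's parser).
# Each non-empty, non-comment line is split into a key and the rest of the line.
print_spark_defaults() {
  local file="$1" key value
  # The extra test keeps a final line that has no trailing newline.
  while read -r key value || [ -n "$key" ]; do
    # Skip blank lines and lines whose first word starts with '#'.
    case "$key" in
      ''|'#'*) continue ;;
    esac
    printf '%s=%s\n' "$key" "$value"
  done < "$file"
}

# Demo with a temporary file holding two of the properties shown above.
demo_file="$(mktemp)"
printf 'spark.master            spark://5.6.7.8:7077\n# a comment\nspark.executor.memory   4g\n' > "$demo_file"
print_spark_defaults "$demo_file"
rm -f "$demo_file"
```

Printing each pair as key=value gives the same form that the --conf flag expects, which makes it easy to compare file entries with command-line settings.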

Precedence

When the same property is supplied more than once, Spark decides which value wins in two ways:

1. Loading properties in different ways

Properties set directly on the SparkConf take the highest precedence, next come flags passed to spark-submit or spark-shell, and then the options in the spark-defaults.conf file.
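The order described above can be modeled in a few lines. This is a hypothetical sketch of the rule, not Spark code: the function name resolve_property is made up for illustration, and an empty string stands for "not set at that level".

```shell
#!/usr/bin/env bash
# Hypothetical model of the precedence order (not Spark code).
# Arguments, highest precedence first:
#   $1 value set directly on SparkConf
#   $2 value passed as a flag to spark-submit or spark-shell
#   $3 value from spark-defaults.conf
# An empty string means the property was not set at that level.
resolve_property() {
  local from_conf="$1" from_flag="$2" from_defaults="$3"
  if [ -n "$from_conf" ]; then
    printf '%s\n' "$from_conf"
  elif [ -n "$from_flag" ]; then
    printf '%s\n' "$from_flag"
  else
    printf '%s\n' "$from_defaults"
  fi
}

# spark.executor.memory passed as a flag (2g) and present in the file (4g):
resolve_property "" "2g" "4g"     # prints 2g
# The same property also set on SparkConf (8g):
resolve_property "8g" "2g" "4g"   # prints 8g
```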

2. Loading properties with updated names

A few configuration keys have been renamed since earlier versions of Spark. In these cases, older key names are still accepted, but they take lower precedence than any instance of the newer key.

Side note

Spark properties can be divided into two kinds:

  • The first kind is related to deployment, such as spark.driver.memory or spark.executor.instances.

Properties of this kind may not take effect when set programmatically through SparkConf at runtime, or their behavior may depend on which cluster manager and deploy mode you choose. It is therefore recommended to set them through a configuration file (spark-defaults.conf) or spark-submit command-line options.

  • The second kind is related to Spark runtime control, such as spark.task.maxFailures. Properties of this kind can be set either way.

CONTRIBUTOR

Shahpar Khan
Copyright ©2022 Educative, Inc. All rights reserved