For operations such as sc.parallelize that have no parent RDD, the default value of spark.default.parallelism on standalone and YARN clusters is the total number of cores across all executors (with a minimum of 2); in local mode it is the number of cores on the machine. This value sets the default number of partitions, and therefore tasks, that Spark creates when a transformation does not specify one.
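
For instance, here is a minimal PySpark sketch (assuming a local[4] master purely for illustration) showing how the configured value surfaces as sc.defaultParallelism and as the partition count of a parallelized RDD:

```python
from pyspark import SparkConf, SparkContext

# local[4] is an illustrative assumption: 4 local cores
conf = SparkConf().setMaster("local[4]").setAppName("parallelism-demo")
sc = SparkContext(conf=conf)

# defaultParallelism reflects spark.default.parallelism (4 in this setup)
print(sc.defaultParallelism)

# With no numSlices argument, parallelize() uses that default partition count
rdd = sc.parallelize(range(1000))
print(rdd.getNumPartitions())  # typically 4 in this configuration

sc.stop()
```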

When parallelizing an RDD in a job launched with spark-submit, spark.default.parallelism controls how many partitions the data is split into unless you pass an explicit partition count. Increasing it lets more tasks run in parallel and can shorten execution time when executor cores would otherwise sit idle. However, setting it too high produces many small tasks, which adds scheduling overhead and memory pressure and can degrade cluster performance.
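
A rough sketch of overriding the value, either cluster-wide via spark-submit or per-RDD with an explicit slice count (the file name my_job.py and the value 200 are illustrative assumptions, not values from the original post):

```python
# Illustrative submit command raising the default for the whole job:
#   spark-submit --conf spark.default.parallelism=200 my_job.py
from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName("override-demo")
sc = SparkContext(conf=conf)

data = range(1_000_000)

# You can also override the default per-RDD with an explicit slice count
rdd = sc.parallelize(data, numSlices=200)
print(rdd.getNumPartitions())  # 200, regardless of spark.default.parallelism

sc.stop()
```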

Therefore, the ideal value of spark.default.parallelism depends on the available resources, the data size, and the complexity of the computation. The Spark tuning guide suggests roughly 2-3 tasks per CPU core as a starting point; tune from there based on the specific requirements of the job to achieve optimal performance.
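
As a back-of-the-envelope illustration (the executor count and cores per executor below are assumptions, not values from the question):

```python
# Derive a starting value from total executor cores
num_executors = 10        # assumed cluster size
cores_per_executor = 4    # assumed executor sizing

# Rule of thumb from the Spark tuning guide: 2-3 tasks per core
default_parallelism = num_executors * cores_per_executor * 3
print(default_parallelism)  # 120, e.g. --conf spark.default.parallelism=120
```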