Spark Submit Parameter Analysis

Ever wondered how to configure the `--num-executors`, `--executor-memory` and `--executor-cores` Spark config params for your cluster?

A little bit of theory first:

The cluster runs several daemons in the background, like NameNode, Secondary NameNode, DataNode, JobTracker and TaskTracker, so some resources on each node must be left aside for them.

When setting `--num-executors`, we need to make sure that we leave aside enough cores for these daemons (~1 core per node).

If we are running Spark on YARN, then we also need to budget in the resources that the ApplicationMaster (AM) would need (~1024 MB and 1 executor).
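
To make the budgeting concrete, here is a minimal sketch (plain Python; the helper name is made up for illustration) of the cores left per node once the daemons are accounted for:

```python
def usable_cores_per_node(cores_per_node: int, reserved_for_daemons: int = 1) -> int:
    """Cores left for executors on one node, assuming ~1 core is
    reserved for the OS and Hadoop daemons. On YARN, remember that
    the AM additionally consumes ~1024 MB and one executor slot
    somewhere in the cluster."""
    return cores_per_node - reserved_for_daemons

print(usable_cores_per_node(16))  # -> 15
```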

Full memory requested from YARN per executor = `spark.executor.memory` + `spark.yarn.executor.memoryOverhead`

`spark.yarn.executor.memoryOverhead` = max(384 MB, 7% of `spark.executor.memory`)

So, if we request 20 GB per executor, YARN will actually allocate 20 GB + memoryOverhead = 20 GB + 7% of 20 GB ≈ 21.4 GB of memory for us.
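
As a quick sanity check on that arithmetic, here is a minimal Python sketch of the formula above (the function name is hypothetical):

```python
def yarn_container_size_gb(executor_memory_gb: float) -> float:
    """spark.executor.memory + max(384 MB, 7% of spark.executor.memory)."""
    overhead_gb = max(0.384, 0.07 * executor_memory_gb)
    return executor_memory_gb + overhead_gb

print(yarn_container_size_gb(20))  # 21.4 -> a 20 GB executor costs ~21.4 GB of YARN memory
```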

Tips:

Running tiny executors (with a single core and just enough memory needed to run a single task, for example) throws away the benefits that come from running multiple tasks in a single JVM.

Related configuration:

Cluster config: 10 nodes, 16 cores per node, 64 GB RAM per node

  • --num-executors = In this approach, we'll assign one executor per core
    = `total-cores-in-cluster`
    = `num-cores-per-node * total-nodes-in-cluster`
    = 16 x 10 = 160

Not Good! This is exactly the tiny-executor anti-pattern from the tip above.

  • --num-executors = In this approach, we'll assign one executor per node
    = `total-nodes-in-cluster`
    = 10

Also not good! Each executor would then hog all 16 cores and most of the 64 GB RAM of its node, and such fat executors suffer from poor HDFS throughput and long garbage-collection pauses.

The recommended config balances the two extremes: take 5 cores per executor (a common rule of thumb for good HDFS throughput), leave 1 of the 16 cores per node for daemons → 15 usable cores per node → 3 executors per node → 30 executors on 10 nodes, minus 1 for the YARN AM → 29 executors; 64 GB / 3 executors ≈ 21 GB per container, minus memoryOverhead and some headroom → 18 GB each. So the recommended config is: 29 executors, 18 GB memory each and 5 cores each!! The sketch below walks through the same arithmetic.
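
Putting the whole derivation in one place (a minimal Python sketch; all names are illustrative, and the ~3 GB overhead budget follows the rounding used above):

```python
NODES = 10
CORES_PER_NODE = 16
RAM_GB_PER_NODE = 64

CORES_PER_EXECUTOR = 5                               # rule of thumb for good HDFS throughput
usable_cores = CORES_PER_NODE - 1                    # leave ~1 core per node for daemons -> 15
executors_per_node = usable_cores // CORES_PER_EXECUTOR         # -> 3
num_executors = executors_per_node * NODES - 1       # 30, minus 1 slot for the YARN AM -> 29

container_gb = RAM_GB_PER_NODE // executors_per_node            # -> ~21 GB per container
heap_gb = container_gb - 3                           # budget ~3 GB for memoryOverhead + headroom -> 18

print(f"--num-executors {num_executors} "
      f"--executor-cores {CORES_PER_EXECUTOR} "
      f"--executor-memory {heap_gb}G")
# -> --num-executors 29 --executor-cores 5 --executor-memory 18G
```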

