unix - Spark on YARN memory (physical + virtual) usage


I am struggling to understand how memory management works with Spark on YARN.

My spark-submit has:

--executor-memory 48g --num-executors 2 

When I run top -p <pids_of_2_yarn_containers/executors>, I see:

VIRT     RES     %MEM
51.059g  0.015t  ~4    (container 1)
51.039g  0.012t  ~3    (container 2)

The total memory of the system is 380g.

Finally, in the YARN UI, when I click on each container's page I can see:

resource: 54272 memory (container 1)
resource: 54272 memory (container 2)

Why do these metrics not add up? I am requesting 48g for each Spark executor, yet YARN shows 54g, and the OS reports 15 GB of physical memory used (RES column in top) and 51g of virtual memory used (VIRT column).
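As a quick sanity check on the top figures quoted above, the RES and %MEM columns can be reconciled with a little arithmetic. The sketch below assumes that top's `g` and `t` suffixes mean GiB and TiB, and that the 380g system total is in the same units. The helper name `to_gib` is made up for illustration.

```python
# Binary unit multipliers, expressed in GiB (assumption: top's suffixes are binary units).
UNIT_TO_GIB = {"m": 1 / 1024, "g": 1.0, "t": 1024.0}


def to_gib(value: str) -> float:
    """Convert a top-style size such as '0.015t' or '51.059g' to GiB."""
    number, suffix = float(value[:-1]), value[-1].lower()
    return number * UNIT_TO_GIB[suffix]


TOTAL_GIB = 380.0  # total system memory quoted in the question

for name, res in [("container 1", "0.015t"), ("container 2", "0.012t")]:
    res_gib = to_gib(res)
    pct = 100 * res_gib / TOTAL_GIB
    print(f"{name}: RES = {res_gib:.2f} GiB, about {pct:.1f}% of {TOTAL_GIB:.0f} GiB")
```

Under those assumptions the resident sizes come to roughly 15.4 GiB and 12.3 GiB, which is in line with the ~4 and ~3 shown in the %MEM column.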

In yarn-site.xml:

yarn.scheduler.minimum-allocation-mb (this value changes based on cluster RAM capacity) - The minimum allocation for every container request at the RM, in MB. Memory requests lower than this won't take effect, and the specified value will be allocated at minimum. The maximum container size is limited in a similar way.

yarn.scheduler.maximum-allocation-mb (this value changes based on cluster RAM capacity) - The maximum allocation for every container request at the RM, in MB. Memory requests higher than this won't take effect, and will be capped to this value.

yarn.nodemanager.resource.memory-mb - The amount of physical memory, in MB, that can be allocated for containers.

yarn.nodemanager.vmem-pmem-ratio - The upper limit on virtual memory (physical + paged memory) for each map and reduce task is determined by the virtual memory ratio each YARN container is allowed. It is set by this configuration, and the default value is 2.1.
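To make the ratio concrete, here is a small sketch of the virtual memory ceiling it implies. It assumes the limit is simply the container's allocated memory multiplied by the ratio. The function name is hypothetical, and the 54272 MB figure is the container size reported in the question.

```python
def vmem_limit_mb(container_mb: int, vmem_pmem_ratio: float = 2.1) -> float:
    """Virtual memory ceiling for a container, assuming limit = allocation * ratio."""
    return container_mb * vmem_pmem_ratio


limit_mb = vmem_limit_mb(54272)
print(f"vmem limit: {limit_mb:.1f} MB ({limit_mb / 1024:.1f} GB)")
```

With the default ratio, a 54272 MB container would be allowed about 111 GB of virtual memory, so the ~51g VIRT values seen in top would sit well under that ceiling.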

yarn.nodemanager.resource.cpu-vcores - This property controls the maximum sum of cores used by the containers on each node.

In mapred-site.xml:

mapreduce.map.memory.mb - The maximum memory each map task will use.

mapreduce.reduce.memory.mb - The maximum memory each reduce task will use.

mapreduce.map.java.opts - The JVM heap size for the map task.

mapreduce.reduce.java.opts - The JVM heap size for the reduce task.

Spark settings:

The --executor-memory/spark.executor.memory setting controls the executor heap size, but JVMs can also use some memory off heap, for example for interned strings and direct byte buffers. The value of the spark.yarn.executor.memoryOverhead property is added to the executor memory to determine the full memory request to YARN for each executor. It defaults to max(384, .07 * spark.executor.memory).
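The request size described above can be sketched as follows. This is an illustration under stated assumptions, not Spark's or YARN's actual code. It assumes the overhead product is truncated to whole MB, that YARN rounds each request up to a multiple of yarn.scheduler.minimum-allocation-mb, and that the minimum allocation is 1024 MB (a guess, since the cluster's value is not given). The function names are made up.

```python
import math


def executor_overhead_mb(executor_memory_mb: int, factor: float = 0.07, minimum_mb: int = 384) -> int:
    """Off-heap overhead: max(384, factor * executor memory), truncated to whole MB."""
    return max(minimum_mb, int(factor * executor_memory_mb))


def yarn_container_mb(executor_memory_mb: int, factor: float = 0.07, min_allocation_mb: int = 1024) -> int:
    """Heap plus overhead, rounded up to a multiple of the minimum allocation (assumed)."""
    requested = executor_memory_mb + executor_overhead_mb(executor_memory_mb, factor)
    return math.ceil(requested / min_allocation_mb) * min_allocation_mb


heap_mb = 48 * 1024  # --executor-memory 48g
for factor in (0.07, 0.10):
    print(factor, executor_overhead_mb(heap_mb, factor), yarn_container_mb(heap_mb, factor))
```

Under these assumptions a 48g executor asks for 49152 + 3440 = 52592 MB with the 0.07 factor, which rounds up to 53248 MB. A factor of 0.10 gives 54067 MB, which rounds up to 54272 MB, the figure shown in the YARN UI. The cluster in the question may therefore be using a larger overhead than 0.07, either through its Spark version's default or an explicit setting.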

The --num-executors command-line flag or the spark.executor.instances configuration property controls the number of executors requested.

So can you specify the values of the parameters mentioned above? That will help to calculate the memory allocations in your case.

