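On Linux, Hadoop resource allocation mainly involves the following areas: container size limits, the scheduler and per-node resources, queue capacities, and per-job overrides.
Set the per-container allocation limits in yarn-site.xml: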
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>1024</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>8192</value>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-vcores</name>
<value>1</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-vcores</name>
<value>8</value>
</property>
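To make the effect of these limits concrete, the following is a minimal sketch of how a memory request is adjusted before a container is granted. It assumes the CapacityScheduler with its default resource calculator, where requests are rounded up to a multiple of the minimum allocation and requests above the maximum are rejected. The function name `normalize_mb` is a hypothetical helper for illustration, not a Hadoop command.

```shell
# Mirror the values of yarn.scheduler.minimum-allocation-mb and
# yarn.scheduler.maximum-allocation-mb from the configuration above.
MIN_MB=1024
MAX_MB=8192

# normalize_mb REQUEST_MB (hypothetical helper, assumption-based sketch)
normalize_mb() {
  request=$1
  # Requests above the maximum are rejected rather than trimmed.
  if [ "$request" -gt "$MAX_MB" ]; then
    echo "rejected"
    return 1
  fi
  # Requests below the minimum are raised to the minimum.
  if [ "$request" -lt "$MIN_MB" ]; then
    request=$MIN_MB
  fi
  # Round up to the next multiple of the minimum allocation.
  echo $(( (request + MIN_MB - 1) / MIN_MB * MIN_MB ))
}

normalize_mb 1500   # prints 2048
normalize_mb 512    # prints 1024
```

Under these assumptions, a task that asks for 1500 MB occupies a 2048 MB container, so choosing request sizes that are multiples of the minimum avoids wasted memory.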
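Select the scheduler that allocates containers dynamically, and declare the memory and vcores each NodeManager offers (also in yarn-site.xml):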
<property>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>8192</value>
</property>
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>8</value>
</property>
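As a rough capacity check, the sketch below estimates how many containers of a given size fit on one node with the NodeManager values above. It assumes both memory and vcores are enforced (with the CapacityScheduler this requires the DominantResourceCalculator; the default calculator considers memory only) and it ignores memory reserved for the operating system and Hadoop daemons. `containers_per_node` is a hypothetical helper.

```shell
NODE_MB=8192      # yarn.nodemanager.resource.memory-mb
NODE_VCORES=8     # yarn.nodemanager.resource.cpu-vcores

# containers_per_node CONTAINER_MB CONTAINER_VCORES (hypothetical helper)
containers_per_node() {
  by_mem=$(( NODE_MB / $1 ))       # limit imposed by memory
  by_cpu=$(( NODE_VCORES / $2 ))   # limit imposed by vcores
  # The tighter of the two limits decides.
  if [ "$by_mem" -lt "$by_cpu" ]; then
    echo "$by_mem"
  else
    echo "$by_cpu"
  fi
}

containers_per_node 2048 1   # prints 4 (memory-bound)
containers_per_node 1024 2   # prints 4 (vcore-bound)
```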
Tune the CapacityScheduler and define its queues in capacity-scheduler.xml:
<property>
<name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
<value>0.1</value>
</property>
<property>
<!-- Counted in missed scheduling opportunities, not milliseconds; 40 is the default. -->
<name>yarn.scheduler.capacity.node-locality-delay</name>
<value>40</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.queues</name>
<value>queueA,queueB</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.queueA.capacity</name>
<value>50</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.queueB.capacity</name>
<value>50</value>
</property>
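The capacities of sibling queues under the same parent are expected to add up to 100 percent. A small sanity check before reloading the scheduler configuration might look like the sketch below. `capacities_ok` is a hypothetical helper, and it handles whole-number percentages only.

```shell
# capacities_ok CAPACITY... (hypothetical helper)
# Sums the capacities of sibling queues and reports whether they total 100.
capacities_ok() {
  total=0
  for c in "$@"; do
    total=$(( total + c ))
  done
  if [ "$total" -eq 100 ]; then
    echo "ok"
  else
    echo "sum is $total, expected 100"
  fi
}

capacities_ok 50 50   # prints: ok
capacities_ok 50 40   # prints: sum is 90, expected 100
```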
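Override container memory for a single job on the command line (the -D options are only honored if the job parses generic options, for example through ToolRunner):
hadoop jar my-job.jar com.example.MyJob -D mapreduce.map.memory.mb=2048 -D mapreduce.reduce.memory.mb=4096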
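The container sizes above bound the whole task process, so the JVM heap should be somewhat smaller. The sketch below derives heap flags from the container sizes using 80 percent as a rule-of-thumb assumption (not a value taken from this configuration). The resulting flags could be passed through `mapreduce.map.java.opts` and `mapreduce.reduce.java.opts`. `heap_opts` is a hypothetical helper.

```shell
# heap_opts CONTAINER_MB (hypothetical helper)
# Prints a -Xmx flag sized at 80% of the container memory (assumed ratio).
heap_opts() {
  echo "-Xmx$(( $1 * 80 / 100 ))m"
}

heap_opts 2048   # prints -Xmx1638m
heap_opts 4096   # prints -Xmx3276m
```

Leaving this headroom helps keep off-heap usage (thread stacks, native buffers) from pushing the process past the container limit, at which point the NodeManager may kill it.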
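To avoid container start-up overhead for small jobs, enable uber mode in mapred-site.xml, which runs the job's tasks inside the ApplicationMaster's JVM:
<property>
<name>mapreduce.job.ubertask.enable</name>
<value>true</value>
</property>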
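Configuring the cluster's components and parameters sensibly allows resources to be allocated and managed efficiently. Combined with monitoring tools and ongoing adjustment of these settings, this helps keep cluster resources well utilized and jobs running efficiently.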