How to Configure Hadoop Resource Scheduling on Ubuntu

A Guide to Configuring Hadoop Resource Scheduling on Ubuntu

I. Prerequisites and Basic Configuration

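Confirm that the Hadoop version is 2.x or 3.x, and use the current parameter fs.defaultFS consistently (not the deprecated fs.default.name). Example: in core-site.xml, set fs.defaultFS=hdfs://<namenode-host>:8020. Port 8020 is the usual NameNode RPC port on both Hadoop 2.x and 3.x; 9000 appears in many older, 1.x-era tutorials. Use an FQDN for the host name, because localhost breaks access from other nodes.
Enable YARN as the execution framework: in mapred-site.xml, set mapreduce.framework.name=yarn.
Basic YARN service parameters: in yarn-site.xml, set yarn.nodemanager.aux-services=mapreduce_shuffle, yarn.nodemanager.aux-services.mapreduce_shuffle.class=org.apache.hadoop.mapred.ShuffleHandler, yarn.resourcemanager.hostname=<resourcemanager-host>, and yarn.resourcemanager.webapp.address=0.0.0.0:8088. The class key must embed the service name exactly as declared in aux-services (mapreduce_shuffle, with an underscore).
After startup, open http://<resourcemanager-host>:8088 in a browser to check that the ResourceManager is running.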
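
The property=value pairs above have to end up in Hadoop's XML configuration format. The following is a minimal sketch of that mapping, not an official tool: the helper `to_hadoop_xml` and the host names `nn.example.com` and `rm.example.com` are assumptions made for illustration and should be replaced with your own values.

```python
from xml.sax.saxutils import escape


def to_hadoop_xml(props):
    """Render a dict of property names/values as a Hadoop <configuration> document."""
    lines = ["<configuration>"]
    for name, value in props.items():
        lines.append("  <property>")
        lines.append(f"    <name>{escape(name)}</name>")
        lines.append(f"    <value>{escape(str(value))}</value>")
        lines.append("  </property>")
    lines.append("</configuration>")
    return "\n".join(lines)


# Hypothetical host names (assumptions); use the FQDNs of your own nodes.
NAMENODE_HOST = "nn.example.com"
RM_HOST = "rm.example.com"

CORE_SITE = {"fs.defaultFS": f"hdfs://{NAMENODE_HOST}:8020"}
MAPRED_SITE = {"mapreduce.framework.name": "yarn"}
YARN_SITE = {
    "yarn.nodemanager.aux-services": "mapreduce_shuffle",
    "yarn.nodemanager.aux-services.mapreduce_shuffle.class":
        "org.apache.hadoop.mapred.ShuffleHandler",
    "yarn.resourcemanager.hostname": RM_HOST,
    "yarn.resourcemanager.webapp.address": "0.0.0.0:8088",
}

if __name__ == "__main__":
    # Print each file's content; copy it into $HADOOP_CONF_DIR by hand.
    for filename, props in [
        ("core-site.xml", CORE_SITE),
        ("mapred-site.xml", MAPRED_SITE),
        ("yarn-site.xml", YARN_SITE),
    ]:
        print(f"<!-- {filename} -->")
        print(to_hadoop_xml(props))
```

Printing rather than writing files keeps the sketch side-effect free; review the output before placing it in the configuration directory of each node.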

II. Choosing a Scheduler and Enabling It

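Three commonly used schedulers:
1. Capacity Scheduler: multiple queues with guaranteed quotas and elasticity; suited to isolating and guaranteeing resources across several business lines.
2. Fair Scheduler: shares resources dynamically among running jobs according to weights; suited to shared, multi-user clusters where fairness matters.
3. FIFO Scheduler (first in, first out): a single queue; simple, but a poor fit for isolating multiple workloads.
How to enable (Capacity Scheduler as the example): in yarn-site.xml, set yarn.resourcemanager.scheduler.class=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler, then define the queues and quotas in capacity-scheduler.xml. For fair scheduling, set the class to org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler instead and configure queues and weights in fair-scheduler.xml.
Queues are organized as a tree. The root queue is named root, its properties live under the prefix yarn.scheduler.capacity.root, and the children of any queue are listed, comma-separated, in yarn.scheduler.capacity.<queue-path>.queues.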
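
To make the tree structure concrete, here is a small sketch that walks the queue tree described by a flat set of Capacity Scheduler properties and checks two constraints in percentage mode: the capacities of each parent's direct children sum to 100, and no queue's maximum-capacity is below its capacity. The function names and the queue names `prod` and `dev` are hypothetical, and the sketch does not cover absolute-resource or weight-based capacity settings.

```python
PREFIX = "yarn.scheduler.capacity"


def child_queues(props, path):
    """Return the child queue names listed under <path>.queues (empty for a leaf)."""
    raw = props.get(f"{PREFIX}.{path}.queues", "")
    return [name.strip() for name in raw.split(",") if name.strip()]


def validate_capacities(props, path="root"):
    """Return a list of human-readable problems in the queue tree rooted at `path`."""
    problems = []
    children = child_queues(props, path)
    if not children:
        return problems  # leaf queue: nothing to sum

    total = 0.0
    for child in children:
        child_path = f"{path}.{child}"
        capacity = float(props.get(f"{PREFIX}.{child_path}.capacity", 0))
        max_capacity = float(props.get(f"{PREFIX}.{child_path}.maximum-capacity", -1))
        if max_capacity < 0:
            max_capacity = 100.0  # an unset or negative ceiling is treated as "no ceiling"
        total += capacity
        if max_capacity < capacity:
            problems.append(f"{child_path}: maximum-capacity {max_capacity:g} < capacity {capacity:g}")
        problems.extend(validate_capacities(props, child_path))  # recurse into sub-queues

    if abs(total - 100.0) > 1e-6:
        problems.append(f"children of {path} sum to {total:g}, expected 100")
    return problems


# Hypothetical two-queue layout used only for illustration.
EXAMPLE = {
    f"{PREFIX}.root.queues": "prod,dev",
    f"{PREFIX}.root.prod.capacity": "70",
    f"{PREFIX}.root.dev.capacity": "30",
    f"{PREFIX}.root.dev.maximum-capacity": "50",
}

if __name__ == "__main__":
    print(validate_capacities(EXAMPLE) or "queue capacities look consistent")
```

Running a check like this before reloading the scheduler can catch arithmetic mistakes early; the ResourceManager performs its own validation and remains the authority.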

III. Key Resource Settings and Examples

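Node resources reported by each NodeManager: in yarn-site.xml, set
- yarn.nodemanager.resource.memory-mb: total memory (MB) on the node that can be allocated to containers
- yarn.nodemanager.resource.cpu-vcores: total number of vcores on the node that can be allocated to containers
Container size limits: in yarn-site.xml, set
- yarn.scheduler.minimum-allocation-mb / yarn.scheduler.maximum-allocation-mb: the minimum and maximum memory a single container can request
ApplicationMaster limit: in capacity-scheduler.xml, set
- yarn.scheduler.capacity.maximum-am-resource-percent: the maximum fraction of cluster resources that may be used to run ApplicationMasters (for example, 0.5 means at most 50%; the default is 0.1)
Queue capacity and ceiling (Capacity Scheduler, shown for the single queue root.default): in capacity-scheduler.xml, set
- yarn.scheduler.capacity.root.default.capacity: the queue's capacity as a percentage of its parent, written as a plain number (100 means 100%; with two equal sibling queues, each would be 50)
- yarn.scheduler.capacity.root.default.maximum-capacity: the ceiling, as a percentage, up to which the queue may grow (100 means 100%)
Example fragment (illustrative only; adjust the values to your node resources and workload goals):
- yarn-site.xml
  - yarn.nodemanager.resource.memory-mb=8192
  - yarn.nodemanager.resource.cpu-vcores=8
  - yarn.scheduler.minimum-allocation-mb=1024
  - yarn.scheduler.maximum-allocation-mb=8192
- capacity-scheduler.xml
  - yarn.scheduler.capacity.maximum-am-resource-percent=0.5
  - yarn.scheduler.capacity.root.queues=default
  - yarn.scheduler.capacity.root.default.capacity=100
  - yarn.scheduler.capacity.root.default.maximum-capacity=100
Note: queue capacities are plain percentages (50 means 50%; fractional values such as 12.5 are accepted), not "percent times 100". The capacities of the direct children of each parent queue must sum to 100, and each queue's maximum-capacity must be at least its capacity.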
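
The arithmetic behind the container limits can be sketched as follows. This is a simplified model, assuming the Capacity Scheduler's default behaviour of rounding memory requests up to a multiple of yarn.scheduler.minimum-allocation-mb and rejecting requests above yarn.scheduler.maximum-allocation-mb; the function names are hypothetical and the defaults mirror the example values above.

```python
import math


def normalize_memory_request(requested_mb, min_mb=1024, max_mb=8192):
    """Approximate how a container memory request is normalized (simplified model)."""
    if requested_mb > max_mb:
        # Requests above the maximum allocation are rejected rather than trimmed.
        raise ValueError(f"{requested_mb} MB exceeds maximum allocation of {max_mb} MB")
    # Round up to the next multiple of the minimum allocation, never below it.
    rounded = max(min_mb, math.ceil(requested_mb / min_mb) * min_mb)
    return min(rounded, max_mb)


def containers_per_node(node_mb, container_mb):
    """How many containers of a given size fit into one NodeManager's memory."""
    return node_mb // container_mb


if __name__ == "__main__":
    granted = normalize_memory_request(1500)  # 1500 MB is rounded up to 2048 MB
    print(granted)                            # 2048
    print(containers_per_node(8192, granted)) # 4
```

With the example values, a job that asks for 1500 MB per container actually occupies 2048 MB, so an 8192 MB node runs four such containers rather than five. Choosing request sizes that are multiples of the minimum allocation avoids this hidden waste.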

IV. Applying Changes, Verification, and Routine Monitoring

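Applying the configuration: changes to capacity-scheduler.xml can be loaded into a running ResourceManager with yarn rmadmin -refreshQueues. Changes to yarn-site.xml require restarting the affected daemons: the ResourceManager for scheduler-level settings and each NodeManager for node-level settings. Restart NodeManagers one at a time where possible to limit disruption to running workloads.
Quick self-check:
- Run jps to check the processes: ResourceManager, NodeManager, NameNode, DataNode (and, optionally, JobHistoryServer).
- Open http://<resourcemanager-host>:8088 to view cluster and application status, per-queue resource usage, and the scheduler type.
- Verify the core configuration from the command line:
  - hdfs getconf -confKey fs.defaultFS (checks the default file system)
  - hadoop fs -ls / (checks that HDFS is reachable)
Job-side observation: submit a test job and check that it runs in the expected queue and that its container sizes respect the minimum-allocation-mb and maximum-allocation-mb limits.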
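
The same information shown in the web UI is also exposed by the ResourceManager REST API under /ws/v1/cluster/scheduler, which makes the check scriptable. The sketch below assumes the Capacity Scheduler is active and that the response uses the field names `type`, `queueName`, `capacity`, `usedCapacity`, and `maxCapacity`; the function names and the URL passed in are hypothetical, so compare the output against the REST API documentation for your Hadoop version.

```python
import json
from urllib.request import urlopen


def fetch_scheduler_payload(rm_base_url, timeout=5):
    """Fetch the scheduler description from the ResourceManager REST API."""
    with urlopen(f"{rm_base_url}/ws/v1/cluster/scheduler", timeout=timeout) as resp:
        return json.load(resp)


def summarize_scheduler(payload):
    """Return (scheduler_type, [(queue, capacity, used, max_capacity), ...])."""
    info = payload["scheduler"]["schedulerInfo"]
    queues = (info.get("queues") or {}).get("queue", [])
    rows = [
        (q.get("queueName"), q.get("capacity"), q.get("usedCapacity"), q.get("maxCapacity"))
        for q in queues
    ]
    return info.get("type"), rows


def print_summary(rm_base_url):
    """Print the scheduler type and one line per top-level queue."""
    scheduler_type, rows = summarize_scheduler(fetch_scheduler_payload(rm_base_url))
    print(f"scheduler: {scheduler_type}")
    for name, capacity, used, max_capacity in rows:
        print(f"  {name}: capacity={capacity} used={used} max={max_capacity}")
```

Calling `print_summary` with the base URL of your ResourceManager (scheme, host, and port 8088) lists only the top-level queues; nested queues would need a recursive walk over each queue's own `queues` field.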

V. Common Troubleshooting and Tuning Suggestions

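Cannot access the cluster across nodes: check that fs.defaultFS uses an FQDN and the correct port (for example, 8020) rather than localhost, and make sure /etc/hosts and DNS resolve the names correctly.
Queue has no resources or rejects submissions: check whether capacity or maximum-capacity is set too low, or whether sibling capacities fail to sum to 100; also check whether maximum-am-resource-percent is so low that ApplicationMasters cannot start.
Container requests rejected, or containers killed for exceeding memory: keep container requests at or below yarn.scheduler.maximum-allocation-mb (larger requests are rejected, while requests below yarn.scheduler.minimum-allocation-mb are rounded up to it), and make sure yarn.nodemanager.resource.memory-mb is large enough for the containers expected to run on the node at the same time.
Performance tuning directions: based on job characteristics, adjust dfs.blocksize (for example, 256 MB), enable compression of intermediate data (for example, mapreduce.map.output.compress=true), and set a sensible heap size in mapreduce.map.java.opts and mapreduce.reduce.java.opts to improve throughput and stability.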
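
One way to keep the heap size consistent with the container size is to derive it from the container memory. The sketch below assumes a common rule of thumb, a heap of roughly 80% of the container, which is not stated in this guide and should be tuned per workload; the function names are hypothetical, while mapreduce.map.memory.mb and mapreduce.reduce.memory.mb are the standard properties for the container size of map and reduce tasks.

```python
def java_opts_for(container_mb, heap_fraction=0.8):
    """Derive a -Xmx flag that leaves headroom for non-heap JVM memory."""
    heap_mb = int(container_mb * heap_fraction)
    return f"-Xmx{heap_mb}m"


def mapreduce_memory_settings(map_mb, reduce_mb, heap_fraction=0.8):
    """Build matching container-size and heap-size properties for map and reduce tasks."""
    return {
        "mapreduce.map.memory.mb": str(map_mb),
        "mapreduce.map.java.opts": java_opts_for(map_mb, heap_fraction),
        "mapreduce.reduce.memory.mb": str(reduce_mb),
        "mapreduce.reduce.java.opts": java_opts_for(reduce_mb, heap_fraction),
    }


if __name__ == "__main__":
    for key, value in mapreduce_memory_settings(2048, 4096).items():
        print(f"{key}={value}")
```

Keeping the heap below the container size matters because the NodeManager enforces the container limit on the whole process, not only on the Java heap; a heap equal to the container size leaves no room for thread stacks, metaspace, and native buffers.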
