HDFS Monitoring Configuration Guide on Ubuntu
1. Built-in Monitoring Entry Points
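Besides the Web UI, each HDFS daemon serves its JMX beans as JSON through the `/jmx` servlet on the same HTTP port (9870 for the NameNode in Hadoop 3.x). The `qry` parameter narrows the output to one MBean, for example `Hadoop:service=NameNode,name=FSNamesystemState`, whose attributes include `NumLiveDataNodes` and `NumDeadDataNodes`. This is a minimal sketch, assuming `curl` is installed; `jmx_url` and `fetch_jmx` are hypothetical helper names, not part of Hadoop.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the /jmx URL for a single MBean.
# $1 = host:port of the daemon's HTTP server, $2 = MBean object name
jmx_url() {
  printf 'http://%s/jmx?qry=%s\n' "$1" "$2"
}

# Hypothetical helper: fetch that MBean as JSON (needs curl and network access).
# -s hides progress output, -f makes HTTP errors return a non-zero status.
fetch_jmx() {
  curl -sf "$(jmx_url "$1" "$2")"
}

# Usage (after sourcing this file):
#   fetch_jmx '<namenode-host>:9870' 'Hadoop:service=NameNode,name=FSNamesystemState'
```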
2. Metrics Collection with Metrics2 and Prometheus Integration
On Hadoop 3.3.0 and later, Metrics2 metrics can be exported through the built-in sink org.apache.hadoop.metrics2.sink.PrometheusMetricsSink. It is switched on in core-site.xml rather than in hadoop-metrics2.properties, and each daemon then serves its metrics at /prom on its HTTP port:
<property>
  <name>hadoop.prometheus.endpoint.enabled</name>
  <value>true</value>
</property>
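scrape_configs:
  - job_name: 'hadoop'
    # /prom requires hadoop.prometheus.endpoint.enabled=true (Hadoop 3.3.0+);
    # without metrics_path, Prometheus would request /metrics, which Hadoop does not serve
    metrics_path: /prom
    static_configs:
      - targets:
          - '<namenode-host>:9870'  # NameNode HTTP (3.x)
          - '<datanode-host>:9864'  # DataNode HTTP (3.x)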
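Before adding the job to Prometheus, it helps to confirm that a target really answers on the configured path and exposes the metric you plan to alert on. This is a minimal sketch, assuming Hadoop 3.3.0+ with `hadoop.prometheus.endpoint.enabled=true` so the daemon serves `/prom`. `prom_value` is a hypothetical helper, and it assumes label values contain no `}` character. Exported metric names vary by Hadoop version, so check the raw `/prom` output for the exact name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of the first sample of metric $1
# found in Prometheus text exposition format read from stdin.
prom_value() {
  awk -v m="$1" '
    /^#/ { next }                     # skip "# HELP" / "# TYPE" lines
    {
      line = $0
      sub(/\{[^}]*\}/, "", line)      # drop the {label="..."} block, if any
      n = split(line, f, " ")         # f[1] = metric name, f[2] = value
      if (n >= 2 && f[1] == m) { print f[2]; exit }
    }'
}

# Usage against a live daemon (needs curl and network access):
#   curl -sf "http://<namenode-host>:9870/prom" | prom_value <metric_name>
```

An empty result means the metric name was not found, which usually points to a wrong name or a disabled endpoint rather than a zero value.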
3. Third-Party Monitoring and Alerting
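#!/usr/bin/env bash
# Nagios-style check of overall HDFS usage.
# Exit codes: 0 = OK, 2 = CRITICAL, 3 = UNKNOWN.
THRESHOLD=${1:-10}   # critical threshold in percent; override with the first argument
REPORT=$(hdfs dfsadmin -report 2>/dev/null)
# The cluster summary line reads like "DFS Used%: 12.34%"; keep only the integer part,
# because [ -ge ] compares integers only. exit stops before the per-DataNode sections.
USAGE_PCT=$(echo "$REPORT" | awk -F: '/^DFS Used%/ {gsub(/[ %]/, "", $2); printf "%d\n", $2; exit}')
if [ -z "$USAGE_PCT" ]; then
  echo "UNKNOWN: cannot get HDFS usage"
  exit 3
fi
if [ "$USAGE_PCT" -ge "$THRESHOLD" ]; then
  echo "CRITICAL: HDFS used ${USAGE_PCT}% >= ${THRESHOLD}%"
  exit 2
else
  echo "OK: HDFS used ${USAGE_PCT}%"
  exit 0
fi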
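The check script reports only OK or CRITICAL. Nagios-compatible plugins also use exit code 1 for WARNING, so a two-level variant needs only a small change. This is a minimal sketch with a hypothetical `classify_usage` function. The 80/90 defaults are placeholders to tune for each cluster, not recommendations from this guide.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map an integer usage percentage to a Nagios status.
# $1 = usage %, $2 = warning threshold %, $3 = critical threshold %
# Prints a status line and returns 0 (OK), 1 (WARNING) or 2 (CRITICAL).
classify_usage() {
  local pct=$1 warn=${2:-80} crit=${3:-90}   # placeholder defaults, tune per cluster
  if [ "$pct" -ge "$crit" ]; then
    echo "CRITICAL: HDFS used ${pct}% >= ${crit}%"
    return 2
  elif [ "$pct" -ge "$warn" ]; then
    echo "WARNING: HDFS used ${pct}% >= ${warn}%"
    return 1
  fi
  echo "OK: HDFS used ${pct}%"
  return 0
}
```

In the check script, the final if/else could then become `classify_usage "$USAGE_PCT" 80 90; exit $?`, so the status code returned by the function becomes the plugin's exit code.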
4. Key Metrics and Suggested Alert Thresholds
5. Quick Troubleshooting Checklist
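A first pass at triage usually starts from the standard `hdfs` CLI: check whether the NameNode is in safe mode, list dead DataNodes, and list corrupt blocks. This is a minimal sketch, assuming the `hdfs` client is on PATH and configured for the cluster. `hdfs_triage` is a hypothetical wrapper, and some `dfsadmin` subcommands may require HDFS superuser privileges.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run a few read-only HDFS health checks in order.
# Assumes the hdfs client is on PATH and points at the target cluster.
hdfs_triage() {
  echo "== Safe mode =="
  hdfs dfsadmin -safemode get
  echo "== Dead DataNodes =="
  hdfs dfsadmin -report -dead
  echo "== Corrupt file blocks =="
  hdfs fsck / -list-corruptfileblocks
}

# Usage (after sourcing this file):
#   hdfs_triage 2>&1 | tee /tmp/hdfs-triage.txt
```

`-list-corruptfileblocks` keeps `fsck` output short on large namespaces, compared with a full `hdfs fsck /` report.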