Complete Steps for Configuring Hadoop High Availability (HA) on Linux
Preparation:
- Install build dependencies such as gcc and openssl-devel on every node.
- Set the JAVA_HOME environment variable (e.g., export JAVA_HOME=/usr/java/jdk1.8.0_201/).
- Make all hostnames resolvable on every node (via the /etc/hosts file or DNS).
- Deploy a ZooKeeper ensemble and declare its members in the zoo.cfg file (e.g., server.1=zoo1:2888:3888, server.2=zoo2:2888:3888, server.3=zoo3:2888:3888, each entry on its own line).
- Download a Hadoop distribution (Apache Hadoop or a CDH build), extract it to a target directory (e.g., /usr/app/hadoop-3.3.6), set HADOOP_HOME (e.g., export HADOOP_HOME=/usr/app/hadoop-3.3.6), and add $HADOOP_HOME/bin to PATH.
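A quick sanity check at this point can save debugging later. The following is a minimal sketch; the ZooKeeper install path and dataDir are assumptions, not values from this guide:

# Confirm Java and Hadoop are on the PATH (run on every node)
java -version
hadoop version

# Each ZooKeeper server needs a myid file matching its server.N line in zoo.cfg
# (dataDir=/usr/app/zookeeper/data is an assumed value; use 2 on zoo2, 3 on zoo3)
echo 1 > /usr/app/zookeeper/data/myid

# Verify the ensemble: one node should report "leader", the others "follower"
/usr/app/zookeeper/bin/zkServer.sh status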
core-site.xml: defines the HDFS default filesystem address (pointing at the NameService) and the ZooKeeper quorum address.
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://hadoop-cluster</value> <!-- NameService name -->
</property>
<property>
  <name>ha.zookeeper.quorum</name>
  <value>zoo1:2181,zoo2:2181,zoo3:2181</value> <!-- ZooKeeper quorum address -->
</property>
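Because fs.defaultFS points at the logical NameService rather than a specific host, clients keep working across a failover. For example, once the cluster is up:

# Paths resolve through the NameService, so the active NameNode is found automatically
hdfs dfs -mkdir -p hdfs://hadoop-cluster/tmp/ha-test
hdfs dfs -ls hdfs://hadoop-cluster/tmp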
hdfs-site.xml: configures the NameService, the NameNode instances, the shared edit-log storage (JournalNodes), automatic failover, and related parameters.
<property>
  <name>dfs.nameservices</name>
  <value>hadoop-cluster</value> <!-- NameService name; must match core-site.xml -->
</property>
<property>
  <name>dfs.ha.namenodes.hadoop-cluster</name>
  <value>nn1,nn2</value> <!-- NameNode IDs -->
</property>
<property>
  <name>dfs.namenode.rpc-address.hadoop-cluster.nn1</name>
  <value>namenode1:9000</value> <!-- RPC address of nn1 -->
</property>
<property>
  <name>dfs.namenode.rpc-address.hadoop-cluster.nn2</name>
  <value>namenode2:9000</value> <!-- RPC address of nn2 -->
</property>
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://journalnode1:8485;journalnode2:8485;journalnode3:8485/hadoop-cluster</value> <!-- JournalNode shared edits URI -->
</property>
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value> <!-- enable automatic failover -->
</property>
<property>
  <name>dfs.client.failover.max.attempts</name>
  <value>5</value> <!-- client failover retry attempts -->
</property>
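Two things automatic failover normally depends on are missing above: a failover proxy provider, so clients can locate the active NameNode behind the logical name, and a fencing method, so a demoted NameNode cannot keep writing. A minimal sketch follows; the SSH key path and the JournalNode edits directory are assumed values:

<!-- Lets HDFS clients discover which NameNode is currently active -->
<property>
  <name>dfs.client.failover.proxy.provider.hadoop-cluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<!-- Fencing is required for automatic failover; sshfence needs passwordless SSH between NameNodes -->
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence</value>
</property>
<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/home/hadoop/.ssh/id_rsa</value> <!-- assumed key path -->
</property>
<!-- Local directory where each JournalNode stores edit logs (assumed path) -->
<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/home/hadoop/journalnode</value>
</property>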
Run hdfs --daemon start journalnode on each JournalNode host to start the JournalNode service (it synchronizes NameNode metadata).
yarn-site.xml: configures the ResourceManager cluster, the ZooKeeper address, and related parameters.
<property>
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value> <!-- enable ResourceManager HA -->
</property>
<property>
  <name>yarn.resourcemanager.cluster-id</name>
  <value>yarn-cluster</value> <!-- ResourceManager cluster ID -->
</property>
<property>
  <name>yarn.resourcemanager.ha.rm-ids</name>
  <value>rm1,rm2</value> <!-- ResourceManager IDs -->
</property>
<property>
  <name>yarn.resourcemanager.zk-address</name>
  <value>zoo1:2181,zoo2:2181,zoo3:2181</value> <!-- ZooKeeper quorum address -->
</property>
<property>
  <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
  <value>true</value> <!-- enable automatic failover -->
</property>
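Note that the yarn-site.xml above never maps rm1 and rm2 to actual hosts. A sketch of the usual mapping follows, assuming the ResourceManagers run on resourcemanager1 and resourcemanager2 (the hostnames used in the startup steps later in this guide):

<!-- Bind each ResourceManager ID to its host -->
<property>
  <name>yarn.resourcemanager.hostname.rm1</name>
  <value>resourcemanager1</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm2</name>
  <value>resourcemanager2</value>
</property>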
On every DataNode host, edit hdfs-site.xml to set the DataNode data directory:
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/home/hadoop/datanode</value> <!-- DataNode data directory -->
</property>
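dfs.datanode.data.dir accepts a comma-separated list, which is the usual way to spread block storage across disks. A sketch assuming two mount points (/data1 and /data2 are assumed paths):

<property>
  <name>dfs.datanode.data.dir</name>
  <value>/data1/hadoop/datanode,/data2/hadoop/datanode</value> <!-- one entry per disk -->
</property>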
In the hadoop-env.sh file, set the Java path:
export JAVA_HOME=/usr/java/jdk1.8.0_201/
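If you later use the start-dfs.sh/start-yarn.sh wrapper scripts on Hadoop 3 (notably when running them as root), they also expect per-daemon user variables in hadoop-env.sh. A sketch assuming every daemon runs as the hadoop user (the user name is an assumption):

export HDFS_NAMENODE_USER=hadoop
export HDFS_DATANODE_USER=hadoop
export HDFS_JOURNALNODE_USER=hadoop
export HDFS_ZKFC_USER=hadoop
export YARN_RESOURCEMANAGER_USER=hadoop
export YARN_NODEMANAGER_USER=hadoop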
On every JournalNode host, start the JournalNode service first (the format step below writes to the shared edits directory, so the JournalNodes must already be running):
hdfs --daemon start journalnode
On the primary NameNode (e.g., namenode1), format HDFS once:
hdfs namenode -format
On the primary NameNode (e.g., namenode1), start the NameNode:
hdfs --daemon start namenode
On the standby NameNode (e.g., namenode2), copy the formatted metadata, then start it:
hdfs namenode -bootstrapStandby # sync metadata from the primary NameNode
hdfs --daemon start namenode
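With dfs.ha.automatic-failover.enabled set to true, the ZKFC daemons must also be initialized and started, or neither NameNode will be elected Active. Both commands below are standard Hadoop commands:

# Once, on one NameNode host: create the failover znode in ZooKeeper
hdfs zkfc -formatZK

# On each NameNode host: start the ZooKeeper Failover Controller
hdfs --daemon start zkfc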
On the primary ResourceManager (e.g., resourcemanager1), run: yarn --daemon start resourcemanager
On the standby ResourceManager (e.g., resourcemanager2), run: yarn --daemon start resourcemanager
On every DataNode host, run: hdfs --daemon start datanode
On every NodeManager host, run: yarn --daemon start nodemanager
Verification:
- Use the jps command on each node to check that the expected processes are running (each NameNode host should show a NameNode process plus a DFSZKFailoverController (ZKFC) process).
- Stop the Active NameNode's process on namenode1 with hdfs --daemon stop namenode, then confirm that namenode2 automatically switches to Active (check with jps, or open http://namenode2:9870 to view the NameNode state).
- Stop the ResourceManager process on resourcemanager1 and confirm that the standby ResourceManager automatically takes over.
- Monitor the cluster through the web UIs (NameNode at http://namenode1:9870, ResourceManager at http://resourcemanager1:8088).
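The HA state can also be queried from the command line instead of the web UI. Both tools below are standard Hadoop admin commands, using the nn1/nn2 and rm1/rm2 IDs configured earlier:

# Report active/standby state for each NameNode
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2

# Report active/standby state for each ResourceManager
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2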