Key Steps to Ensure Hadoop High Availability on Ubuntu
1. Prepare every node: install OpenJDK 8 (sudo apt install openjdk-8-jdk), assign static IPs, add hostname mappings for all nodes to /etc/hosts, and set up passwordless SSH between nodes (ssh-keygen -t rsa, then ssh-copy-id to each node).

2. Install Hadoop under /opt/hadoop, and in ~/.bashrc set HADOOP_HOME (/opt/hadoop) and add $HADOOP_HOME/bin to PATH.

3. Define the HDFS nameservice (dfs.nameservices=mycluster) and point automatic failover at the ZooKeeper ensemble (ha.zookeeper.quorum=zk1:2181,zk2:2181,zk3:2181).

4. In hdfs-site.xml, configure the NameNode pair (dfs.ha.namenodes.mycluster=nn1,nn2; dfs.namenode.rpc-address.mycluster.nn1=node1:8020; dfs.namenode.rpc-address.mycluster.nn2=node2:8020), the shared JournalNode edits directory (dfs.namenode.shared.edits.dir=qjournal://journalnode1:8485;journalnode2:8485;journalnode3:8485/mycluster), automatic failover (dfs.ha.automatic-failover.enabled=true), and fencing (dfs.ha.fencing.methods=sshfence with dfs.ha.fencing.ssh.private-key-files=/home/ubuntu/.ssh/id_rsa).

5. In yarn-site.xml, enable ResourceManager HA: yarn.resourcemanager.ha.enabled=true, yarn.resourcemanager.cluster-id=yarn-cluster, yarn.resourcemanager.ha.rm-ids=rm1,rm2, yarn.resourcemanager.hostname.rm1=node1, yarn.resourcemanager.hostname.rm2=node2.

6. Install ZooKeeper, setting ZOOKEEPER_HOME (/opt/zookeeper) and dataDir (/opt/zookeeper/data). In zoo.cfg, list the ensemble members (server.1=node1:2888:3888, server.2=node2:2888:3888, server.3=node3:2888:3888), and on each node create a myid file under dataDir containing that node's ID (e.g. 1 on node1).

7. Start ZooKeeper on every node with zkServer.sh start, then confirm the ensemble with zkServer.sh status (one Leader, the rest Followers).

8. Start the JournalNode service on every JournalNode host (hadoop-daemon.sh start journalnode, or hdfs --daemon start journalnode on Hadoop 3; start-dfs.sh also launches them). The JournalNodes must be running before the NameNode is formatted.

9. On the first NameNode (node1), run hdfs namenode -format to create the metadata directory (/opt/hadoop/data/namenode). Also initialize the automatic-failover state in ZooKeeper once with hdfs zkfc -formatZK.

10. If you are converting an existing non-HA NameNode, run hdfs namenode -initializeSharedEdits to push its local edit log to the JournalNode cluster.

11. Start the active NameNode (hadoop-daemon.sh start namenode on node1, or hdfs namenode to run it in the foreground), then on the standby NameNode run hdfs namenode -bootstrapStandby to copy the metadata into its local /opt/hadoop/data/namenode directory, and start it as well. Once both NameNodes are initialized, start-dfs.sh brings up the whole HDFS layer (NameNodes, JournalNodes, ZKFC) in one step.

12. Start the ResourceManagers: run yarn resourcemanager on node1, then on the standby node (node2); the second instance automatically comes up as standby.

13. On every worker node, start the DataNode (hadoop-daemon.sh start datanode) and the NodeManager (yarn-daemon.sh start nodemanager); on Hadoop 3 the equivalents are hdfs --daemon start datanode and yarn --daemon start nodemanager.

14. Run jps on each node and confirm the expected processes: active and standby NameNode, DFSZKFailoverController (ZKFC), active and standby ResourceManager, JournalNode, DataNode, NodeManager.

15. Check the web UIs: http://node1:9870 (HDFS UI, NameNode Active/Standby state) and http://node1:8088 (YARN UI, ResourceManager state).

16. Test failover: stop the active NameNode (hadoop-daemon.sh stop namenode), wait 1-2 minutes, and confirm via the web UI or jps that the standby NameNode has been promoted to Active; then restart the original node and confirm it rejoins as Standby.

17. Back up the NameNode metadata directory (/opt/hadoop/data/namenode) regularly, e.g. by fetching the latest fsimage with hdfs dfsadmin -fetchImage, and replicate HDFS data off-cluster with tooling such as DistCp.

18. Tune key parameters: the ZooKeeper heartbeat interval (tickTime=2000) and the HDFS block size (dfs.blocksize=256M, which reduces NameNode metadata overhead and suits large sequential workloads).
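The per-node myid files from the ZooKeeper setup are easy to get wrong by hand. A minimal sketch, assuming the node names and paths used in this guide (node1..node3, /opt/zookeeper) and a hypothetical helper name write_myid, derives each node's ID from the server.N lines already present in zoo.cfg:

```shell
# write_myid: hypothetical helper that derives this node's ZooKeeper id
# from the server.N entries in zoo.cfg and writes it to <data-dir>/myid.
# Usage: write_myid <zoo.cfg path> <data dir> <this node's hostname>
write_myid() {
  zoo_cfg="$1"; data_dir="$2"; host="$3"
  # Entries look like: server.1=node1:2888:3888
  # (assumes plain hostnames with no sed metacharacters)
  id=$(sed -n "s/^server\.\([0-9][0-9]*\)=$host:.*/\1/p" "$zoo_cfg")
  [ -n "$id" ] || { echo "host $host not found in $zoo_cfg" >&2; return 1; }
  mkdir -p "$data_dir"
  echo "$id" > "$data_dir/myid"
  echo "wrote id $id to $data_dir/myid"
}

# Example (run on each ensemble member):
# write_myid /opt/zookeeper/conf/zoo.cfg /opt/zookeeper/data "$(hostname -s)"
```

Running the same script on every node keeps the myid files consistent with zoo.cfg, which is the usual source of "not currently serving requests" ensemble errors.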
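The metadata backup step can also be a plain filesystem snapshot taken on the NameNode host. A minimal sketch, where backup_nn_meta is a hypothetical helper and the paths follow this guide's layout (/opt/hadoop/data/namenode):

```shell
# backup_nn_meta: hypothetical helper that archives a NameNode metadata
# directory into a timestamped tarball.
# Usage: backup_nn_meta <metadata dir> <backup dir>
backup_nn_meta() {
  nn_dir="$1"; backup_dir="$2"
  stamp="$(date +%Y%m%d-%H%M%S)"
  mkdir -p "$backup_dir"
  # -C archives paths relative to the metadata dir's parent,
  # so the tarball restores cleanly into the same layout.
  tar -czf "$backup_dir/namenode-meta-$stamp.tar.gz" \
      -C "$(dirname "$nn_dir")" "$(basename "$nn_dir")"
  echo "$backup_dir/namenode-meta-$stamp.tar.gz"
}

# Example (on the NameNode host, e.g. from cron):
# backup_nn_meta /opt/hadoop/data/namenode /opt/hadoop/backups
```

A cold tarball like this complements hdfs dfsadmin -fetchImage: the tarball captures the full metadata directory (fsimage plus edit logs), while -fetchImage pulls only the most recent checkpoint image.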