Prerequisites for HDFS High Availability (HA) on Debian
Before configuring HDFS HA, ensure the following prerequisites are met:
- Install Java (e.g., sudo apt install openjdk-11-jdk).
- Install Hadoop and set environment variables in ~/.bashrc (e.g., export HADOOP_HOME=/usr/local/hadoop, export PATH=$PATH:$HADOOP_HOME/bin).
- Assign hostnames to all nodes (e.g., namenode1, namenode2, journalnode1) and update /etc/hosts with IP-hostname mappings for all nodes.
- Configure passwordless SSH between all nodes (generate a key pair with ssh-keygen -t rsa and copy it to the other nodes using ssh-copy-id).
Step 1: Configure JournalNode Nodes
JournalNodes store edit logs (transaction records for HDFS metadata) and ensure consistency between Active and Standby NameNodes.
Create the JournalNode data directory on each JournalNode host:
sudo mkdir -p /usr/local/hadoop/journalnode/data
sudo chown -R $USER:$USER /usr/local/hadoop/journalnode
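If the same directory setup is needed on several JournalNode hosts, the two commands above can be pushed out over the passwordless SSH configured in the prerequisites. A minimal sketch (the host names and paths are this guide's examples; this helper is not part of Hadoop):

```python
import subprocess

JOURNALNODE_HOSTS = ["journalnode1", "journalnode2", "journalnode3"]  # example hosts
SETUP_CMD = (
    "sudo mkdir -p /usr/local/hadoop/journalnode/data && "
    "sudo chown -R $USER:$USER /usr/local/hadoop/journalnode"
)

def build_ssh_commands(hosts, remote_cmd):
    """Return one ssh argv per host, ready to pass to subprocess.run()."""
    return [["ssh", host, remote_cmd] for host in hosts]

if __name__ == "__main__":
    # Requires passwordless SSH (and passwordless sudo) on each host.
    for argv in build_ssh_commands(JOURNALNODE_HOSTS, SETUP_CMD):
        subprocess.run(argv, check=True)
```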
Add the following property to $HADOOP_HOME/etc/hadoop/hdfs-site.xml on all nodes:
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/usr/local/hadoop/journalnode/data</value>
</property>
Start the JournalNode daemon on each JournalNode host:
hadoop-daemon.sh start journalnode
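Besides checking jps, each JournalNode's HTTP endpoint (default port 8480, per dfs.journalnode.http-address) serves a /jmx page that can be polled from the admin host. A sketch, assuming Python 3 and this guide's example host names:

```python
import json
import urllib.request

def has_live_beans(jmx_text):
    """Return True if a /jmx response body contains at least one metrics bean."""
    data = json.loads(jmx_text)
    return bool(data.get("beans"))

def check_journalnode(host, port=8480, timeout=5):
    """Fetch the JournalNode's /jmx page and report whether it answered sanely."""
    url = f"http://{host}:{port}/jmx"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return has_live_beans(resp.read().decode("utf-8"))

if __name__ == "__main__":
    for host in ["journalnode1", "journalnode2", "journalnode3"]:  # example hosts
        print(host, "up" if check_journalnode(host) else "DOWN")
```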
Verify status with jps (each JournalNode host should show a JournalNode process).
Step 2: Configure NameNode High Availability
This step enables two NameNodes (Active/Standby) to share metadata via JournalNodes.
Edit $HADOOP_HOME/etc/hadoop/core-site.xml to define the HDFS namespace and the ZooKeeper address:
<property>
<name>fs.defaultFS</name>
<value>hdfs://mycluster</value> <!-- Logical name for the HDFS cluster -->
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>zk1:2181,zk2:2181,zk3:2181</value> <!-- ZooKeeper ensemble addresses -->
</property>
Edit $HADOOP_HOME/etc/hadoop/hdfs-site.xml to configure NameNode roles, RPC/HTTP addresses, shared edits, and failover:
<property>
<name>dfs.nameservices</name>
<value>mycluster</value> <!-- Must match fs.defaultFS -->
</property>
<property>
<name>dfs.ha.namenodes.mycluster</name>
<value>nn1,nn2</value> <!-- Names of NameNodes -->
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn1</name>
<value>namenode1:8020</value> <!-- RPC address for nn1 -->
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn2</name>
<value>namenode2:8020</value> <!-- RPC address for nn2 -->
</property>
<property>
<name>dfs.namenode.http-address.mycluster.nn1</name>
<value>namenode1:9870</value> <!-- HTTP address for nn1 -->
</property>
<property>
<name>dfs.namenode.http-address.mycluster.nn2</name>
<value>namenode2:9870</value> <!-- HTTP address for nn2 -->
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://journalnode1:8485;journalnode2:8485;journalnode3:8485/mycluster</value> <!-- JournalNode quorum for shared edits -->
</property>
<property>
<name>dfs.client.failover.proxy.provider.mycluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value> <!-- Client-side failover proxy -->
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value> <!-- Fencing method to prevent split-brain: SSH to the formerly active NameNode and kill its process -->
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/$USER/.ssh/id_rsa</value> <!-- Absolute path to the private key for SSH fencing; replace $USER with the real username, as variables are not expanded here -->
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value> <!-- Enable automatic failover -->
</property>
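Note that all of the <property> blocks above belong inside the file's single <configuration> root element; a truncated hdfs-site.xml outline:

```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>
  <!-- ... remaining properties from above ... -->
</configuration>
```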
Format the first NameNode (nn1):
hdfs namenode -format
Start the NameNode on nn1:
hadoop-daemon.sh start namenode
On nn2, bootstrap the standby NameNode (this copies metadata from nn1 to nn2):
hdfs namenode -bootstrapStandby
Start the NameNode on nn2:
hadoop-daemon.sh start namenode
Because automatic failover is enabled, also initialize the HA state in ZooKeeper once, from either NameNode:
hdfs zkfc -formatZK
Verify both NameNodes are running with hdfs haadmin -getServiceState nn1 (should return "active") and hdfs haadmin -getServiceState nn2 (should return "standby").
Step 3: Start HDFS Services
Start all HDFS components in the correct order:
start-dfs.sh # Starts JournalNodes, NameNodes, DataNodes, and (with automatic failover enabled) ZKFC daemons
Check cluster status with:
hdfs dfsadmin -report # Lists DataNodes and their health
hdfs haadmin -getAllServiceStates # Shows NameNode states (active/standby)
Access NameNode Web UIs (e.g., http://namenode1:9870, http://namenode2:9870) to confirm HA status.
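These checks can be scripted. Below is a sketch that runs hdfs haadmin -getAllServiceStates and fails unless exactly one NameNode is active and one is standby (the two-column host/state output layout is an assumption; verify it against your Hadoop version):

```python
import subprocess

def count_states(output):
    """Tally NameNode states from `hdfs haadmin -getAllServiceStates` output.

    Assumes one NameNode per line, with the state as the last whitespace-
    separated token (e.g. "namenode1:8020    active").
    """
    counts = {"active": 0, "standby": 0}
    for line in output.splitlines():
        parts = line.split()
        if parts and parts[-1] in counts:
            counts[parts[-1]] += 1
    return counts

if __name__ == "__main__":
    out = subprocess.run(
        ["hdfs", "haadmin", "-getAllServiceStates"],
        capture_output=True, text=True, check=True,
    ).stdout
    states = count_states(out)
    assert states == {"active": 1, "standby": 1}, f"unexpected HA states: {states}"
```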
Step 4: Test Automatic Failover
Simulate a failure to verify automatic failover works:
On nn1, find the NameNode PID (jps | grep NameNode) and kill it:
kill -9 <NameNode_PID>
On nn2, check its state:
hdfs haadmin -getServiceState nn2 # Should return "active"
Restart the NameNode on nn1 and verify it becomes standby:
hadoop-daemon.sh start namenode
hdfs haadmin -getServiceState nn1 # Should return "standby"
Run a client read/write to confirm the cluster still serves requests:
hdfs dfs -put /local/file.txt /test/
hdfs dfs -get /test/file.txt /local/ # Should succeed after failover
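During the failover window itself, client calls can fail transiently before the standby takes over, so test scripts often wrap them in a short retry loop. A generic sketch in plain Python (not a Hadoop API):

```python
import time

def with_retries(fn, attempts=5, delay=2.0):
    """Call fn(); on failure, retry up to `attempts` times with a fixed delay."""
    last_err = None
    for i in range(attempts):
        try:
            return fn()
        except Exception as err:  # in practice, catch the client's specific error
            last_err = err
            if i < attempts - 1:
                time.sleep(delay)
    raise last_err

# Example: retry an HDFS download via the CLI during a failover test.
# with_retries(lambda: subprocess.run(
#     ["hdfs", "dfs", "-get", "/test/file.txt", "/local/"], check=True))
```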
Step 5: Monitor and Maintain
Set up monitoring to detect issues early:
Check NameNode logs (e.g., $HADOOP_HOME/logs/hadoop-*-namenode-*.log) for errors.
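A simple way to catch issues early is to scan those NameNode logs for ERROR/FATAL lines, e.g. from cron. A sketch using the install path from this guide (the standard Hadoop log line puts the level as a space-delimited token, which this assumes):

```python
import glob

def find_problems(text, levels=("ERROR", "FATAL")):
    """Return log lines whose level token matches one of `levels`."""
    return [ln for ln in text.splitlines()
            if any(f" {lvl} " in ln for lvl in levels)]

if __name__ == "__main__":
    for path in glob.glob("/usr/local/hadoop/logs/hadoop-*-namenode-*.log"):
        with open(path, errors="replace") as fh:
            for line in find_problems(fh.read()):
                print(f"{path}: {line}")
```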