Pre-Upgrade Preparation
Backup all critical data (HDFS files, local configurations, and logs) to prevent loss during the upgrade. Verify sufficient disk space (at least 10-20% free) and internet connectivity. Check the current Hadoop version using hadoop version and confirm compatibility with your Debian version (e.g., Hadoop 3.x works best with Debian 10+).
System Update
Update the Debian package index and upgrade all installed packages to their latest stable versions. Run:
sudo apt update && sudo apt upgrade -y && sudo apt full-upgrade -y
Clean up unused packages and cached files to free disk space:
sudo apt autoremove -y && sudo apt clean
This ensures your system is up-to-date and reduces conflicts with the new Hadoop version.
Hadoop Version Upgrade
Stop all Hadoop services to avoid file corruption during the upgrade:
sudo systemctl stop hadoop-namenode hadoop-datanode hadoop-yarn-resourcemanager hadoop-yarn-nodemanager hadoop-jobhistoryserver
If upgrading across major versions (e.g., 2.x → 3.x), run the Hadoop upgrade script to migrate metadata and data structures. For example:
sudo -u hdfs hadoop namenode -upgrade
sudo -u hdfs hadoop datanode -upgrade
Replace the old Hadoop binaries with the new version (download from the Apache Hadoop website and extract to /usr/local/hadoop). Update environment variables in ~/.bashrc or /etc/profile (e.g., HADOOP_HOME, PATH) and reload with source ~/.bashrc.
Configuration Adjustment
Modify Hadoop configuration files (core-site.xml, hdfs-site.xml, yarn-site.xml, mapred-site.xml) to align with the new version’s requirements. Common changes include updating default ports, adjusting memory allocation (e.g., mapreduce.map.memory.mb), or enabling new features (e.g., erasure coding in Hadoop 3.x). Validate configurations using hadoop checknative (for native libraries) or by running a test job.
Post-Upgrade Validation
Start Hadoop services sequentially:
sudo systemctl start hadoop-namenode
sudo systemctl start hadoop-datanode
sudo systemctl start hadoop-yarn-resourcemanager
sudo systemctl start hadoop-yarn-nodemanager
sudo systemctl start hadoop-jobhistoryserver
Verify the upgrade:
hadoop version (should display the new version).hdfs dfsadmin -report.yarn node -list.hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar pi 10 100) to ensure functionality.Ongoing Monitoring
Use system tools to monitor cluster health:
jps.tail -f /var/log/syslog or journalctl -u hadoop-namenode -f./var/log/hadoop/hdfs/hadoop-namenode-*.log) for warnings or failures.