Setting Up a Hadoop 3.x Cluster on CentOS 7
1 Environment Planning and Preparation
The cluster uses three nodes, planned as follows (hostname-to-IP mapping):
192.168.1.100 master
192.168.1.101 slave1
192.168.1.102 slave2
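A minimal sketch of naming the nodes and making them resolvable, assuming the addresses above (run the matching hostnamectl line on each machine):
sudo hostnamectl set-hostname master      # on 192.168.1.100; use slave1 / slave2 on the other two nodes
cat <<'EOF' | sudo tee -a /etc/hosts      # on every node
192.168.1.100 master
192.168.1.101 slave1
192.168.1.102 slave2
EOF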
On every node, stop the firewall, disable SELinux, and create a dedicated hadoop user:
sudo systemctl stop firewalld
sudo systemctl disable firewalld
sudo setenforce 0
sudo sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
sudo useradd hadoop
sudo passwd hadoop
su - hadoop
The network, firewall, SELinux, and hosts settings above are the foundation for inter-node connectivity and for starting the services later.
2 Installing Java and Hadoop
Install OpenJDK 8 on every node (Hadoop itself only needs to be downloaded on master, since it is synced to the slaves in section 3):
sudo yum install -y java-1.8.0-openjdk-devel
java -version
wget https://downloads.apache.org/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz
sudo tar -xzvf hadoop-3.3.1.tar.gz -C /usr/local
sudo mv /usr/local/hadoop-3.3.1 /usr/local/hadoop
sudo chown -R hadoop:hadoop /usr/local/hadoop
echo 'export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk' >> ~/.bashrc
echo 'export HADOOP_HOME=/usr/local/hadoop' >> ~/.bashrc
echo 'export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin' >> ~/.bashrc
source ~/.bashrc
These steps complete the basic installation of Java and Hadoop and the environment-variable setup.
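A quick optional sanity check that the variables took effect in the current shell:
echo $JAVA_HOME
hadoop version          # should report Hadoop 3.3.1
which hdfs              # should resolve to /usr/local/hadoop/bin/hdfs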
3 Configuring the Core Hadoop Files
On master, create the local data directories referenced by the configuration below (they are copied to the slaves together with the rest of the installation):
mkdir -p $HADOOP_HOME/hdfs/{namenode,datanode}
mkdir -p $HADOOP_HOME/yarn/local
mkdir -p $HADOOP_HOME/tmp    # matches hadoop.tmp.dir in core-site.xml below
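The daemons started over SSH by the start scripts may not see the login shell's JAVA_HOME, so it is safest to also set it in Hadoop's own environment file (the JDK path below assumes the OpenJDK install from section 2):
echo 'export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk' >> $HADOOP_HOME/etc/hadoop/hadoop-env.sh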
In $HADOOP_HOME/etc/hadoop/core-site.xml:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/tmp</value>
  </property>
</configuration>
In $HADOOP_HOME/etc/hadoop/hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///usr/local/hadoop/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///usr/local/hadoop/hdfs/datanode</value>
  </property>
</configuration>
In $HADOOP_HOME/etc/hadoop/mapred-site.xml:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
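With mapreduce.framework.name set to yarn, MapReduce jobs on Hadoop 3.x also need the MapReduce jars on the application classpath; a commonly used addition to mapred-site.xml, taken from the upstream single-cluster example (adjust if your layout differs), is:
  <property>
    <name>mapreduce.application.classpath</name>
    <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
  </property>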
In $HADOOP_HOME/etc/hadoop/yarn-site.xml (note the property name for the shuffle handler class must match the aux-service name, mapreduce_shuffle):
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/usr/local/hadoop/yarn/local</value>
  </property>
</configuration>
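If jobs fail with a missing MRAppMaster class, the NodeManager environment whitelist from the same upstream example can be added to yarn-site.xml as a companion to the classpath setting above:
  <property>
    <name>yarn.nodemanager.env-whitelist</name>
    <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_HOME,PATH,LANG,TZ,HADOOP_MAPRED_HOME</value>
  </property>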
List the worker (DataNode/NodeManager) hosts in $HADOOP_HOME/etc/hadoop/workers, one per line:
slave1
slave2
Distribute the installation to the slaves and make sure the hadoop user owns it there (if /usr/local is not writable by the hadoop user on the slaves, create /usr/local/hadoop there with the right owner first; -t lets sudo prompt for a password over SSH):
rsync -av /usr/local/hadoop slave1:/usr/local/
rsync -av /usr/local/hadoop slave2:/usr/local/
ssh -t slave1 'sudo chown -R hadoop:hadoop /usr/local/hadoop'
ssh -t slave2 'sudo chown -R hadoop:hadoop /usr/local/hadoop'
The above covers the core configuration and directory layout: core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, and the workers file.
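An optional check that the files actually reached both slaves (password prompts are expected until the SSH keys are set up in the next section):
ssh slave1 'cat /usr/local/hadoop/etc/hadoop/workers'
ssh slave2 'grep -A1 fs.defaultFS /usr/local/hadoop/etc/hadoop/core-site.xml'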
4 Startup and Verification
As the hadoop user on master, generate an SSH key and copy it to every node (including master itself) so the start scripts can log in without a password:
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
ssh-copy-id master
ssh-copy-id slave1
ssh-copy-id slave2
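Each of the following should print a hostname without asking for a password if the key distribution worked:
for host in master slave1 slave2; do ssh $host hostname; done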
Format the NameNode (run this once, on master only), then start HDFS and YARN:
hdfs namenode -format
start-dfs.sh
start-yarn.sh
jps
# On master you should see: NameNode, ResourceManager, and (with the default configuration) SecondaryNameNode
# On each slave you should see: DataNode, NodeManager
hdfs dfsadmin -report
# NameNode Web UI: http://master:9870 (Hadoop 3.x moved it from the old 2.x port 50070)
# ResourceManager Web UI: http://master:8088
These steps complete cluster startup and availability verification: passwordless SSH, NameNode formatting, the start scripts, and the Web UI checks.
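A minimal end-to-end smoke test, assuming the examples jar shipped with this release (the jar name follows the Hadoop version):
hdfs dfs -mkdir -p /user/hadoop
hdfs dfs -put $HADOOP_HOME/etc/hadoop/core-site.xml /user/hadoop/
hdfs dfs -ls /user/hadoop
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.1.jar pi 2 10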
5 Common Issues and Tuning Suggestions