Configuring Hadoop on Linux: The Full Workflow
1. Environment Preparation
Install the JDK: sudo apt update && sudo apt install openjdk-8-jdk, then verify with java -version. Install the SSH server: sudo apt install openssh-server. Set up passwordless SSH:
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
Verify: ssh localhost should log you in without prompting for a password.
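The JDK path is needed again in Section 3 when setting JAVA_HOME in hadoop-env.sh. A quick way to locate it (a sketch; the exact path depends on your distribution and JDK package):
readlink -f $(which javac) | sed 's:/bin/javac::'
# on Ubuntu with openjdk-8-jdk this typically prints /usr/lib/jvm/java-8-openjdk-amd64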
2. Installation and Environment Variables
Download and extract Hadoop:
wget https://downloads.apache.org/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz
sudo tar -xzvf hadoop-3.3.1.tar.gz -C /opt
Add the following to ~/.bashrc (or /etc/profile.d/hadoop.sh):
export HADOOP_HOME=/opt/hadoop-3.3.1
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Then run source ~/.bashrc. Once these steps are done, the hdfs, yarn, and other Hadoop commands can be used from any directory.
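A quick sanity check that the variables took effect (assuming the paths above):
hadoop version    # should report Hadoop 3.3.1
which hdfs        # should resolve to /opt/hadoop-3.3.1/bin/hdfs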
3. Core Configuration
All configuration files live in $HADOOP_HOME/etc/hadoop; the ones to edit are hadoop-env.sh, core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, and workers. In hadoop-env.sh, set JAVA_HOME explicitly to your JDK path (e.g. /usr/lib/jvm/java-8-openjdk-amd64 or /usr/java/jdk1.8.0_xxx).
core-site.xml (default filesystem and temporary directory):
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/hadoop-3.3.1/tmp</value>
</property>
</configuration>
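To confirm that Hadoop is actually reading this file, you can query the effective value with the standard getconf subcommand; with the configuration above it should print the URI shown in the comment:
hdfs getconf -confKey fs.defaultFS    # expected output: hdfs://localhost:9000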
hdfs-site.xml (replication factor and NameNode/DataNode storage directories):
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/opt/hadoop-3.3.1/hadoop_data/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/opt/hadoop-3.3.1/hadoop_data/hdfs/datanode</value>
</property>
</configuration>
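These directories are not always created automatically, and Section 5 lists missing directories or permissions as a common startup failure. A sketch of creating them up front, assuming Hadoop runs as the current user:
sudo mkdir -p /opt/hadoop-3.3.1/tmp /opt/hadoop-3.3.1/hadoop_data/hdfs/namenode /opt/hadoop-3.3.1/hadoop_data/hdfs/datanode
sudo chown -R $(whoami):$(whoami) /opt/hadoop-3.3.1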
mapred-site.xml (on Hadoop 2.x you may first need to run cp mapred-site.xml.template mapred-site.xml; on 3.x the file already exists):
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
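On Hadoop 3.x, MapReduce jobs submitted to YARN usually also need the MapReduce classpath declared; the official single-node setup guide adds the property below to mapred-site.xml. Treat it as a reference and verify the value against the documentation for your release:
<property>
<name>mapreduce.application.classpath</name>
<value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
</property>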
yarn-site.xml (shuffle service and ResourceManager host):
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>localhost</value>
</property>
</configuration>
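Related to the previous note, Hadoop 3.x single-node guides also whitelist the environment variables that the NodeManager passes through to containers. The corresponding yarn-site.xml property is shown below; it is optional for a bare HDFS/YARN start but often needed before running MapReduce examples (check the exact value for your release):
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_HOME,PATH,LANG,TZ,HADOOP_MAPRED_HOME</value>
</property>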
In the workers file, a single-node pseudo-distributed setup lists only localhost (the fully distributed case is covered in Section 5).
4. Startup and Verification
Format the NameNode once before first use: hdfs namenode -format. Start the daemons with start-dfs.sh and start-yarn.sh (or start-all.sh). Run jps; you should see NameNode, DataNode, ResourceManager, and NodeManager (in a single-machine pseudo-distributed setup all four run on the local host). Check HDFS with hdfs dfs -ls /. To stop everything, use stop-dfs.sh and stop-yarn.sh, or stop-all.sh.
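Beyond jps, a small end-to-end check plus the default web UIs (ports are the Hadoop 3.x defaults):
hdfs dfs -mkdir -p /user/$(whoami)                                    # create a home directory in HDFS
hdfs dfs -put $HADOOP_HOME/etc/hadoop/core-site.xml /user/$(whoami)/  # upload a test file
hdfs dfs -ls /user/$(whoami)                                          # the uploaded file should be listed
# NameNode web UI:        http://localhost:9870
# ResourceManager web UI: http://localhost:8088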
5. Common Problems and Cluster Extension
To grow this into a real cluster, point fs.defaultFS in core-site.xml at the master node (e.g. hdfs://hadoop1:8020) and list every DataNode hostname in workers, one per line. Format the NameNode on the master with hdfs namenode -format, then start the cluster from the master and verify the processes and the web UI. If startup fails, first make sure that the directories behind hadoop.tmp.dir, dfs.namenode.name.dir, and dfs.datanode.data.dir exist and that the user running Hadoop has read/write permission on them.
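A minimal sketch of the two changes for a small cluster, using hypothetical hostnames hadoop1 (master) and hadoop2/hadoop3 (DataNodes); the same configuration files should be present on every node:
core-site.xml (all nodes):
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop1:8020</value>
</property>
$HADOOP_HOME/etc/hadoop/workers (one DataNode hostname per line):
hadoop2
hadoop3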