Hadoop Data Storage Guide for Ubuntu
1. Core Concepts and Directories
2. Pseudo-Distributed Setup on a Single Machine
Install Java, then add the Hadoop environment variables to your shell profile (e.g. ~/.bashrc), adjusting HADOOP_HOME if Hadoop is unpacked elsewhere:

sudo apt update && sudo apt install openjdk-11-jdk

export HADOOP_HOME=/usr/local/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Next, set the default filesystem in $HADOOP_CONF_DIR/core-site.xml:

<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
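The core-site.xml value can be sanity-checked before anything is started. A minimal sketch (the scratch file and the sed pattern are illustrative, not part of Hadoop; in a real setup the file lives at $HADOOP_CONF_DIR/core-site.xml) that extracts fs.defaultFS back out of the XML:

```shell
# Write a throwaway copy of the core-site.xml fragment above, then pull the
# fs.defaultFS value back out with sed.
tmp=$(mktemp -d)
cat > "$tmp/core-site.xml" <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF
# Find the fs.defaultFS <name> line, advance to the next line, and capture
# the text between <value> and </value>.
fs_default=$(sed -n '/<name>fs.defaultFS<\/name>/{n;s/.*<value>\(.*\)<\/value>.*/\1/p;}' "$tmp/core-site.xml")
echo "$fs_default"
rm -rf "$tmp"
```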
Then, in hdfs-site.xml in the same directory, set the replication factor and the NameNode/DataNode storage directories:

<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///usr/local/hadoop/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///usr/local/hadoop/dfs/data</value>
</property>
</configuration>
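Hand-writing these stanzas is error-prone. A small sketch that generates the same hdfs-site.xml from name/value pairs (the hadoop_prop helper and the scratch output directory are illustrative names, not Hadoop tooling):

```shell
# Emit one Hadoop <property> stanza per call: $1 is the name, $2 the value.
hadoop_prop() {
  printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$1" "$2"
}
conf=$(mktemp -d)   # stand-in for $HADOOP_CONF_DIR
{
  echo '<configuration>'
  hadoop_prop dfs.replication 1
  hadoop_prop dfs.namenode.name.dir file:///usr/local/hadoop/dfs/name
  hadoop_prop dfs.datanode.data.dir file:///usr/local/hadoop/dfs/data
  echo '</configuration>'
} > "$conf/hdfs-site.xml"
# Each stanza puts <property> on its own line, so grep -c counts the stanzas.
grep -c '<property>' "$conf/hdfs-site.xml"
```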
Create the storage directories, format the NameNode, and start HDFS:

mkdir -p /usr/local/hadoop/dfs/name /usr/local/hadoop/dfs/data
hdfs namenode -format
start-dfs.sh    (if you need YARN: start-yarn.sh)

jps should now list NameNode and DataNode (along with SecondaryNameNode, plus ResourceManager/NodeManager if YARN is running). Verify with a few basic commands:

hdfs dfs -ls /
hdfs dfs -mkdir /data
hdfs dfs -put localfile /data/

3. Changing or Extending the Data Storage Path
Stop HDFS, move the existing data to the new location, and point hdfs-site.xml at it:

stop-dfs.sh
sudo mv /usr/local/hadoop/dfs /data/hadoop/dfs

<property>
<name>dfs.namenode.name.dir</name>
<value>file:///data/hadoop/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///data/hadoop/dfs/data</value>
</property>
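The move itself is a plain mv, which preserves the NameNode metadata and DataNode block files; that is why the guide moves the old dfs/ tree rather than re-formatting. A dry-run of the same sequence on scratch paths (the placeholder fsimage file stands in for real metadata; substitute the real /usr/local/hadoop and /data paths when doing this for real):

```shell
# Scratch stand-ins for the old and new storage roots.
old=$(mktemp -d); new=$(mktemp -d)
mkdir -p "$old/dfs/name" "$old/dfs/data"
echo "fsimage-placeholder" > "$old/dfs/name/fsimage"   # stand-in metadata file
# Move the whole dfs/ tree; contents arrive intact at the new root.
mv "$old/dfs" "$new/dfs"
cat "$new/dfs/name/fsimage"
```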
Then restart HDFS with start-dfs.sh. To spread DataNode storage across several disks, give dfs.datanode.data.dir a comma-separated list of directories:

<property>
<name>dfs.datanode.data.dir</name>
<value>file:///data1/hadoop/datanode,file:///data2/hadoop/datanode</value>
</property>
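Hadoop splits the dfs.datanode.data.dir value on the commas and uses every directory in the list. The same split in shell, creating each directory under a scratch root in place of /data1 and /data2:

```shell
root=$(mktemp -d)   # stand-in for the real disk mount points
data_dirs="file://$root/d1/datanode,file://$root/d2/datanode"
old_ifs=$IFS
IFS=','                      # split the list on commas
for d in $data_dirs; do
  mkdir -p "${d#file://}"    # strip the file:// scheme before mkdir
done
IFS=$old_ifs
ls "$root"
```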
On a multi-node cluster you would typically set dfs.replication=3; a single-node pseudo-distributed setup must keep it at 1, since blocks cannot be replicated to nodes that do not exist.

4. Key Points for a Distributed Environment
Set fs.defaultFS to the master node, e.g. hdfs://master:9000, in core-site.xml on every node, then format the NameNode on the master and start the services:

hdfs namenode -format
start-dfs.sh    (on a cluster, also run start-yarn.sh)

5. Common Problems and Troubleshooting
- Permission errors: make sure the Hadoop user owns the storage directories, e.g. chown -R hadoop:hadoop /data/hadoop.
- Port 9000 already in use: find the occupying process with netstat -tulpen | grep 9000 and stop it if appropriate.
- Firewall blocking access: open the required ports, e.g. sudo ufw allow 9000,50070,8088/tcp (note that ufw requires a protocol when listing multiple ports, and on Hadoop 3.x the NameNode web UI moved from 50070 to 9870).
- Config changes not taking effect: restart HDFS (stop-dfs.sh && start-dfs.sh) and confirm the effective path with hdfs getconf -confKey dfs.datanode.data.dir.
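When netstat is unavailable, bash's /dev/tcp pseudo-device gives a quick port probe. A sketch that parses an fs.defaultFS-style URI and tests whether anything is listening on it (localhost:59998 is a placeholder port that is almost certainly unused; on a worker node you would probe hdfs://master:9000 and expect success once the NameNode is up):

```shell
uri="hdfs://localhost:59998"   # placeholder; use the real fs.defaultFS value
hostport=${uri#hdfs://}        # drop the scheme -> host:port
host=${hostport%:*}
port=${hostport#*:}
# Opening /dev/tcp/<host>/<port> attempts a TCP connect (bash feature);
# failure means nothing is listening, so the NameNode could bind the port.
if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
  listening=yes
else
  listening=no
fi
echo "$host $port $listening"
```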