
How to Install HDFS on Debian

小樊
2025-09-21 22:04:57
Category: Intelligent O&M

Prerequisites
Before installing HDFS on Debian, ensure your system is up-to-date and install essential tools:

sudo apt update && sudo apt upgrade -y
sudo apt install wget ssh vim -y

These commands update package lists, upgrade installed packages, and install wget (for downloading Hadoop), ssh (for remote access), and vim (for configuration editing).

1. Install Java Environment
Hadoop 3.x requires Java 8 or Java 11. Install OpenJDK 11 (recommended for compatibility):

sudo apt install openjdk-11-jdk -y

Verify the installation:

java -version

You should see output indicating OpenJDK 11 is installed.
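
If you later need the exact JDK path for JAVA_HOME, you can derive it from the java binary; on Debian with OpenJDK 11 this typically resolves to /usr/lib/jvm/java-11-openjdk-amd64, but confirm it on your own system:

readlink -f /usr/bin/java | sed 's:/bin/java::'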

2. Create a Dedicated Hadoop User
For security and isolation, create a non-root user (e.g., hadoop) and add it to the sudo group:

sudo adduser hadoop
sudo usermod -aG sudo hadoop

Switch to the new user:

su - hadoop

This user will manage all Hadoop operations.

3. Download and Extract Hadoop
Download a stable Hadoop release (3.3.6 is used in the examples below) from the Apache download site:

wget https://downloads.apache.org/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz
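
Optionally, verify the download's integrity; Apache publishes a SHA-512 checksum file alongside the tarball, so you can fetch it and compare the hashes:

wget https://downloads.apache.org/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz.sha512
sha512sum hadoop-3.3.6.tar.gz  # compare this hash with the value in the .sha512 file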

Extract the archive to /usr/local/ and rename the directory for simplicity:

sudo tar -xzvf hadoop-3.3.6.tar.gz -C /usr/local/
sudo mv /usr/local/hadoop-3.3.6 /usr/local/hadoop

Change ownership of the Hadoop directory to the hadoop user:

sudo chown -R hadoop:hadoop /usr/local/hadoop
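
As a quick sanity check, confirm that the ownership change took effect:

ls -ld /usr/local/hadoop  # owner and group should both be hadoop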

4. Configure Environment Variables
Set up Hadoop-specific environment variables in /etc/profile (system-wide) or ~/.bashrc (user-specific). Open the file with vim:

vim ~/.bashrc

Add the following lines at the end:

export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64  # Adjust if using a different Java version

Load the changes into the current session:

source ~/.bashrc

Verify the variables are set:

echo $HADOOP_HOME  # Should output /usr/local/hadoop
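
Two optional follow-ups: confirm the hadoop command is on your PATH, and set JAVA_HOME in Hadoop's own environment file so the daemons find Java regardless of shell settings (the JDK path below assumes OpenJDK 11 on amd64; adjust if yours differs):

hadoop version  # should print the Hadoop version, e.g. 3.3.6
echo "export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64" >> $HADOOP_HOME/etc/hadoop/hadoop-env.sh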

5. Configure SSH Passwordless Login
Hadoop requires passwordless SSH between the NameNode and DataNodes. Generate an SSH key pair:

ssh-keygen -t rsa -b 4096 -C "hadoop@debian"

Press Enter to accept the default file locations and skip the passphrase. Append the public key to your own authorized_keys file (sufficient for a single-node cluster; a multi-node variant is shown after the login test below):

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys

Test passwordless login:

ssh localhost

You should log in without entering a password.
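
For a multi-node cluster, also push the key to each worker node; the hostname below is only a placeholder for one of your DataNodes:

ssh-copy-id hadoop@datanode1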

6. Configure Hadoop Core Files
Navigate to the Hadoop configuration directory:

cd $HADOOP_HOME/etc/hadoop

Edit the following files to define HDFS behavior (a quick way to verify the resulting values is shown after this list):

  • core-site.xml: Sets the default file system (HDFS) and NameNode address.

    <configuration>
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://namenode:9000</value>  <!-- Replace 'namenode' with your NameNode's hostname/IP; use localhost for a single-node setup -->
        </property>
    </configuration>
    
  • hdfs-site.xml: Configures replication factor (for fault tolerance) and data directories.

    <configuration>
        <property>
            <name>dfs.replication</name>
            <value>1</value>  <!-- Set to 3 for multi-node clusters; 1 for single-node -->
        </property>
        <property>
            <name>dfs.namenode.name.dir</name>
            <value>/opt/hadoop/hdfs/namenode</value>  <!-- Create this directory later -->
        </property>
        <property>
            <name>dfs.datanode.data.dir</name>
            <value>/opt/hadoop/hdfs/datanode</value>  <!-- Create this directory later -->
        </property>
    </configuration>
    
  • mapred-site.xml: Specifies the MapReduce framework (YARN).

    <configuration>
        <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
        </property>
    </configuration>
    
  • yarn-site.xml: Configures YARN resource management.

    <configuration>
        <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>
        <property>
            <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
            <value>org.apache.hadoop.mapred.ShuffleHandler</value>
        </property>
    </configuration>
    

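After saving these files, you can sanity-check that Hadoop resolves the values you expect (run as the hadoop user, with the environment variables from step 4 loaded):

hdfs getconf -confKey fs.defaultFS     # should print hdfs://<your-namenode>:9000
hdfs getconf -confKey dfs.replication  # should print the replication factor you set
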
7. Create HDFS Data Directories
Create the directories specified in hdfs-site.xml for NameNode and DataNode storage:

sudo mkdir -p /opt/hadoop/hdfs/namenode
sudo mkdir -p /opt/hadoop/hdfs/datanode
sudo chown -R hadoop:hadoop /opt/hadoop  # Change ownership to the hadoop user

8. Format the NameNode
The NameNode must be formatted once before HDFS is started for the first time. Run this command carefully (it erases any existing HDFS metadata, making previously stored data unreachable):

hdfs namenode -format

You should see output indicating successful formatting.
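
If you want confirmation beyond the log output, the NameNode metadata directory defined in hdfs-site.xml should now contain a current/VERSION file:

ls /opt/hadoop/hdfs/namenode/current/VERSION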

9. Start HDFS Services
Start the HDFS daemons (NameNode, DataNode, and SecondaryNameNode) using the start-dfs.sh script:

$HADOOP_HOME/sbin/start-dfs.sh

Check the status of HDFS processes with jps:

jps

You should see NameNode, DataNode, and SecondaryNameNode among the running Java processes.
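
Typical jps output looks roughly like this (process IDs will differ):

12001 NameNode
12202 DataNode
12450 SecondaryNameNode
12899 Jps

When you are finished, stop the daemons with the matching script:

$HADOOP_HOME/sbin/stop-dfs.sh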

10. Verify HDFS Installation
Use HDFS commands to confirm the cluster is operational:

  • List the root directory:
    hdfs dfs -ls /
    
  • Create a test directory:
    hdfs dfs -mkdir -p /user/hadoop/input
    
  • Upload a local file to HDFS:
    echo "Hello, HDFS!" > test.txt
    hdfs dfs -put test.txt /user/hadoop/input/
    
  • Read the file from HDFS:
    hdfs dfs -cat /user/hadoop/input/test.txt
    

You should see the output Hello, HDFS!
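
For a broader health check, the dfsadmin report summarizes capacity and lists live DataNodes; the NameNode web UI in Hadoop 3.x is also available on port 9870 by default (replace the placeholder hostname with your NameNode's):

hdfs dfsadmin -report  # shows configured capacity and live DataNodes
# Web UI: http://<namenode-host>:9870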

Troubleshooting Tips

  • Port Conflicts: Ensure ports such as 9000 (NameNode RPC) and 9870 (NameNode web UI in Hadoop 3.x) are not blocked by your firewall or already in use; a quick port check is shown after this list.
  • Java Issues: Verify JAVA_HOME is correctly set in $HADOOP_HOME/etc/hadoop/hadoop-env.sh.
  • Permission Errors: Use chown to ensure the hadoop user owns all Hadoop-related directories.
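
A minimal way to confirm which ports the HDFS daemons are actually listening on (ss ships with iproute2 on Debian):

sudo ss -tlnp | grep -E '9000|9870'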
