How to Set Up an HBase Cluster on Debian

小樊
2025-10-21 12:44:23

Prerequisites
Before starting, ensure all cluster nodes (master and region servers) meet the following requirements:

  • Network Connectivity: Nodes can communicate via hostname (add entries to /etc/hosts if needed).
  • Time Synchronization: Install and configure ntp or chrony to keep system clocks in sync.
  • SSH Access: Enable passwordless SSH from the master node to every other node. The HBase start/stop scripts use it to launch and stop the remote daemons.
  • Java Environment: Install OpenJDK 8 or 11 on all nodes. Verify with java -version.
  • Hadoop & ZooKeeper: Deploy a running Hadoop HDFS cluster (for distributed storage) and ZooKeeper ensemble (for coordination). HBase relies on these services.
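
The Java requirement can be checked with a small script before going further. The following is a minimal sketch, assuming a POSIX shell; the helper names java_major and java_supported are made up for this example and are not part of HBase or the JDK.

```shell
# Print the major version for a Java version string,
# e.g. "1.8.0_392" -> 8 and "11.0.21" -> 11.
java_major() {
  case "$1" in
    1.*) jm_rest="${1#1.}"; echo "${jm_rest%%.*}" ;;  # legacy "1.x" scheme (Java 8 and older)
    *)   echo "${1%%.*}" ;;                           # modern scheme (Java 9+)
  esac
}

# Succeed only for the versions this guide recommends (8 or 11).
java_supported() {
  case "$(java_major "$1")" in
    8|11) return 0 ;;
    *)    return 1 ;;
  esac
}

# Typical use on a node (java -version prints to stderr):
#   v=$(java -version 2>&1 | awk -F'"' 'NR==1 {print $2}')
#   java_supported "$v" || echo "unsupported Java: $v"
```

Running the commented usage lines on every node gives a quick yes/no answer before HBase is installed.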

Step 1: Download and Install HBase

  1. Choose a stable HBase version (e.g., 2.4.x) from the Apache HBase website.
  2. Download and extract the tarball on all nodes (master and region servers):
    wget https://archive.apache.org/dist/hbase/2.4.9/hbase-2.4.9-bin.tar.gz
    sudo tar -xzf hbase-2.4.9-bin.tar.gz -C /opt
    sudo mv /opt/hbase-2.4.9 /usr/local/hbase
    
  3. Set ownership to the current user for easier management:
    sudo chown -R $USER:$USER /usr/local/hbase
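
Before extracting, it is worth confirming that the tarball was not corrupted in transit. The sketch below assumes that a SHA-512 digest is published next to the tarball (Apache mirrors typically provide a .sha512 file) and that you paste its hex digest yourself; verify_sha512 is a hypothetical helper name, and sha512sum comes from coreutils.

```shell
# Return 0 if FILE's SHA-512 digest equals EXPECTED.
# EXPECTED may be upper or lower case and may contain spaces or newlines.
verify_sha512() {
  vs_file="$1"
  vs_expected=$(printf '%s' "$2" | tr 'A-F' 'a-f' | tr -d ' \n')
  vs_actual=$(sha512sum "$vs_file" | awk '{print $1}')
  [ "$vs_actual" = "$vs_expected" ]
}

# Usage (the digest is a placeholder you copy from the published .sha512 file):
#   verify_sha512 hbase-2.4.9-bin.tar.gz "<digest>" || echo "checksum mismatch"
```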
    

Step 2: Configure Environment Variables
Edit the ~/.bashrc file (or /etc/profile for system-wide access) to add HBase environment variables:

export HBASE_HOME=/usr/local/hbase
export PATH=$PATH:$HBASE_HOME/bin

Apply changes immediately:

source ~/.bashrc

Step 3: Configure HBase Core Files

  1. Edit hbase-env.sh (located in $HBASE_HOME/conf):
    • Set JAVA_HOME to your JDK path (e.g., /usr/lib/jvm/java-11-openjdk-amd64).
    • Disable HBase’s built-in ZooKeeper (since you’re using an external ensemble):
      export HBASE_MANAGES_ZK=false
      
  2. Edit hbase-site.xml (critical for cluster setup):
    Add the following properties to define HBase’s distributed mode, data storage, and ZooKeeper integration:
    <configuration>
      <!-- Root directory for HBase data in HDFS -->
      <property>
        <name>hbase.rootdir</name>
        <value>hdfs://namenode:8020/hbase</value> <!-- Replace with your NameNode host and RPC port (must match fs.defaultFS in core-site.xml) -->
      </property>
      <!-- Enable distributed mode -->
      <property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
      </property>
      <!-- External ZooKeeper quorum (comma-separated list of ZooKeeper nodes) -->
      <property>
        <name>hbase.zookeeper.quorum</name>
        <value>zookeeper1,zookeeper2,zookeeper3</value> <!-- Replace with your ZooKeeper hostnames/IPs -->
      </property>
      <!-- ZooKeeper data directory; read only when HBase manages ZooKeeper (HBASE_MANAGES_ZK=true) -->
      <property>
        <name>hbase.zookeeper.property.dataDir</name>
        <value>/var/lib/zookeeper</value> <!-- With an external ensemble this is ignored; dataDir is set in each ZooKeeper node's zoo.cfg -->
      </property>
    </configuration>
    
  3. Configure regionservers (list all region server nodes):
    Edit $HBASE_HOME/conf/regionservers and add each region server’s hostname (one per line), replacing the default localhost entry. List the master node here only if it should also run a RegionServer.
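
The regionservers file can also be generated rather than edited by hand. This is a minimal sketch; write_regionservers is a hypothetical helper, and the hostnames rs1, rs2, and rs3 are placeholders for your own nodes.

```shell
# Write one hostname per line to the given regionservers file,
# overwriting whatever the file contained before.
write_regionservers() {
  wr_file="$1"; shift
  printf '%s\n' "$@" > "$wr_file"
}

# Usage (hypothetical hostnames):
#   write_regionservers "$HBASE_HOME/conf/regionservers" rs1 rs2 rs3
# The conf directory should be identical on every node, so copy it out afterwards, e.g.:
#   for h in rs1 rs2 rs3; do rsync -a "$HBASE_HOME/conf/" "$h:$HBASE_HOME/conf/"; done
```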

Step 4: Start Hadoop and ZooKeeper
Before launching HBase, ensure HDFS and ZooKeeper are running:

  1. Start HDFS: On the NameNode, run:
    hdfs namenode -format  # First-time setup only: formatting erases existing HDFS metadata, so skip this on a cluster that already holds data
    start-dfs.sh           # Start HDFS daemons (NameNode, DataNodes)
    start-yarn.sh          # Start YARN (if using MapReduce)
    
  2. Start ZooKeeper: On each ZooKeeper node, run:
    zkServer.sh start
    
    Verify ZooKeeper status with zkServer.sh status on each node. Every node should report Mode: leader or Mode: follower, with exactly one leader in the ensemble.
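
The ensemble check can be scripted from a single machine. The sketch below assumes that nc (netcat) is installed, that ZooKeeper listens on the default client port 2181, and that the srvr four-letter command is permitted on your servers; zk_mode and exactly_one_leader are hypothetical helper names.

```shell
# Ask one ZooKeeper server for its role ("leader", "follower", or "standalone").
zk_mode() {
  printf 'srvr\n' | nc -w 2 "$1" "${2:-2181}" | awk '/^Mode:/ {print $2}'
}

# Read one mode per line on stdin; succeed only if exactly one line is "leader".
exactly_one_leader() {
  awk '$1 == "leader" {n++} END {exit (n == 1 ? 0 : 1)}'
}

# Usage, with the hostnames from hbase.zookeeper.quorum:
#   for h in zookeeper1 zookeeper2 zookeeper3; do zk_mode "$h"; done | exactly_one_leader \
#     || echo "ensemble is not healthy"
```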

Step 5: Start HBase Cluster
On the HBase master node, execute the following command to start all HBase services:

start-hbase.sh

This script starts the HMaster (which coordinates the cluster) on the local node and, over SSH, an HRegionServer (which serves reads and writes) on every host listed in conf/regionservers.

To verify processes are running, use jps on each node:

  • Master node: Should show HMaster.
  • Region server nodes: Should show HRegionServer.
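
The jps check is easy to automate. This is a small sketch that assumes the usual jps output of one "PID Name" pair per line; has_daemon is a hypothetical helper name.

```shell
# Succeed if the jps listing on stdin contains a process with the given name.
has_daemon() {
  awk -v name="$1" '$2 == name {found = 1} END {exit (found ? 0 : 1)}'
}

# Usage on the master node:
#   jps | has_daemon HMaster || echo "HMaster is not running on $(hostname)"
# Usage on a region server node:
#   jps | has_daemon HRegionServer || echo "HRegionServer is not running on $(hostname)"
```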

Step 6: Validate the Cluster

  1. Access HBase Shell: Run the following command on any node (master or region server):
    hbase shell
    
  2. Check Cluster Status: In the HBase shell, execute:
    status
    
    You should see a short summary listing the number of active and backup masters, the number of live and dead region servers, and the average load.
  3. Test Basic Operations: Create a table, insert data, and query it to confirm functionality:
    create 'test_table', 'cf'  # Create a table named 'test_table' with column family 'cf'
    put 'test_table', 'row1', 'cf:col1', 'value1'  # Insert data
    get 'test_table', 'row1'  # Retrieve data
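
The same test can be run without an interactive session, which is handy after every restart. The sketch below assumes the shell's non-interactive mode (hbase shell -n); smoke_commands and the default table name smoke_test are made up for this example. Unlike the manual steps above, it also disables and drops the scratch table when done.

```shell
# Print HBase shell commands that create, write, read, and drop a scratch table.
smoke_commands() {
  sc_table="${1:-smoke_test}"
  cat <<EOF
create '$sc_table', 'cf'
put '$sc_table', 'row1', 'cf:col1', 'value1'
get '$sc_table', 'row1'
disable '$sc_table'
drop '$sc_table'
EOF
}

# Usage: feed the commands to the HBase shell and check the exit status.
#   smoke_commands | hbase shell -n && echo "smoke test passed"
```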
    

Post-Installation Checks

  • Logs: Monitor HBase logs (located in $HBASE_HOME/logs) for errors or warnings.
  • Web UI: Access the HBase master web interface at http://<master-node-ip>:16010 (default port) to view cluster metrics.
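  • Firewall: Allow the required ports between cluster nodes using ufw or your firewall tool. With default settings these are 16000 and 16020 (HBase master and RegionServer RPC), 16010 and 16030 (their web UIs), 2181 for ZooKeeper clients plus the conventional 2888 and 3888 for ZooKeeper peer and leader-election traffic, and for HDFS the NameNode RPC port (8020 in this guide) and its web UI (9870 on Hadoop 3.x, 50070 on Hadoop 2.x). The DataNode ports must also be reachable from the RegionServers.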
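
For ufw, the firewall rules can be generated and reviewed before they are applied. This is a sketch only: ufw_rules is a hypothetical helper, the subnet 10.0.0.0/24 is a placeholder for your cluster network, and the port list should be adjusted to the ports your services actually use.

```shell
# Print (without running) ufw rules that open the given TCP ports to one subnet.
ufw_rules() {
  ur_subnet="$1"; shift
  for ur_port in "$@"; do
    echo "ufw allow from $ur_subnet to any port $ur_port proto tcp"
  done
}

# Usage: inspect the output first, then apply it with root privileges.
#   ufw_rules 10.0.0.0/24 16000 16010 16020 16030
#   ufw_rules 10.0.0.0/24 16000 16010 16020 16030 | sudo sh
```

Limiting the rules to the cluster subnet keeps the HBase and ZooKeeper ports closed to the rest of the network.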

Key Notes for Production

  • High Availability: Configure multiple HMaster nodes and ZooKeeper ensemble (odd number of nodes) for fault tolerance.
  • Performance Tuning: Adjust parameters like hbase.regionserver.handler.count (handler threads) and hbase.hregion.memstore.flush.size (flush threshold) based on your hardware and workload. Compression is set per column family in the table schema (for example COMPRESSION => 'SNAPPY') rather than through a cluster-wide property.
  • Monitoring: Use tools like Prometheus + Grafana or Ambari to track cluster health (e.g., RegionServer load, memory usage).
