HBase Cluster Installation

爱被打了一巴掌 2022-05-25 05:47

Preface: before reading this article, consider reading the installation and configuration documentation on the official Apache HBase website (use a translator if needed). For any new technology in computing, the official site is the only accurate, first-hand source.
http://hbase.apache.org/book.html#hbase_default_configurations
I. Introduction to HBase
Apache HBase is a highly reliable, high-performance, column-oriented, scalable distributed storage system. It is a NoSQL database and an open-source implementation of the ideas in Google's Bigtable. It can run a large-scale structured storage cluster on inexpensive PC servers, using Hadoop HDFS as its file storage system, Hadoop MapReduce to process HBase's massive data sets, and ZooKeeper to coordinate the server cluster.
II. Environment preparation
This cluster uses the same three nodes as my Hadoop and Spark clusters. The planned placement of the HBase Master, backup Master, and RegionServer processes is shown below:
centos 6.5
JDK-1.8.x
Zookeeper-3.4.6
Hadoop-2.7.3
All of the above are covered in detail in my other posts, so I won't repeat them here.
    Host       Processes
    master     HMaster
    worker1    Master-backup, RegionServer
    worker2    RegionServer
III. Installing HBase 1.2.6
1. Download HBase from the official site or a domestic mirror (e.g. Aliyun or the BIT mirror):
http://mirror.bit.edu.cn/apache/hbase/stable/hbase-1.2.6-bin.tar.gz
2. Extract it into the /app/hbase/ directory:

    [hadoop@master app]$ ls
    hadoop  hbase  hive  java  kafka  scala  spark  tgz  zookeeper

3. Add the system environment variables:

    # User specific environment and startup programs
    export JAVA_HOME=/app/java/jdk1.8.0_141
    export HADOOP_HOME=/app/hadoop/hadoop-2.7.3
    export SCALA_HOME=/app/scala/scala-2.11.8
    export SPARK_HOME=/app/spark/spark-2.1.1
    export ZOOKEEPER_HOME=/app/zookeeper/zookeeper-3.4.6
    export KAFKA_HOME=/app/kafka/kafka_2.10-0.9.0.0
    export HIVE_HOME=/app/hive/apache-hive-2.1.1-bin
    export HBASE_HOME=/app/hbase/hbase-1.2.6
    PATH=$PATH:$HOME/bin:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$SCALA_HOME/bin:$SPARK_HOME/bin:$SPARK_HOME/sbin:$ZOOKEEPER_HOME/bin:$KAFKA_HOME/bin:$HIVE_HOME/bin:$HBASE_HOME/bin
    export PATH

Don't forget to source the profile so the new variables take effect:

    [hadoop@master app]$ source ~/.bash_profile
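A quick way to confirm the profile actually took effect in the current shell. The paths below match this article's layout; adjust them to your own install directories:

```shell
# Check that HBASE_HOME is set and that its bin directory is on PATH;
# if PATH was not reloaded yet, append the bin directory ourselves.
export HBASE_HOME=/app/hbase/hbase-1.2.6
case ":$PATH:" in
  *":$HBASE_HOME/bin:"*) on_path="already on PATH" ;;
  *) PATH="$PATH:$HBASE_HOME/bin"; export PATH; on_path="appended to PATH" ;;
esac
echo "HBASE_HOME=$HBASE_HOME ($on_path)"
```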

4. The official Apache HBase documentation says to copy hdfs-site.xml from Hadoop's conf directory into HBase's conf directory, so that HDFS and HBase see consistent client settings (for example, the replication factor HBase uses must match HDFS).
Quoted from the official documentation:
Procedure: HDFS Client Configuration
Of note, if you have made HDFS client configuration changes on your Hadoop cluster, such as configuration directives for HDFS clients, as opposed to server-side configurations, you must use one of the following methods to enable HBase to see and use these configuration changes:

Add a pointer to your HADOOP_CONF_DIR to the HBASE_CLASSPATH environment variable in hbase-env.sh.

Add a copy of hdfs-site.xml (or hadoop-site.xml) or, better, symlinks, under ${HBASE_HOME}/conf, or

if only a small set of HDFS client configurations, add them to hbase-site.xml.

    [hadoop@master conf]$ cp /app/hadoop/hadoop-2.7.3/etc/hadoop/hdfs-site.xml /app/hbase/hbase-1.2.6/conf/
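The quoted docs also allow a symlink instead of a copy, which keeps HBase in sync with future HDFS client changes. A minimal sketch of that variant, demonstrated in scratch directories so it is safe to run anywhere; in practice point the two variables at your real $HADOOP_HOME/etc/hadoop and $HBASE_HOME/conf:

```shell
# Scratch directories stand in for the real Hadoop and HBase conf dirs.
HADOOP_CONF=$(mktemp -d)
HBASE_CONF=$(mktemp -d)
printf '<configuration/>\n' > "$HADOOP_CONF/hdfs-site.xml"
# -s: symlink, -f: replace an existing file/link, -n: treat an existing
# symlink to a directory as a file rather than descending into it.
ln -sfn "$HADOOP_CONF/hdfs-site.xml" "$HBASE_CONF/hdfs-site.xml"
readlink "$HBASE_CONF/hdfs-site.xml"
```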

5. Configure hbase-site.xml
Follow the official documentation: http://hbase.apache.org/book.html#_configuration_files
Edit $HBASE_HOME/conf/hbase-site.xml:

    <configuration>
      <property>
        <name>hbase.zookeeper.property.clientPort</name>
        <value>2181</value>
      </property>
      <property>
        <name>hbase.zookeeper.quorum</name>
        <value>master,worker1,worker2</value>
        <description>Comma-separated list of servers in the ZooKeeper quorum.</description>
      </property>
      <property>
        <name>hbase.zookeeper.property.dataDir</name>
        <value>/app/zookeeper/zookeeper-3.4.6/data</value>
        <description>
          Property from the ZooKeeper config zoo.cfg: the directory where the
          snapshot is stored. Note that this ZooKeeper data directory is shared
          with the Hadoop HA setup, i.e. it must match the dataDir configured
          in zoo.cfg.
        </description>
      </property>
      <property>
        <name>hbase.rootdir</name>
        <value>hdfs://master:8020/hbase</value>
        <description>
          The directory shared by RegionServers. The official docs stress
          repeatedly that this directory must not be created in advance; HBase
          creates it itself, and a pre-existing directory triggers a migration
          and causes errors. The port may be 8020 or 9000 depending on the
          cluster; check $HADOOP_HOME/etc/hadoop/hdfs-site.xml. This cluster
          sets it via dfs.namenode.rpc-address.hdcluster.nn1 and
          dfs.namenode.rpc-address.hdcluster.nn2.
        </description>
      </property>
      <property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
        <description>
          The mode the cluster will be in. Set true for a distributed cluster
          (fully-distributed with unmanaged ZooKeeper quorum, see hbase-env.sh);
          set false for standalone and pseudo-distributed setups with managed
          ZooKeeper.
        </description>
      </property>
    </configuration>
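Before moving on, it is worth checking that the properties above actually landed in the file. A small sketch: it writes a scratch copy of the file so it runs anywhere; point CONF at your real $HBASE_HOME/conf to check the live file instead:

```shell
# Write a minimal stand-in for hbase-site.xml, then verify the three
# properties this article relies on are all present.
CONF=$(mktemp -d)
cat > "$CONF/hbase-site.xml" <<'EOF'
<configuration>
  <property><name>hbase.rootdir</name><value>hdfs://master:8020/hbase</value></property>
  <property><name>hbase.cluster.distributed</name><value>true</value></property>
  <property><name>hbase.zookeeper.quorum</name><value>master,worker1,worker2</value></property>
</configuration>
EOF
for key in hbase.rootdir hbase.cluster.distributed hbase.zookeeper.quorum; do
  grep -q "<name>$key</name>" "$CONF/hbase-site.xml" && echo "$key: ok"
done
```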

6. Configure the regionservers file
Edit $HBASE_HOME/conf/regionservers and list the hostnames that should run a RegionServer:

    worker1
    worker2

7. Configure the backup-masters file (standby masters)
HBase supports running multiple master nodes, so there is no single point of failure; only one master is active at a time and the rest are backups. Edit $HBASE_HOME/conf/backup-masters and list the hostnames of the standby masters.
This step is optional.

    worker1
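Steps 6 and 7 boil down to writing two one-hostname-per-line text files, which can be done in one go. The demo below writes to a scratch directory so it is safe to run anywhere; point HBASE_CONF at your real $HBASE_HOME/conf and substitute your own hostnames:

```shell
# Generate regionservers and backup-masters in one shot.
HBASE_CONF=$(mktemp -d)
printf '%s\n' worker1 worker2 > "$HBASE_CONF/regionservers"
printf '%s\n' worker1 > "$HBASE_CONF/backup-masters"
cat "$HBASE_CONF/regionservers"
```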

8. Configure hbase-env.sh
Edit $HBASE_HOME/conf/hbase-env.sh to set the environment variables. For unified management, this article uses the separately installed ZooKeeper, which means setting HBASE_MANAGES_ZK to false:

    # Tell HBase whether it should manage its own instance of ZooKeeper or not.
    # false: use the external ZooKeeper ensemble instead of the one bundled with HBase.
    export HBASE_MANAGES_ZK=false
    export JAVA_HOME=/app/java/jdk1.8.0_141
    export HBASE_CLASSPATH=/app/hbase/hbase-1.2.6/conf
    export HBASE_HOME=/app/hbase/hbase-1.2.6
    export HADOOP_HOME=/app/hadoop/hadoop-2.7.3
    # HBase log directory
    export HBASE_LOG_DIR=/app/hbase/hbase-1.2.6/logs

Copy the HBase directory from the master node to worker1 and worker2:

    [hadoop@master app]$ scp -r hbase/ hadoop@worker1:/app/
    [hadoop@master app]$ scp -r hbase/ hadoop@worker2:/app/

Copy the environment profile over as well:

    [hadoop@master app]$ scp ~/.bash_profile hadoop@worker1:~/
    [hadoop@master app]$ scp ~/.bash_profile hadoop@worker2:~/

Then ssh to each node and source the profile to make the variables take effect.
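The copy-and-reload steps above can be scripted for any number of workers. A sketch, shown as a dry run that only prints each command (delete the leading `echo`s to execute for real; hostnames and paths are this article's):

```shell
# Distribute the HBase install and the profile to every worker.
workers="worker1 worker2"
for host in $workers; do
  echo scp -r /app/hbase "hadoop@$host:/app/"
  echo scp ~/.bash_profile "hadoop@$host:~/"
  echo ssh "hadoop@$host" "source ~/.bash_profile"
done
```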
That completes the HBase configuration.
9. Test startup
Start the whole cluster with $HBASE_HOME/bin/start-hbase.sh. Reading the shell script reveals the startup order:

    # HBASE-6504 - only take the first line of the output in case verbose gc is on
    distMode=`$bin/hbase --config "$HBASE_CONF_DIR" org.apache.hadoop.hbase.util.HBaseConfTool hbase.cluster.distributed | head -n 1`
    if [ "$distMode" == 'false' ]
    then
      "$bin"/hbase-daemon.sh --config "${HBASE_CONF_DIR}" $commandToRun master $@
    else
      "$bin"/hbase-daemons.sh --config "${HBASE_CONF_DIR}" $commandToRun zookeeper
      "$bin"/hbase-daemon.sh --config "${HBASE_CONF_DIR}" $commandToRun master
      "$bin"/hbase-daemons.sh --config "${HBASE_CONF_DIR}" \
        --hosts "${HBASE_REGIONSERVERS}" $commandToRun regionserver
      "$bin"/hbase-daemons.sh --config "${HBASE_CONF_DIR}" \
        --hosts "${HBASE_BACKUP_MASTERS}" $commandToRun master-backup
    fi

HBase depends on Hadoop and ZooKeeper, so we start everything following the order shown above:
hbase-daemon.sh launches zookeeper, master, regionserver and master-backup in turn.

We therefore start the daemons on each node in that same order.

Hadoop must be started before HBase, so that HBase can initialize and read the data it stores on HDFS.
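The whole start order can be captured in one small helper, sketched here as a dry run: `run` only prints its command; change its body to `"$@"` to execute for real, assuming all scripts are on PATH as set up in step 3:

```shell
# ZooKeeper first, then HDFS and YARN, then HBase.
run() { echo "would run: $*"; }
run zkServer.sh start    # repeat on every ZooKeeper quorum node
run start-dfs.sh         # HDFS must be up before HBase
run start-yarn.sh
run start-hbase.sh       # starts HMaster, RegionServers and backup master
```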
(1) Start ZooKeeper (run zkServer.sh start on each of the three nodes, then verify the mode with zkServer.sh status):

    [hadoop@master bin]$ zkServer.sh status
    JMX enabled by default
    Using config: /app/zookeeper/zookeeper-3.4.6/bin/../conf/zoo.cfg
    Mode: follower
    [hadoop@worker1 ~]$ zkServer.sh status
    JMX enabled by default
    Using config: /app/zookeeper/zookeeper-3.4.6/bin/../conf/zoo.cfg
    Mode: leader
    [hadoop@worker2 ~]$ zkServer.sh status
    JMX enabled by default
    Using config: /app/zookeeper/zookeeper-3.4.6/bin/../conf/zoo.cfg
    Mode: follower

(2) Start the Hadoop cluster
Here we use start-all.sh:

    [hadoop@master bin]$ start-all.sh
    This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
    Starting namenodes on [master]
    master: starting namenode, logging to /app/hadoop/hadoop-2.7.3/logs/hadoop-hadoop-namenode-master.out
    worker2: starting datanode, logging to /app/hadoop/hadoop-2.7.3/logs/hadoop-hadoop-datanode-worker2.out
    worker1: starting datanode, logging to /app/hadoop/hadoop-2.7.3/logs/hadoop-hadoop-datanode-worker1.out
    master: starting datanode, logging to /app/hadoop/hadoop-2.7.3/logs/hadoop-hadoop-datanode-master.out
    Starting secondary namenodes [0.0.0.0]
    0.0.0.0: starting secondarynamenode, logging to /app/hadoop/hadoop-2.7.3/logs/hadoop-hadoop-secondarynamenode-master.out
    starting yarn daemons
    starting resourcemanager, logging to /app/hadoop/hadoop-2.7.3/logs/yarn-hadoop-resourcemanager-master.out
    master: starting nodemanager, logging to /app/hadoop/hadoop-2.7.3/logs/yarn-hadoop-nodemanager-master.out
    worker1: starting nodemanager, logging to /app/hadoop/hadoop-2.7.3/logs/yarn-hadoop-nodemanager-worker1.out
    worker2: starting nodemanager, logging to /app/hadoop/hadoop-2.7.3/logs/yarn-hadoop-nodemanager-worker2.out
    [hadoop@master bin]$ jps
    3683 NodeManager
    3715 Jps
    3380 SecondaryNameNode
    3575 ResourceManager
    3223 DataNode
    2890 QuorumPeerMain
    3117 NameNode
    [hadoop@master bin]$

(3) Start HBase:

    [hadoop@master app]$ start-hbase.sh
    starting master, logging to /app/hbase/hbase-1.2.6/logs/hbase-hadoop-master-master.out
    Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
    Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
    worker2: starting regionserver, logging to /app/hbase/hbase-1.2.6/logs/hbase-hadoop-regionserver-worker2.out
    worker1: starting regionserver, logging to /app/hbase/hbase-1.2.6/logs/hbase-hadoop-regionserver-worker1.out
    worker2: Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
    worker2: Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
    worker1: Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
    worker1: Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
    worker1: starting master, logging to /app/hbase/hbase-1.2.6/logs/hbase-hadoop-master-worker1.out

Check jps on each node.
master:

    [hadoop@master app]$ jps
    5762 Jps
    3683 NodeManager
    3380 SecondaryNameNode
    5557 HMaster
    3575 ResourceManager
    3223 DataNode
    2890 QuorumPeerMain
    3117 NameNode

worker1:
Here the start-hbase.sh script has also launched the backup HMaster process, via:

    hbase-daemon.sh start master --backup &

    [hadoop@worker1 app]$ jps
    3970 Jps
    2883 NodeManager
    2692 QuorumPeerMain
    3722 HMaster
    2798 DataNode
    3615 HRegionServer

worker2:

    [hadoop@worker2 ~]$ jps
    3424 HRegionServer
    2730 QuorumPeerMain
    2826 DataNode
    2939 NodeManager
    3612 Jps

HBase is now configured and started.
IV. Testing and using HBase
1. Run hbase shell to enter HBase's interactive command line and try it out:

    [hadoop@master app]$ hbase shell
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/app/hbase/hbase-1.2.6/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/app/hadoop/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
    HBase Shell; enter 'help<RETURN>' for list of supported commands.
    Type "exit<RETURN>" to leave the HBase Shell
    Version 1.2.6, rUnknown, Mon May 29 02:25:32 CDT 2017
    hbase(main):001:0>

(1) Check the cluster status and server count (important):

    hbase(main):001:0> status
    1 active master, 1 backup masters, 2 servers, 0 dead, 1.0000 average load

This shows one active master, one backup master, two RegionServers, and zero dead servers.
(2) Create a table
The create command takes the table name followed by one or more column family names (here 'name' and 'age' are column families, not individual columns):

    hbase(main):002:0> create 'employee','name','age'
    0 row(s) in 6.4790 seconds
    => Hbase::Table - employee

(3) List the table:

    hbase(main):003:0> list 'employee'
    TABLE
    employee
    1 row(s) in 0.2430 seconds
    => ["employee"]

(4) Insert data:

    hbase(main):025:0> put 'employee','row1','name:firstname','james'
    0 row(s) in 0.0240 seconds
    hbase(main):026:0> put 'employee','row1','name:lastname','lebron'
    0 row(s) in 0.0210 seconds
    hbase(main):027:0> put 'employee','row2','age','33'
    0 row(s) in 0.0150 seconds

(5) Scan the whole table:

    hbase(main):028:0> scan 'employee'
    ROW     COLUMN+CELL
    row1    column=name:firstname, timestamp=1525800600153, value=james
    row1    column=name:lastname, timestamp=1525800607228, value=lebron
    row2    column=age:, timestamp=1525800622854, value=33
    2 row(s) in 0.0690 seconds

(6) Get data by row key:

    hbase(main):029:0> get 'employee','row1'
    COLUMN              CELL
    name:firstname      timestamp=1525800600153, value=james
    name:lastname       timestamp=1525800607228, value=lebron
    2 row(s) in 0.0220 seconds
    hbase(main):003:0> get 'employee','row2'
    COLUMN              CELL
    age:                timestamp=1525800622854, value=33
    1 row(s) in 0.1410 seconds

(7) Disable the table:

    hbase(main):004:0> disable 'employee'
    0 row(s) in 7.0270 seconds

(8) Re-enable the table:

    hbase(main):005:0> enable 'employee'
    0 row(s) in 5.1070 seconds

(9) Drop the table
Use the drop command to delete a table; the table must be disabled before it can be dropped:

    hbase(main):004:0> disable 'employee'
    0 row(s) in 7.0270 seconds
    hbase(main):006:0> drop 'employee'

(10) Exit the hbase shell:

    quit

2. HBase web UI
HBase also provides a web management page, which makes it much easier to check the cluster state.

Open http://172.17.0.1:16010 in a browser (the master UI's default port is 16010) to reach the management page:
(screenshot: HBase master web UI)

This completes the HBase setup.
