Deployment Plan

HBase is short for Hadoop Database, and its data is stored on HDFS. Our lab environment is still based on the Hive setup from the previous topic; see Big Data Learning (11): Hive Metastore Service Mode Setup.

On top of that setup we add the HBase deployment plan. I suspect these 8 GB of RAM are about to hit their limit.

Host        RegionServer   Master
server01    •
server02    •
server03    •              •

Install HBase

Extract HBase into the /usr directory; the version used here is 2.2.6.

[root@server01 home]# tar -xvf hbase-2.2.6-bin.tar.gz -C /usr/

Change the owner and group of the extracted directory to the hadoop user:

[root@server01 usr]# chown -R hadoop:hadoop hbase-2.2.6/
[root@server01 usr]# ll
总用量 92
drwxr-xr-x. 10 hadoop hadoop 184 9月 24 08:04 apache-hive-3.1.2
drwxr-xr-x. 7 hadoop hadoop 146 9月 24 12:57 apache-zookeeper-3.5.8
dr-xr-xr-x. 2 root root 24576 10月 23 13:11 bin
drwxr-xr-x. 2 root root 6 4月 11 2018 etc
drwxr-xr-x. 2 root root 6 4月 11 2018 games
drwxr-xr-x. 11 hadoop hadoop 227 9月 24 12:58 hadoop-3.3.0
drwxr-xr-x. 6 hadoop hadoop 170 12月 5 14:58 hbase-2.2.6
drwxr-xr-x. 3 root root 23 9月 22 16:44 include
drwxr-xr-x. 4 root root 69 10月 23 13:06 java
dr-xr-xr-x. 27 root root 4096 9月 22 16:46 lib
dr-xr-xr-x. 35 root root 20480 9月 22 16:46 lib64
drwxr-xr-x. 21 root root 4096 9月 22 16:46 libexec
drwxr-xr-x. 12 root root 131 9月 22 16:44 local
dr-xr-xr-x. 2 root root 12288 9月 29 18:17 sbin
drwxr-xr-x. 77 root root 4096 9月 23 18:21 share
drwxr-xr-x. 4 root root 34 9月 22 16:44 src
lrwxrwxrwx. 1 root root 10 9月 22 16:44 tmp -> ../var/tmp

Edit the system environment variables to add the HBase path:

JAVA_HOME=/usr/java/jdk1.8.0
ZOOKEEPER_HOME=/usr/apache-zookeeper-3.5.8
HADOOP_HOME=/usr/hadoop-3.3.0
HIVE_HOME=/usr/apache-hive-3.1.2
HBASE_HOME=/usr/hbase-2.2.6
PATH=$PATH:$JAVA_HOME/bin:$ZOOKEEPER_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin:$HBASE_HOME/bin
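
Assuming the variables above live in /etc/profile (adjust if you keep them elsewhere, and make sure they are actually exported), reload the profile so the hbase command becomes available in the current shell. A minimal sketch:

# Re-read the profile in the current session, then verify the binary is found.
source /etc/profile
hbase version    # should report HBase 2.2.6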

Switch to the hadoop user, edit the hbase-env.sh configuration file, and create the /opt/hadoop/pids directory (a sketch of creating it follows the file below).

# The java implementation to use.  Java 1.8+ required.
export JAVA_HOME=/usr/java/jdk1.8.0/

# Tell HBase whether it should manage its own instance of ZooKeeper or not.
export HBASE_MANAGES_ZK=false

# The directory where pid files are stored. /tmp by default.
export HBASE_PID_DIR=/opt/hadoop/pids
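
The PID directory configured above does not exist by default; it has to be created on every node and be writable by the hadoop user. A minimal sketch, run as root on each of the three servers:

mkdir -p /opt/hadoop/pids
chown -R hadoop:hadoop /opt/hadoop/pids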

Edit the hbase-site.xml configuration file:

<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://mycluster/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>server01,server02,server03</value>
  </property>
  <property>
    <name>hbase.tmp.dir</name>
    <value>./tmp</value>
  </property>
  <property>
    <name>hbase.unsafe.stream.capability.enforce</name>
    <value>false</value>
  </property>
</configuration>
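
Since hbase.rootdir points at the HDFS nameservice mycluster rather than a specific NameNode, the name has to match dfs.nameservices of the existing HA cluster (which is also why HBase needs to see the HDFS client configuration, covered below). A quick sanity check, assuming the hadoop binaries are on the PATH:

# Should print the nameservice referenced in hbase.rootdir, i.e. mycluster
hdfs getconf -confKey dfs.nameservices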

Edit the regionservers file to list the RegionServer hosts:

[hadoop@server01 conf]$ cat regionservers
server01
server02
server03

HDFS Client Configuration

Quoted from the official documentation:

Of note, if you have made HDFS client configuration changes on your Hadoop cluster, such as configuration directives for HDFS clients, as opposed to server-side configurations, you must use one of the following methods to enable HBase to see and use these configuration changes:

  1. Add a pointer to your HADOOP_CONF_DIR to the HBASE_CLASSPATH environment variable in hbase-env.sh.

  2. Add a copy of hdfs-site.xml (or hadoop-site.xml) or, better, symlinks, under ${HBASE_HOME}/conf, or

  3. if only a small set of HDFS client configurations, add them to hbase-site.xml.

Here we take the second approach and create a symlink:

[hadoop@server01 conf]$ ln -s /usr/hadoop-3.3.0/etc/hadoop/hdfs-site.xml hdfs-site.xml
[hadoop@server01 conf]$ ll
总用量 44
-rw-r--r--. 1 hadoop hadoop 1811 1月 22 2020 hadoop-metrics2-hbase.properties
-rw-r--r--. 1 hadoop hadoop 4284 1月 22 2020 hbase-env.cmd
-rw-r--r--. 1 hadoop hadoop 7533 12月 5 15:43 hbase-env.sh
-rw-r--r--. 1 hadoop hadoop 2257 1月 22 2020 hbase-policy.xml
-rw-r--r--. 1 hadoop hadoop 2322 12月 5 16:40 hbase-site.xml
lrwxrwxrwx. 1 hadoop hadoop 42 12月 5 17:08 hdfs-site.xml -> /usr/hadoop-3.3.0/etc/hadoop/hdfs-site.xml
-rw-r--r--. 1 hadoop hadoop 1169 1月 22 2020 log4j-hbtop.properties
-rw-r--r--. 1 hadoop hadoop 4977 1月 22 2020 log4j.properties
-rw-r--r--. 1 hadoop hadoop 27 12月 5 16:43 regionservers

That completes the configuration on the first machine. Copy /usr/hbase-2.2.6 to the same directory on the second and third machines with scp and update their system environment variables as well; a sketch of the copy follows. With that, installation and configuration are done on all nodes.
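
A minimal sketch of that copy, assuming passwordless SSH and that it is run as root (so that /usr is writable); the chown is repeated on the targets to keep the hadoop ownership:

scp -r /usr/hbase-2.2.6 root@server02:/usr/
scp -r /usr/hbase-2.2.6 root@server03:/usr/
ssh root@server02 'chown -R hadoop:hadoop /usr/hbase-2.2.6'
ssh root@server03 'chown -R hadoop:hadoop /usr/hbase-2.2.6'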

Start HBase

Run start-hbase.sh on server03 to start HBase:

[hadoop@server03 conf]$ start-hbase.sh
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hadoop-3.3.0/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hbase-2.2.6/lib/client-facing-thirdparty/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
running master, logging to /usr/hbase-2.2.6/bin/../logs/hbase-hadoop-master-server03.out
server03: running regionserver, logging to /usr/hbase-2.2.6/bin/../logs/hbase-hadoop-regionserver-server03.out
server02: running regionserver, logging to /usr/hbase-2.2.6/bin/../logs/hbase-hadoop-regionserver-server02.out
server01: running regionserver, logging to /usr/hbase-2.2.6/bin/../logs/hbase-hadoop-regionserver-server01.out
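
To confirm that everything came up, a quick jps check is enough: server03 should show an HMaster plus an HRegionServer, and server01/server02 an HRegionServer each. A minimal sketch, assuming passwordless SSH for the hadoop user and that jps is on the PATH for non-interactive shells:

for h in server01 server02 server03; do
  echo "== $h =="
  ssh $h jps | grep -E 'HMaster|HRegionServer'
done

The Master web UI (port 16010 by default) is another easy way to see the live RegionServers.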

Run hbase shell on server02 to open the command line:

[hadoop@server02 opt]$ hbase shell
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hadoop-3.3.0/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hbase-2.2.6/lib/client-facing-thirdparty/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
HBase Shell
Use "help" to get list of supported commands.
Use "exit" to quit this interactive shell.
For Reference, please visit: http://hbase.apache.org/2.0/book.html#shell
Version 2.2.6, r88c9a386176e2c2b5fd9915d0e9d3ce17d0e456e, Tue Sep 15 17:36:14 CST 2020
Took 0.0020 seconds

Once the shell is up you can try a few HBase commands, for example list the existing tables:

hbase(main):001:0> list
TABLE
0 row(s)
Took 9.3559 seconds
=> []

Use help to see all available commands:

hbase(main):002:0> help
HBase Shell, version 2.2.6, r88c9a386176e2c2b5fd9915d0e9d3ce17d0e456e, Tue Sep 15 17:36:14 CST 2020
Type 'help "COMMAND"', (e.g. 'help "get"' -- the quotes are necessary) for help on a specific command.
Commands are grouped. Type 'help "COMMAND_GROUP"', (e.g. 'help "general"') for help on a command group.

COMMAND GROUPS:
Group name: general
Commands: processlist, status, table_help, version, whoami

Group name: ddl
Commands: alter, alter_async, alter_status, clone_table_schema, create, describe, disable, disable_all, drop, drop_all, enable, enable_all, exists, get_table, is_disabled, is_enabled, list, list_regions, locate_region, show_filters

Group name: namespace
Commands: alter_namespace, create_namespace, describe_namespace, drop_namespace, list_namespace, list_namespace_tables

Group name: dml
Commands: append, count, delete, deleteall, get, get_counter, get_splits, incr, put, scan, truncate, truncate_preserve

Group name: tools
Commands: assign, balance_switch, balancer, balancer_enabled, catalogjanitor_enabled, catalogjanitor_run, catalogjanitor_switch, cleaner_chore_enabled, cleaner_chore_run, cleaner_chore_switch, clear_block_cache, clear_compaction_queues, clear_deadservers, close_region, compact, compact_rs, compaction_state, compaction_switch, decommission_regionservers, flush, hbck_chore_run, is_in_maintenance_mode, list_deadservers, list_decommissioned_regionservers, major_compact, merge_region, move, normalize, normalizer_enabled, normalizer_switch, recommission_regionserver, regioninfo, rit, split, splitormerge_enabled, splitormerge_switch, stop_master, stop_regionserver, trace, unassign, wal_roll, zk_dump

Group name: replication
Commands: add_peer, append_peer_exclude_namespaces, append_peer_exclude_tableCFs, append_peer_namespaces, append_peer_tableCFs, disable_peer, disable_table_replication, enable_peer, enable_table_replication, get_peer_config, list_peer_configs, list_peers, list_replicated_tables, remove_peer, remove_peer_exclude_namespaces, remove_peer_exclude_tableCFs, remove_peer_namespaces, remove_peer_tableCFs, set_peer_bandwidth, set_peer_exclude_namespaces, set_peer_exclude_tableCFs, set_peer_namespaces, set_peer_replicate_all, set_peer_serial, set_peer_tableCFs, show_peer_tableCFs, update_peer_config

Group name: snapshots
Commands: clone_snapshot, delete_all_snapshot, delete_snapshot, delete_table_snapshots, list_snapshots, list_table_snapshots, restore_snapshot, snapshot

Group name: configuration
Commands: update_all_config, update_config

Group name: quotas
Commands: disable_exceed_throttle_quota, disable_rpc_throttle, enable_exceed_throttle_quota, enable_rpc_throttle, list_quota_snapshots, list_quota_table_sizes, list_quotas, list_snapshot_sizes, set_quota

Group name: security
Commands: grant, list_security_capabilities, revoke, user_permission

Group name: procedures
Commands: list_locks, list_procedures

Group name: visibility labels
Commands: add_labels, clear_auths, get_auths, list_labels, set_auths, set_visibility

Group name: rsgroup
Commands: add_rsgroup, balance_rsgroup, get_rsgroup, get_server_rsgroup, get_table_rsgroup, list_rsgroups, move_namespaces_rsgroup, move_servers_namespaces_rsgroup, move_servers_rsgroup, move_servers_tables_rsgroup, move_tables_rsgroup, remove_rsgroup, remove_servers_rsgroup, rename_rsgroup

SHELL USAGE:
Quote all names in HBase Shell such as table and column names. Commas delimit
command parameters. Type <RETURN> after entering a command to run it.
Dictionaries of configuration used in the creation and alteration of tables are
Ruby Hashes. They look like this: {'key1' => 'value1', 'key2' => 'value2', ...}
and are opened and closed with curley-braces. Key/values are delimited by the
'=>' character combination. Usually keys are predefined constants such as
NAME, VERSIONS, COMPRESSION, etc. Constants do not need to be quoted. Type
'Object.constants' to see a (messy) list of all constants in the environment.

If you are using binary keys or values and need to enter them in the shell, use
double-quote'd hexadecimal representation. For example:

hbase> get 't1', "key\x03\x3f\xcd"
hbase> get 't1', "key\003\023\011"
hbase> put 't1', "test\xef\xff", 'f1:', "\x01\x33\x40"

The HBase shell is the (J)Ruby IRB with the above HBase-specific commands added.
For more on the HBase Shell, see http://hbase.apache.org/book.html

Common Commands

HBase syntax is nothing like SQL; it is NoSQL, after all. If you don't know how to use a command, just type help and try the commands from the output one by one. When you use one incorrectly, it prints a hint showing the correct usage.

# The official docs have plenty of examples, so I'll reuse them here.

# Create the test table with a single column family, cf. This machine is so slow that even creating a table takes almost 50 seconds.
hbase(main):008:0> create 'test', 'cf'
Created table test
Took 48.5794 seconds
=> Hbase::Table - test

# Check whether the test table exists
hbase(main):009:0> list 'test'
TABLE
test
1 row(s)
Took 0.4441 seconds
=> ["test"]

# Show the table's details
hbase(main):016:0> describe 'test'
Table test is ENABLED
test
COLUMN FAMILIES DESCRIPTION
{NAME => 'cf', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}
1 row(s)
QUOTAS
0 row(s)
Took 0.3061 seconds

# Insert two records into the test table
hbase(main):017:0> put 'test','rowkey1','cf:level','P8'
Took 1.7265 seconds
hbase(main):018:0> put 'test','rowkey2','cf:salary','200w'
Took 0.0235 seconds

# Scan the whole table
hbase(main):019:0> scan 'test'
ROW          COLUMN+CELL
 rowkey1     column=cf:level, timestamp=1607316881281, value=P8
 rowkey2     column=cf:salary, timestamp=1607317009943, value=200w
2 row(s)
Took 0.8274 seconds

# Get the values of a single rowkey
hbase(main):029:0> get 'test','rowkey2'
COLUMN       CELL
 cf:salary   timestamp=1607317246868, value=200w
1 row(s)
Took 0.2384 seconds

# Disable the test table
hbase(main):030:0> disable 'test'
Took 9.4715 seconds

# Drop the test table. A table must be disabled before it can be dropped; dropping a table that is still in use raises an error.
hbase(main):031:0> drop 'test'
Took 3.6645 seconds

Connecting to HBase from IDEA

Create a Maven project in IDEA; the pom dependencies are as follows:

<dependencies>
  <dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>4.11</version>
    <scope>test</scope>
  </dependency>
  <!-- https://mvnrepository.com/artifact/org.apache.hbase/hbase-client -->
  <dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-client</artifactId>
    <version>2.0.0</version>
  </dependency>
  <dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>4.13</version>
    <scope>compile</scope>
  </dependency>
  <!-- https://mvnrepository.com/artifact/org.apache.hbase/hbase-protocol -->
  <dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-protocol</artifactId>
    <version>2.0.0</version>
  </dependency>
</dependencies>

I'm running HBase 2.2.6, but I couldn't use the 2.2.6 artifacts in the pom; with them, running the code throws the error below. After a long search online, switching to lower-version dependencies made it work, though I never figured out why.

java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/protobuf/generated/MasterProtos$MasterService$BlockingInterface
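
The error points at a mismatch among the HBase jars on the classpath rather than at the cluster itself. Not one of the original steps, but a quick way to see which HBase artifacts and versions Maven actually resolves is the dependency tree:

# List only the org.apache.hbase artifacts in the resolved dependency tree
mvn dependency:tree -Dincludes=org.apache.hbase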

On the server side we use the test table again (re-created after the drop above); it already holds two records:

hbase(main):005:0> scan 'test'
ROW COLUMN+CELL
rowkey1 column=cf:level, timestamp=1607328808361, value=P8
rowkey2 column=cf:salary, timestamp=1607328820620, value=200w
2 row(s)
Took 0.1988 seconds

Write an HBaseTest class; the code is as follows:

package gov.hbczt;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;

import java.io.IOException;

public class HBaseTest {

    Configuration conf = null;
    Connection connection = null;
    TableName tname = TableName.valueOf("test");
    Table table = null;

    @Before
    public void init() throws IOException {
        // Reads hbase-site.xml from the classpath; it must at least contain hbase.zookeeper.quorum.
        conf = HBaseConfiguration.create();
        connection = ConnectionFactory.createConnection(conf);
        table = connection.getTable(tname);
    }

    @Test
    public void addData() throws IOException {
        // Insert one cell: row "rowkey3", column family "cf", qualifier "corp", value "Alibaba".
        Put put = new Put(Bytes.toBytes("rowkey3"));
        put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("corp"), Bytes.toBytes("Alibaba"));
        table.put(put);
    }

    @After
    public void destroy() throws IOException {
        if (table != null)
            table.close();
        if (connection != null)
            connection.close();
    }
}
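
Note that HBaseConfiguration.create() only works if a client-side hbase-site.xml (with at least the ZooKeeper quorum) is on the classpath. A minimal sketch of providing one, assuming the default Maven layout and the quorum from this cluster:

# Hypothetical client config: only the ZooKeeper quorum is strictly needed on the client side.
mkdir -p src/main/resources
cat > src/main/resources/hbase-site.xml <<'EOF'
<configuration>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>server01,server02,server03</value>
  </property>
</configuration>
EOF

The machine running IDEA also has to be able to resolve the server01/server02/server03 hostnames (for example via its hosts file), otherwise the connection will hang.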

Run the addData method; once it succeeds, check from the shell that a new record has been added:

hbase(main):010:0> scan 'test'
ROW COLUMN+CELL
rowkey1 column=cf:level, timestamp=1607328808361, value=P8
rowkey2 column=cf:salary, timestamp=1607328820620, value=200w
rowkey3 column=cf:corp, timestamp=1607330730061, value=Alibaba
3 row(s)
Took 0.0930 seconds

The detailed API documentation is here: Apache HBase 2.2.3 API. I'm not a fan of this kind of online API documentation; I prefer the CHM format, which is searchable and very convenient.

There are plenty of CRUD examples like this online, so I won't list them one by one here.
