HBase in Detail

1. HBase Architecture

============================================

2. HBase Shell Operations

2.1. Basic operations

Start the HBase shell: bin/hbase shell
Show the help text: help
List the tables in the current database: list

2.2. Table operations

// 2.2.1. Create a table ('info' is the column family)

create 'student','info'

// 2.2.2. Insert data into the table

put 'student','1001','info:sex','male'
put 'student','1001','info:age','18'
put 'student','1002','info:name','Janna'
put 'student','1002','info:sex','female'
put 'student','1002','info:age','20'

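// 2.2.3. Scan the table

scan 'student'

// STARTROW is inclusive, STOPROW is exclusive
scan 'student',{STARTROW => '1001', STOPROW => '1004'}
scan 'student',{STARTROW => '1001'}

// 2.2.4. Show the table structure

describe 'student'

// 2.2.5. Update a specific field (a put on an existing cell writes a new version)

put 'student','1001','info:name','zhangsan'
put 'student','1001','info:age','22'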

// 2.2.6. Read a specific row, or a specific "column family:column"

get 'student','1001'
get 'student','1001','info:name'

// 2.2.7. Count the rows in the table

count 'student'

// 2.2.8. Delete data

// Delete all data for a rowkey
deleteall 'student','1001'

// Delete one column of a rowkey
delete 'student','1002','info:sex'

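// 2.2.9. Empty the table

// Emptying a table means disabling it first and then truncating it.
// The shell's truncate command performs the disable step itself, so one command is enough.
truncate 'student'

// 2.2.10. Drop the table

// First put the table into the disabled state
disable 'student'

// Only then can the table be dropped
drop 'student'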

// 2.2.11. Alter the table

// Keep 3 versions of the data in the info column family
alter 'student',{NAME=>'info',VERSIONS=>3}

// Read up to 3 versions of one column
get 'student','1001',{COLUMN=>'info:name',VERSIONS=>3}

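3. HBase Data Structures

3.1 RowKey

A RowKey can be any string (the commonly quoted maximum length is 64 KB; in practice keys are usually 10 to 100 bytes). Internally, HBase stores the RowKey as a byte array, and rows are stored sorted by RowKey in lexicographic (byte) order. RowKey design should take this sorted storage into account so that rows that are often read together are stored next to each other (locality).
The RowKey is the primary key used to retrieve records. There are three ways to access rows in an HBase table:
- by a single RowKey (get);
- by a RowKey range (a scan bounded by a start row and a stop row; this is a range match, not a regular-expression match);
- by a full table scan.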
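
The three access modes can be pictured with a sorted map. The following is a minimal sketch in plain Java, not HBase code: the class `RowKeyOrderSketch` and its methods are hypothetical names, and it assumes ASCII keys so that Java's `String` ordering coincides with HBase's byte ordering.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

public class RowKeyOrderSketch {

    // Stand-in for a table: rows kept sorted by RowKey.
    // Assumption: ASCII keys, so String order matches byte order.
    private final NavigableMap<String, String> rows = new TreeMap<>();

    public void put(String rowKey, String value) {
        rows.put(rowKey, value);
    }

    // Single-row access, comparable to: get 'student','1001'
    public String get(String rowKey) {
        return rows.get(rowKey);
    }

    // Range access, comparable to a scan with STARTROW and STOPROW.
    // The start row is inclusive and the stop row is exclusive.
    public List<String> scan(String startRow, String stopRow) {
        return new ArrayList<>(rows.subMap(startRow, true, stopRow, false).keySet());
    }

    // Full scan: every RowKey, in sorted order.
    public List<String> scanAll() {
        return new ArrayList<>(rows.keySet());
    }

    public static void main(String[] args) {
        RowKeyOrderSketch table = new RowKeyOrderSketch();
        table.put("1002", "Janna");
        table.put("10", "short key");
        table.put("1001", "zhangsan");
        table.put("2", "sorts after every key that starts with 1");

        System.out.println(table.scanAll());            // [10, 1001, 1002, 2]
        System.out.println(table.scan("1001", "1004")); // [1001, 1002]
    }
}
```

Note that "2" sorts after "1002": keys are compared character by character, not as numbers, which is why numeric RowKeys are usually zero-padded to a fixed width.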

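3.2 Column Family

Column family: every column in an HBase table belongs to a column family. Column families are part of the table schema (columns are not) and must be defined before the table is used. Column names are prefixed with their column family. For example, courses:history and courses:math both belong to the courses column family.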

3.3 Cell

A cell is the unit uniquely identified by (rowkey, column family:column, version). The data in a cell has no type; it is stored entirely as raw bytes.

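3.4 TimeStamp

Versions are indexed by timestamp. Each write to a cell carries a timestamp, either assigned by HBase at write time or supplied explicitly by the client, and the versions of a cell are ordered newest first.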
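
How timestamps index the versions of one cell can be sketched as follows. This is a simplified illustration in plain Java, not HBase code: `CellVersionSketch` is a hypothetical name, and it trims old versions eagerly on every write, which is an assumption made to keep the example short.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

public class CellVersionSketch {

    private final int maxVersions;

    // timestamp -> value for one (rowkey, family:column), newest timestamp first.
    private final TreeMap<Long, String> versions =
            new TreeMap<>(Comparator.<Long>reverseOrder());

    public CellVersionSketch(int maxVersions) {
        this.maxVersions = maxVersions;
    }

    // Stores a value under its timestamp and drops versions beyond the limit.
    public void put(long timestamp, String value) {
        versions.put(timestamp, value);
        while (versions.size() > maxVersions) {
            versions.pollLastEntry(); // the last entry holds the oldest timestamp
        }
    }

    // A plain read returns the newest version only.
    public String latest() {
        return versions.isEmpty() ? null : versions.firstEntry().getValue();
    }

    // Returns up to 'limit' versions, newest first.
    public List<String> read(int limit) {
        List<String> result = new ArrayList<>();
        for (String value : versions.values()) {
            if (result.size() == limit) {
                break;
            }
            result.add(value);
        }
        return result;
    }

    public static void main(String[] args) {
        CellVersionSketch name = new CellVersionSketch(3);
        name.put(1L, "v1");
        name.put(2L, "v2");
        name.put(3L, "v3");
        name.put(4L, "v4");
        System.out.println(name.latest()); // v4
        System.out.println(name.read(3));  // [v4, v3, v2]
    }
}
```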

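3.5 Namespace

A namespace groups tables logically and carries the following:
Table: every table is a member of a namespace. If none is specified, the table is placed in the default namespace;
RegionServer group: a namespace includes a default RegionServer group;
Permission: a namespace lets us define access control lists (ACLs);
Quota: a limit that can be enforced on the number of regions a namespace may contain.
Common commands:
- List namespaces: list_namespace
- Create a namespace: create_namespace 'bigdata'
- Create a table in a given namespace: create 'bigdata:student','info'
- Drop a namespace (it must contain no tables): drop_namespace 'bigdata'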

4. How HBase Works

4.1 Read path

=========================

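4.2 Write path

The client sends a write request to the HRegionServer;
The HRegionServer writes the data to the HLog (the write-ahead log), which provides durability and recovery;
The HRegionServer writes the data to memory (the MemStore);
The client is told that the write succeeded.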
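
The ordering of these steps is what makes recovery possible: because the edit reaches the log before it reaches memory, anything acknowledged can be rebuilt after a crash. A minimal sketch in plain Java, assuming a single server and an in-memory list as the log (`WritePathSketch` and its methods are hypothetical names, not the HBase API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class WritePathSketch {

    // Stand-in for the HLog: an append-only list of {rowKey, value} edits.
    private final List<String[]> hlog = new ArrayList<>();

    // Stand-in for the MemStore: an in-memory map sorted by RowKey.
    private TreeMap<String, String> memStore = new TreeMap<>();

    // Handles one write in the order described above and returns the ack.
    public boolean write(String rowKey, String value) {
        hlog.add(new String[] {rowKey, value}); // log the edit first
        memStore.put(rowKey, value);            // then apply it in memory
        return true;                            // finally acknowledge the client
    }

    // Simulates a crash: everything held only in memory is lost.
    public void crash() {
        memStore = new TreeMap<>();
    }

    // Rebuilds the MemStore by replaying the log in append order.
    public void recover() {
        for (String[] edit : hlog) {
            memStore.put(edit[0], edit[1]);
        }
    }

    public String read(String rowKey) {
        return memStore.get(rowKey);
    }

    public static void main(String[] args) {
        WritePathSketch server = new WritePathSketch();
        server.write("1001", "zhangsan");
        server.crash();
        System.out.println(server.read("1001")); // null: memory was lost
        server.recover();
        System.out.println(server.read("1001")); // zhangsan: restored from the log
    }
}
```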

=========================

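4.3 Data flush

When the MemStore reaches its threshold (128 MB by default, 64 MB in older versions), its data is flushed to disk, the flushed data is removed from memory, and the historical entries in the HLog are deleted;
The data is stored in HDFS;
A marker is recorded in the HLog.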
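
The flush trigger can be sketched in a few lines of plain Java. This is an illustration only: `FlushSketch` is a hypothetical name, sizes are approximated by key and value lengths, and clearing the log stands in for marking and removing the entries that the flush has made redundant.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class FlushSketch {

    private final long flushThresholdBytes;
    private final TreeMap<String, String> memStore = new TreeMap<>();
    private final List<String> hlog = new ArrayList<>();
    private final List<TreeMap<String, String>> storeFiles = new ArrayList<>();
    private long memStoreBytes = 0;

    public FlushSketch(long flushThresholdBytes) {
        this.flushThresholdBytes = flushThresholdBytes;
    }

    public void write(String rowKey, String value) {
        hlog.add(rowKey + "=" + value);
        memStore.put(rowKey, value);
        // Rough size accounting: key and value lengths only.
        memStoreBytes += rowKey.length() + value.length();
        if (memStoreBytes >= flushThresholdBytes) {
            flush();
        }
    }

    private void flush() {
        storeFiles.add(new TreeMap<>(memStore)); // persist a sorted snapshot as a new file
        memStore.clear();                        // free the memory
        memStoreBytes = 0;
        hlog.clear();                            // flushed edits no longer need the log
    }

    public int storeFileCount() {
        return storeFiles.size();
    }

    public int memStoreRows() {
        return memStore.size();
    }

    public int hlogEntries() {
        return hlog.size();
    }

    public static void main(String[] args) {
        FlushSketch region = new FlushSketch(10);
        region.write("1001", "aaaa"); // 8 bytes, below the threshold
        region.write("1000", "bb");   // 14 bytes in total, triggers a flush
        System.out.println(region.storeFileCount()); // 1
        System.out.println(region.memStoreRows());   // 0
    }
}
```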

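4.4 Compaction and split

When the number of data files reaches the compaction threshold (4 in this description; the governing setting is hbase.hstore.compactionThreshold), a compaction is triggered and the files are read and merged. Compaction is carried out by the HRegionServer that hosts the Region, not by the HMaster;
When the merged data exceeds the split threshold (256 MB in this description; the upper bound is set by hbase.hregion.max.filesize), the Region is split and the resulting Regions are assigned to different HRegionServers;
When an HRegionServer goes down, the HLog of that server is split and the pieces are handed to other HRegionServers to load, and META is updated;
Note: the HLog is synced to HDFS.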
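
Merging and splitting can be illustrated on sorted maps. The sketch below is plain Java with hypothetical names (`CompactionSketch`, `compact`, `split`); it assumes each data file is a map from RowKey to value and that a split happens at the middle key, which simplifies what a real region server does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CompactionSketch {

    // Merges data files (ordered oldest to newest) into one sorted file.
    // When a RowKey appears in several files, the newest value wins.
    public static TreeMap<String, String> compact(List<Map<String, String>> oldestToNewest) {
        TreeMap<String, String> merged = new TreeMap<>();
        for (Map<String, String> file : oldestToNewest) {
            merged.putAll(file);
        }
        return merged;
    }

    // Splits a region's sorted rows at the middle RowKey into two halves.
    // Assumes the region is not empty.
    public static List<TreeMap<String, String>> split(TreeMap<String, String> region) {
        List<String> keys = new ArrayList<>(region.keySet());
        String midKey = keys.get(keys.size() / 2);
        List<TreeMap<String, String>> daughters = new ArrayList<>();
        daughters.add(new TreeMap<>(region.headMap(midKey, false))); // keys below midKey
        daughters.add(new TreeMap<>(region.tailMap(midKey, true)));  // midKey and above
        return daughters;
    }

    public static void main(String[] args) {
        Map<String, String> older = new TreeMap<>();
        older.put("1001", "old");
        older.put("1003", "c");
        Map<String, String> newer = new TreeMap<>();
        newer.put("1001", "new");
        newer.put("1002", "b");
        newer.put("1004", "d");

        List<Map<String, String>> files = new ArrayList<>();
        files.add(older);
        files.add(newer);

        TreeMap<String, String> merged = compact(files);
        System.out.println(merged.keySet());         // [1001, 1002, 1003, 1004]
        System.out.println(split(merged).get(1).firstKey()); // 1003
    }
}
```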

5. HBase API Operations

5.1 Environment setup

// pom.xml

  <dependencies>
      <dependency>
          <groupId>junit</groupId>
          <artifactId>junit</artifactId>
          <!-- pinned; RELEASE is a deprecated meta-version -->
          <version>4.12</version>
      </dependency>
      <dependency>
          <groupId>org.apache.hbase</groupId>
          <artifactId>hbase-server</artifactId>
          <version>1.3.4</version>
      </dependency>
      <dependency>
          <groupId>org.apache.hbase</groupId>
          <artifactId>hbase-client</artifactId>
          <version>1.3.4</version>
      </dependency>
  </dependencies>

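5.2 Writing the program

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
import org.junit.Test;

public class TestHbase {

	private static Admin admin = null;
	private static Connection connection = null;
	private static Configuration configuration = null;

	static {
		// HBase configuration
		configuration = HBaseConfiguration.create();

		// ZooKeeper settings; replace the placeholder with your ZooKeeper host(s)
		configuration.set("hbase.zookeeper.quorum", "Zookeeper地址");
		configuration.set("hbase.zookeeper.property.clientPort", "2181");

		// Obtain the connection and the Admin handle
		try {
			connection = ConnectionFactory.createConnection(configuration);
			admin = connection.getAdmin();
		} catch (IOException e) {
			e.printStackTrace();
		}
	}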

	// Releases the Admin first, then the Connection that created it.
	// Note: connection and admin are shared static fields, so after one test has
	// called close() they cannot be reused; run one test method at a time.
	private void close(Connection conn, Admin admin) {
		if (admin != null) {
			try {
				admin.close();
			} catch (IOException e) {
				e.printStackTrace();
			}
		}

		if (conn != null) {
			try {
				conn.close();
			} catch (IOException e) {
				e.printStackTrace();
			}
		}
	}

	// Check whether a table exists
	@Test
	public void tableExist() throws IOException {
		boolean student = admin.tableExists(TableName.valueOf("student"));
		System.out.println("Does table student exist: " + student);

		boolean staff = admin.tableExists(TableName.valueOf("staff"));
		System.out.println("Does table staff exist: " + staff);

		this.close(connection, admin);
	}

	// Create a table
	@Test
	public void createTable() throws IOException {
		// Build the table descriptor
		HTableDescriptor hTableDescriptor = new HTableDescriptor(TableName.valueOf("country"));

		// Build the column family descriptor
		HColumnDescriptor hColumnDescriptor = new HColumnDescriptor("name");

		// Add the column family to the table
		hTableDescriptor.addFamily(hColumnDescriptor);

		// Create the table
		admin.createTable(hTableDescriptor);
		System.out.println("Table created.");

		this.close(connection, admin);
	}

	// Delete a table
	@Test
	public void deleteTable() throws IOException {
		// A table must be disabled before it can be deleted
		admin.disableTable(TableName.valueOf("country"));

		// Delete the table
		admin.deleteTable(TableName.valueOf("country"));
		System.out.println("Table deleted.");

		this.close(connection, admin);
	}

	// Working with the data in a table:

	// Insert or update data
	@Test
	public void putData() throws IOException {
		// Get the Table object
		Table table = connection.getTable(TableName.valueOf("student"));

		// Create the Put for rowkey 1010
		Put put = new Put(Bytes.toBytes("1010"));

		// Add the cell: column family info, column name
		put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("貂蝉"));

		// Execute the write
		table.put(put);
		System.out.println("Data written.");

		table.close();
		this.close(connection, admin);
	}

	// Delete data
	@Test
	public void deleteData() throws IOException {
		// Get the Table object
		Table table = connection.getTable(TableName.valueOf("student"));

		// Create the Delete for rowkey 1010
		Delete delete = new Delete(Bytes.toBytes("1010"));

		// Option 1 (not recommended): deletes only the latest version
//		delete.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"));

		// Option 2: deletes all versions
		delete.addColumns(Bytes.toBytes("info"), Bytes.toBytes("name"));

		// Execute the delete
		table.delete(delete);
		System.out.println("Data deleted.");

		table.close();
		this.close(connection, admin);
	}

	// Query data

	// Full table scan
	@Test
	public void scanTable() throws IOException {
		Table table = connection.getTable(TableName.valueOf("student"));

		// Build the scanner
		Scan scan = new Scan();
		ResultScanner results = table.getScanner(scan);

		// Print rowkey, column family, column name and value of every cell
		for (Result result : results) {
			Cell[] cells = result.rawCells();
			for (Cell cell : cells) {
				System.out.println("RK:" + Bytes.toString(CellUtil.cloneRow(cell)) +
						",CF:" + Bytes.toString(CellUtil.cloneFamily(cell)) +
						",CN:" + Bytes.toString(CellUtil.cloneQualifier(cell)) +
						",VALUE:" + Bytes.toString(CellUtil.cloneValue(cell)));
			}
		}

		results.close();
		table.close();
		this.close(connection, admin);
	}

	// Read the data of a specific "column family:column"
	@Test
	public void scanColumnFamily() throws IOException {
		Table table = connection.getTable(TableName.valueOf("student"));

		// Create the Get for rowkey 1001 and restrict it to info:name
		Get get = new Get(Bytes.toBytes("1001"));
		get.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"));

		Result result = table.get(get);
		Cell[] cells = result.rawCells();
		for (Cell cell : cells) {
			System.out.println("RK:" + Bytes.toString(CellUtil.cloneRow(cell)) +
					",CF:" + Bytes.toString(CellUtil.cloneFamily(cell)) +
					",CN:" + Bytes.toString(CellUtil.cloneQualifier(cell)) +
					",VALUE:" + Bytes.toString(CellUtil.cloneValue(cell)));
		}

		table.close();
		this.close(connection, admin);
	}

}

