HBase Scan and Get Usage


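1. get help output

The help output below shows that get takes a table name and a rowkey, optionally followed by columns and further options such as a timestamp, time range, number of versions, or filter. get always addresses a single row, so it cannot return all the data in a table. For that you need another command, scan.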

hbase(main):214:0> help 'get'

Get row or cell contents; pass table name, row, and optionally

a dictionary of column(s), timestamp, timerange and versions. Examples:

  hbase> get 'ns1:t1', 'r1'

  hbase> get 't1', 'r1'

  hbase> get 't1', 'r1', {TIMERANGE => [ts1, ts2]}

  hbase> get 't1', 'r1', {COLUMN => 'c1'}

  hbase> get 't1', 'r1', {COLUMN => ['c1', 'c2', 'c3']}

  hbase> get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1}

  hbase> get 't1', 'r1', {COLUMN => 'c1', TIMERANGE => [ts1, ts2], VERSIONS => 4}

  hbase> get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1, VERSIONS => 4}

  hbase> get 't1', 'r1', {FILTER => "ValueFilter(=, 'binary:abc')"}

  hbase> get 't1', 'r1', 'c1'

  hbase> get 't1', 'r1', 'c1', 'c2'

  hbase> get 't1', 'r1', ['c1', 'c2']

  hbsase> get 't1','r1', {COLUMN => 'c1', ATTRIBUTES => {'mykey'=>'myvalue'}}

  hbsase> get 't1','r1', {COLUMN => 'c1', AUTHORIZATIONS => ['PRIVATE','SECRET']}

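2. scan help output

scan has many uses: it can scan an entire table, or it can return only the data that matches the conditions you specify. The latter relies on FILTER, which the following sections demonstrate one case at a time.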

hbase(main):221:0> help 'scan'

Scan a table; pass table name and optionally a dictionary of scanner

specifications.  Scanner specifications may include one or more of:

TIMERANGE, FILTER, LIMIT, STARTROW, STOPROW, TIMESTAMP, MAXLENGTH,

or COLUMNS, CACHE

If no columns are specified, all columns will be scanned.

To scan all members of a column family, leave the qualifier empty as in

'col_family:'.

The filter can be specified in two ways:

1. Using a filterString - more information on this is available in the

Filter Language document attached to the HBASE-4176 JIRA

2. Using the entire package name of the filter.

Some examples:

  hbase> scan 'hbase:meta'

  hbase> scan 'hbase:meta', {COLUMNS => 'info:regioninfo'}

  hbase> scan 'ns1:t1', {COLUMNS => ['c1', 'c2'], LIMIT => 10, STARTROW => 'xyz'}

  hbase> scan 't1', {COLUMNS => ['c1', 'c2'], LIMIT => 10, STARTROW => 'xyz'}

  hbase> scan 't1', {COLUMNS => 'c1', TIMERANGE => [1303668804, 1303668904]}

  hbase> scan 't1', {REVERSED => true}

  hbase> scan 't1', {FILTER => "(PrefixFilter ('row2') AND

    (QualifierFilter (>=, 'binary:xyz'))) AND (TimestampsFilter ( 123, 456))"}

  hbase> scan 't1', {FILTER =>

    org.apache.hadoop.hbase.filter.ColumnPaginationFilter.new(1, 0)}

For setting the Operation Attributes

  hbase> scan 't1', { COLUMNS => ['c1', 'c2'], ATTRIBUTES => {'mykey' => 'myvalue'}}

  hbase> scan 't1', { COLUMNS => ['c1', 'c2'], AUTHORIZATIONS => ['PRIVATE','SECRET']}

For experts, there is an additional option -- CACHE_BLOCKS -- which

switches block caching for the scanner on (true) or off (false).  By

default it is enabled.  Examples:

  hbase> scan 't1', {COLUMNS => ['c1', 'c2'], CACHE_BLOCKS => false}

Also for experts, there is an advanced option -- RAW -- which instructs the

scanner to return all cells (including delete markers and uncollected deleted

cells). This option cannot be combined with requesting specific COLUMNS.

Disabled by default.  Example:

  hbase> scan 't1', {RAW => true, VERSIONS => 10}

Besides the default 'toStringBinary' format, 'scan' supports custom formatting

by column.  A user can define a FORMATTER by adding it to the column name in

the scan specification.  The FORMATTER can be stipulated: 

 1. either as a org.apache.hadoop.hbase.util.Bytes method name (e.g, toInt, toString)

 2. or as a custom class followed by method name: e.g. 'c(MyFormatterClass).format'.

Example formatting cf:qualifier1 and cf:qualifier2 both as Integers:

  hbase> scan 't1', {COLUMNS => ['cf:qualifier1:toInt',

    'cf:qualifier2:c(org.apache.hadoop.hbase.util.Bytes).toInt'] } 

Note that you can specify a FORMATTER by column only (cf:qualifer).  You cannot

specify a FORMATTER for all columns of a column family.

Scan can also be used directly from a table, by first getting a reference to a

table, like such:

  hbase> t = get_table 't'

  hbase> t.scan

Note in the above situation, you can still provide all the filtering, columns,

options, etc as described above.

3. Fetching a specific rowkey with get and with scan

The get statement for fetching a rowkey from a table, compared with the scan statement for the same rowkey:

=================================================================================================================

【get】

hbase(main):011:0> get 'liupeng:employee','1001'

COLUMN                                  CELL

 contect:mail                           timestamp=1522202414649, value=liupliup@cn.ibm.com

 contect:phone                          timestamp=1522202430196, value=15962459503

 group:number                           timestamp=1522202455929, value=1

 info:age                               timestamp=1522202371257, value=34

 info:name                              timestamp=1522202364156, value=liupeng

【Scan】

hbase(main):010:0> scan 'liupeng:employee',FILTER=>"PrefixFilter('1001')"

ROW                                     COLUMN+CELL

 1001                                   column=contect:mail, timestamp=1522202414649, value=liupliup@cn.ibm.com

 1001                                   column=contect:phone, timestamp=1522202430196, value=15962459503

 1001                                   column=group:number, timestamp=1522202455929, value=1

 1001                                   column=info:age, timestamp=1522202371257, value=34

 1001                                   column=info:name, timestamp=1522202364156, value=liupeng

1 row(s) in 0.0590 seconds

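Summary: the two outputs differ in form. Every line of the scan result includes the rowkey itself, whereas the get result does not show the rowkey. get also prints COLUMN and CELL as separate columns, while scan combines them into a single COLUMN+CELL column.
     Note that PrefixFilter in the scan FILTER selects rows by rowkey prefix. It returns every row whose key starts with '1001', so a rowkey such as '10010' would match as well; here only one row does.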
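The difference between an exact rowkey lookup and a prefix match can be sketched with a toy in-memory model. This is not the HBase client API: the dictionary layout, the function names `get` and `scan_prefix`, and the extra row `10010` are assumptions made purely for illustration.

```python
# Toy model of a table: rowkey -> {"family:qualifier": value}.
# Row "10010" is hypothetical; it is added only to show the prefix behaviour.
TABLE = {
    "1001": {"info:name": "liupeng", "info:age": "34"},
    "1002": {"info:name": "Jack_Ma"},
    "10010": {"info:name": "extra_row"},
}


def get(table, rowkey):
    """Model of `get`: exact lookup of one rowkey."""
    return table.get(rowkey, {})


def scan_prefix(table, prefix):
    """Model of `scan` with PrefixFilter: all rows whose key starts with
    `prefix`, visited in sorted rowkey order."""
    return {rk: cols for rk, cols in sorted(table.items()) if rk.startswith(prefix)}


exact = get(TABLE, "1001")
by_prefix = scan_prefix(TABLE, "1001")

print(exact)
print(list(by_prefix))
```

In this model the exact lookup returns one row, while the prefix match returns both `1001` and `10010`, which is why a prefix scan is only equivalent to get when no other rowkey shares the prefix.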

4. Differences between get and scan when fetching a single cell
《Similarities》

hbase(main):229:0> get "liupeng:employee",'1001','info:phone'

COLUMN                          CELL

 info:phone                     timestamp=1527914569028, value=15962459503

1 row(s) in 0.0320 seconds

hbase(main):230:0> scan "liupeng:employee",FILTER=>"PrefixFilter('1001')AND ValueFilter(=,'substring:159')"

ROW                             COLUMN+CELL

 1001                           column=info:phone, timestamp=1527914569028, value=15962459503

1 row(s) in 0.1010 seconds

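《Differences》
## Note: both commands above return the cell containing '159' from the row with rowkey 1001, but they get there in very different ways. get retrieves it by naming the exact column, 'info:phone'.
scan uses PrefixFilter to select the rowkey and then, joined with AND, a ValueFilter that does a substring match on 159. If you do not know that the value is stored in the info:phone column,
the substring query with scan is clearly the more convenient approach.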

In addition, the following kind of scan query cannot be expressed with get at all. For example:
=================================================================================================================

hbase(main):026:0> scan 'liupeng:employee',FILTER=>"ValueFilter(=,'substring:159')"

ROW                                     COLUMN+CELL

 1001                                   column=contect:phone, timestamp=1522202430196, value=15962459503

 1002                                   column=contect:phone, timestamp=1522202527866, value=15977634464

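## Explanation: the substring query above finds and displays every cell whose value contains 159, in any row. With get, as the syntax below shows, a rowkey must be specified before any columns can be queried, which means the rowkey design has to be thought through carefully in advance.
From this point of view I personally find scan more convenient than get for retrieving information.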

  hbase> t.get 'r1'

  hbase> t.get 'r1', {TIMERANGE => [ts1, ts2]}

  hbase> t.get 'r1', {COLUMN => 'c1'}

  hbase> t.get 'r1', {COLUMN => ['c1', 'c2', 'c3']}

  hbase> t.get 'r1', {COLUMN => 'c1', TIMESTAMP => ts1}

  hbase> t.get 'r1', {COLUMN => 'c1', TIMERANGE => [ts1, ts2], VERSIONS => 4}

  hbase> t.get 'r1', {COLUMN => 'c1', TIMESTAMP => ts1, VERSIONS => 4}

  hbase> t.get 'r1', {FILTER => "ValueFilter(=, 'binary:abc')"}

  hbase> t.get 'r1', 'c1'

  hbase> t.get 'r1', 'c1', 'c2'

  hbase> t.get 'r1', ['c1', 'c2']
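As a rough illustration of what a ValueFilter with a substring comparator does across all rows, here is a toy in-memory model. It is a sketch, not HBase code: the function name `scan_value_substring` and the dictionary layout are assumptions, and the sample values are taken from the transcripts above.

```python
# Toy model of a table: rowkey -> {"family:qualifier": value}.
TABLE = {
    "1001": {"contect:phone": "15962459503", "info:name": "liupeng"},
    "1002": {"contect:phone": "15977634464", "info:name": "Jack_Ma"},
    "1003": {"contect:phone": "18665851263", "info:name": "kevin_shi"},
}


def scan_value_substring(table, needle):
    """Return (rowkey, column, value) for every cell whose value contains
    `needle`. No rowkey is needed: every row and every cell is inspected."""
    needle = needle.lower()  # the substring comparator ignores case
    return [
        (rowkey, column, value)
        for rowkey, columns in sorted(table.items())
        for column, value in sorted(columns.items())
        if needle in value.lower()
    ]


matches = scan_value_substring(TABLE, "159")
for cell in matches:
    print(cell)
```

The model also makes the cost visible: the filter has to look at every cell, whereas get goes straight to a known rowkey.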

5. scan can look up values directly, without specifying a rowkey. More precisely, it can return the values stored in a particular column across all rows, which get cannot do.

ColumnPrefixFilter('column name')

hbase(main):038:0> scan 'liupeng:employee',FILTER=>"ColumnPrefixFilter('name')"

ROW                                     COLUMN+CELL

 1001                                   column=info:name, timestamp=1522202364156, value=liupeng

 1002                                   column=info:name, timestamp=1522202474669, value=Jack_Ma

 1003                                   column=info:name, timestamp=1522202561029, value=kevin_shi

3 row(s) in 0.0210 seconds

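## Note: ColumnPrefixFilter selects a specific column, here the qualifier name inside the column family info. It matches on the prefix of the column qualifier only; the column family is not part of the match.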
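The qualifier-only matching can be sketched with a toy in-memory model. This is an illustration under assumptions, not the HBase API: `scan_column_prefix` and the dictionary layout are made-up names, and the values come from the transcripts above.

```python
# Toy model of a table: rowkey -> {"family:qualifier": value}.
TABLE = {
    "1001": {"info:name": "liupeng", "info:age": "34", "contect:phone": "15962459503"},
    "1002": {"info:name": "Jack_Ma", "contect:phone": "15977634464"},
    "1003": {"info:name": "kevin_shi", "contect:phone": "18665851263"},
}


def scan_column_prefix(table, prefix):
    """Return cells whose column qualifier starts with `prefix`.
    The column family (the part before ':') is ignored."""
    result = []
    for rowkey, columns in sorted(table.items()):
        for column, value in sorted(columns.items()):
            qualifier = column.split(":", 1)[1]
            if qualifier.startswith(prefix):
                result.append((rowkey, column, value))
    return result


names = scan_column_prefix(TABLE, "name")
by_family = scan_column_prefix(TABLE, "info")  # a family name is not a qualifier

print(names)
print(by_family)
```

Passing the family name `info` returns nothing in this model, because no qualifier starts with it.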

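6. The convenience of scan is that it can search on any combination of rowkey, column, and value, and that filters can be combined with conditional operators such as AND and OR to find exactly the data you want.
The statement below uses AND and OR together. AND binds more tightly than OR, so add parentheses whenever the grouping matters. The same operator may appear more than once in a filter string; the scan help output above chains two ANDs in one filter.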

hbase(main):060:0> scan 'liupeng:employee',FILTER=>"ColumnPrefixFilter('ph')AND ValueFilter(=,'substring:15962')OR ValueFilter(=,'substring:186')"

ROW                                                  COLUMN+CELL

 1001                                                column=contect:phone, timestamp=1522202430196, value=15962459503

 1003                                                column=contect:phone, timestamp=1522202605976, value=18665851263

2 row(s) in 0.0170 seconds
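How the grouping of AND and OR changes the result can be sketched with a toy per-cell model. This is an approximation under assumptions, not how HBase evaluates filter lists internally: the helper names are invented, and the `info:address` cell of row 1002 is hypothetical, added only so that the two groupings differ.

```python
# Toy model of a table: rowkey -> {"family:qualifier": value}.
TABLE = {
    "1001": {"contect:phone": "15962459503", "info:name": "liupeng"},
    # "info:address" below is a hypothetical cell, not from the original table.
    "1002": {"contect:phone": "15977634464", "info:address": "room_186"},
    "1003": {"contect:phone": "18665851263", "info:name": "kevin_shi"},
}


def column_prefix(prefix):
    """Cell predicate: qualifier starts with `prefix`."""
    return lambda column, value: column.split(":", 1)[1].startswith(prefix)


def value_substring(needle):
    """Cell predicate: value contains `needle`, ignoring case."""
    return lambda column, value: needle.lower() in value.lower()


def scan(table, keep):
    """Return every cell for which the predicate `keep` is true."""
    return [
        (rowkey, column)
        for rowkey, columns in sorted(table.items())
        for column, value in sorted(columns.items())
        if keep(column, value)
    ]


is_ph = column_prefix("ph")
has_15962 = value_substring("15962")
has_186 = value_substring("186")

# Grouping when AND binds tighter than OR: (ph AND 15962) OR 186
default_grouping = scan(TABLE, lambda c, v: (is_ph(c, v) and has_15962(c, v)) or has_186(c, v))
# Explicit parentheses: ph AND (15962 OR 186)
explicit_grouping = scan(TABLE, lambda c, v: is_ph(c, v) and (has_15962(c, v) or has_186(c, v)))

print(default_grouping)
print(explicit_grouping)
```

With the default grouping the hypothetical address cell slips through because it satisfies the OR branch on its own; the parenthesised form restricts both value conditions to phone columns.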

7. Using SingleColumnValueFilter to match a value in one column and list all columns and values of the matching rows

hbase(main):242:0> scan "liupeng:employee",{FILTER=>"SingleColumnValueFilter('info','age',=,'substring:30')"}

ROW                             COLUMN+CELL

 1005                           column=contect:mail, timestamp=1528420218800, value=zhangsan@163.com

 1005                           column=info:age, timestamp=1528439967493, value=30

 1005                           column=info:name, timestamp=1528420218800, value=zhangsan

 1008                           column=contect:mail, timestamp=1528681786126, value=www.kevin@alibaba.com

 1008                           column=info:age, timestamp=1528681786126, value=30

 1008                           column=info:name, timestamp=1528681786126, value=kevin

2 row(s) in 0.0110 seconds

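8. SingleColumnValueFilter also supports regular-expression matching through the regexstring comparator, so a fuzzy query can be used to find the matching rowkeys, columns, and values.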

hbase(main):244:0> scan "liupeng:employee",{FILTER=>"SingleColumnValueFilter('info','name',=,'regexstring:liu')"}

ROW                             COLUMN+CELL

 1001                           column=contect:mail, timestamp=1527231141046, value=liupliup@cn.ibm.com

 1001                           column=info:address, timestamp=1527753987327, value=shanghai

 1001                           column=info:age, timestamp=1527231097033, value=34

 1001                           column=info:name, timestamp=1527231081262, value=liupeng

 1001                           column=info:phone, timestamp=1527914569028, value=15962459503

 1004                           column=contect:mail, timestamp=1527473497956, value=lqdong@jingdong.com

 1004                           column=info:address, timestamp=1527755135174, value=shenzhen

 1004                           column=info:age, timestamp=1527473477124, value=40

 1004                           column=info:name, timestamp=1527415665182, value=liuqiangdong

2 row(s) in 0.0080 seconds
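The row-level behaviour of SingleColumnValueFilter in sections 7 and 8 can be sketched with a toy in-memory model. It is a simplification under assumptions: `scan_single_column_value`, `regexstring`, and the `filter_if_missing` parameter are illustrative names, and row `1009` is a hypothetical row that lacks the tested column. The model reflects two properties of the real filter: a match returns the whole row rather than just the tested cell, and by default rows that do not contain the tested column are returned as well.

```python
import re

# Toy model of a table: rowkey -> {"family:qualifier": value}.
TABLE = {
    "1001": {"info:name": "liupeng", "info:age": "34"},
    "1004": {"info:name": "liuqiangdong", "info:age": "40"},
    "1005": {"info:name": "zhangsan", "info:age": "30"},
    # Hypothetical row without info:name, used to show the default behaviour.
    "1009": {"contect:mail": "nobody@example.com"},
}


def regexstring(pattern):
    """Value predicate: the regular expression is found anywhere in the value."""
    compiled = re.compile(pattern)
    return lambda value: compiled.search(value) is not None


def scan_single_column_value(table, column, matches, filter_if_missing=False):
    """Return whole rows whose `column` value satisfies `matches`.
    Rows lacking `column` are kept unless `filter_if_missing` is True."""
    result = {}
    for rowkey, columns in sorted(table.items()):
        if column not in columns:
            if not filter_if_missing:
                result[rowkey] = columns
            continue
        if matches(columns[column]):
            result[rowkey] = columns
    return result


liu_rows = scan_single_column_value(TABLE, "info:name", regexstring("liu"))
strict_rows = scan_single_column_value(
    TABLE, "info:name", regexstring("liu"), filter_if_missing=True
)

print(list(liu_rows))
print(list(strict_rows))
```

In this model the default call also returns row `1009`, so if some rows may lack the tested column it is worth checking how missing columns should be handled before relying on the result.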
