利用Mapreduce/hive查询Phoenix数据时如何划分partition?

PhoenixInputFormat的源码一看便知:

    public List<InputSplit> getSplits(JobContext context) throws IOException, InterruptedException {
Configuration configuration = context.getConfiguration();
QueryPlan queryPlan = this.getQueryPlan(context, configuration);
List allSplits = queryPlan.getSplits();
List splits = this.generateSplits(queryPlan, allSplits);
return splits;
}

根据select查询语句创建查询计划,QueryPlan,实际是子类ScanPlan。getQueryPlan函数有一个特殊操作:

queryPlan.iterator(MapReduceParallelScanGrouper.getInstance());

如果HBase表有多个Region,则会将一个Scan划分为多个,每个Region对应一个Split。这个逻辑跟MR on HBase类似。只是这边的实现过程不同,这边调用的是Phoenix的QueryPlan,而不是HBase API。

以下是一个示例,加深这一过程的理解。

Phoenix 建表

将表presplit为4个region:[-∞,CS), [CS, EU), [EU, NA), [NA, +∞)

CREATE TABLE TEST (HOST VARCHAR NOT NULL PRIMARY KEY, DESCRIPTION VARCHAR) SPLIT ON ('CS','EU','NA');
upsert into test(host, description) values ('CS11', 'cccccccc');
upsert into test(host, description) values ('EU11', 'eeeddddddddd')
upsert into test(host, description) values ('NA11', 'nnnnneeeddddddddd');
0: jdbc:phoenix:localhost> select * from test;
+-------+--------------------+
| HOST | DESCRIPTION |
+-------+--------------------+
| CS11 | cccccccc |
| EU11 | eeeddddddddd |
| NA11 | nnnnneeeddddddddd |
+-------+--------------------+

窥探ScanPlan

import org.apache.hadoop.hbase.client.Scan;
import org.apache.log4j.BasicConfigurator;
import org.apache.phoenix.compile.QueryPlan;
import org.apache.phoenix.iterate.MapReduceParallelScanGrouper;
import org.apache.phoenix.jdbc.PhoenixStatement; import java.io.IOException;
import java.sql.*;
import java.util.List; public class LocalPhoenix {
public static void main(String[] args) throws SQLException, IOException {
BasicConfigurator.configure(); Statement stmt = null;
ResultSet rs = null; Connection con = DriverManager.getConnection("jdbc:phoenix:localhost:2181:/hbase");
stmt = con.createStatement();
PhoenixStatement pstmt = (PhoenixStatement)stmt;
QueryPlan queryPlan = pstmt.optimizeQuery("select * from TEST");
queryPlan.iterator(MapReduceParallelScanGrouper.getInstance()); Scan scan = queryPlan.getContext().getScan();
List<List<Scan>> scans = queryPlan.getScans(); for (List<Scan> sl : scans) {
System.out.println();
for (Scan s : sl) {
System.out.print(s);
}
} con.close(); }
}

4个scan如下:

{"loadColumnFamiliesOnDemand":null,"startRow":"","stopRow":"CS","batch":-1,"cacheBlocks":true,"totalColumns":1,"maxResultSize":-1,"families":{"0":["ALL"]},"caching":100,"maxVersions":1,"timeRange":[0,1523338217847]}
{"loadColumnFamiliesOnDemand":null,"startRow":"CS","stopRow":"EU","batch":-1,"cacheBlocks":true,"totalColumns":1,"maxResultSize":-1,"families":{"0":["ALL"]},"caching":100,"maxVersions":1,"timeRange":[0,1523338217847]}
{"loadColumnFamiliesOnDemand":null,"startRow":"EU","stopRow":"NA","batch":-1,"cacheBlocks":true,"totalColumns":1,"maxResultSize":-1,"families":{"0":["ALL"]},"caching":100,"maxVersions":1,"timeRange":[0,1523338217847]}
{"loadColumnFamiliesOnDemand":null,"startRow":"NA","stopRow":"","batch":-1,"cacheBlocks":true,"totalColumns":1,"maxResultSize":-1,"families":{"0":["ALL"]},"caching":100,"maxVersions":1,"timeRange":[0,1523338217847]}Disconnected from the target VM, address: '127.0.0.1:63406', transport: 'socket'

Mapreduce atop Apache Phoenix (ScanPlan 初探)的更多相关文章

  1. Apache Phoenix基本操作-1

    本篇我们将介绍phoenix的一些基本操作. 1. 如何使用Phoenix输出Hello World? 1.1 使用sqlline终端命令 sqlline.py SZB-L0023780:2181:/ ...

  2. Apache Phoenix系列 | 从入门到精通(转载)

    原文地址:https://cloud.tencent.com/developer/article/1498057 来源: 云栖社区 作者: 瑾谦 By 大数据技术与架构 文章简介:Phoenix是一个 ...

  3. [saiku] 使用 Apache Phoenix and HBase 结合 saiku 做大数据查询分析

    saiku不仅可以对传统的RDBMS里面的数据做OLAP分析,还可以对Nosql数据库如Hbase做统计分析. 本文简单介绍下一个使用saiku去查询分析hbase数据的例子. 1.phoenix和h ...

  4. Apache Phoenix JDBC 驱动和Spring JDBCTemplate的集成

    介绍:Phoenix查询引擎会将SQL查询转换为一个或多个HBase scan,并编排运行以生成标准的JDBC结果集. 直接使用HBase API.协同处理器与自己定义过滤器.对于简单查询来说,其性能 ...

  5. phoenix 报错:type org.apache.phoenix.schema.types.PhoenixArray is not supported

    今天用phoenix报如下错误: 主要原因: hbase的表中某字段类型是array,phoenix目前不支持此类型 解决方法: 复制替换phoenix包的cursor文件 # Copyright 2 ...

  6. org.apache.phoenix.exception.PhoenixIOException: SYSTEM:CATALOG

    Error: SYSTEM:CATALOG (state=08000,code=101)org.apache.phoenix.exception.PhoenixIOException: SYSTEM: ...

  7. phoenix连接hbase数据库,创建二级索引报错:Error: org.apache.phoenix.exception.PhoenixIOException: Failed after attempts=36, exceptions: Tue Mar 06 10:32:02 CST 2018, null, java.net.SocketTimeoutException: callTimeou

    v\:* {behavior:url(#default#VML);} o\:* {behavior:url(#default#VML);} w\:* {behavior:url(#default#VM ...

  8. apache phoenix 安装试用

    备注:   本次安装是在hbase docker 镜像的基础上配置的,主要是为了方便学习,而hbase搭建有觉得   有点费事,用镜像简单.   1. hbase 镜像 docker pull har ...

  9. How to use DBVisualizer to connect to Hbase using Apache Phoenix

    How to use DBVisualizer to connect to Hbase using Apache Phoenix Article DB Visualizer is a popular ...

随机推荐

  1. saltstack的封装和内网使用

    0.客户端使用 linux:把linux的ragent文件夹拷贝到内网linux /opt目录下,运行初始化脚本 salt服务端:# @Master:"/opt/ragent/python/ ...

  2. Kettle安装Kafka Consumer和Kafka Producer插件

    1.从github上下载kettle的kafka插件,地址如下 Kafka Consumer地址: https://github.com/RuckusWirelessIL/pentaho-kafka- ...

  3. 编织织物的knit course direction and knit wale direction

    来自:http://www.definetextile.com/2013/04/course-wale.html

  4. 尚硅谷springboot学习27-使用外置servlet容器

    嵌入式Servlet容器:应用打成可执行的jar ​ 优点:简单.便携: ​ 缺点:默认不支持JSP.优化定制比较复杂(使用定制器[ServerProperties.自定义EmbeddedServle ...

  5. board_led.h/board_led.c

    /******************************************************************************* Filename: board_led ...

  6. 用element-ui 时,报value.getTime is not a function错误:

    在用element-ui 时,报value.getTime is not a function错误:错误分析:date-picker 的时间是格林威时间,如果Thu Jun 22 2017 19:07 ...

  7. java学习笔记(四):import语法

    Import 语法是给编译器寻找特定类的适当位置的一种方法. 创建一个Employee 类,包括四个实体变量姓名(name),年龄(age),职位(designation)和薪水(salary). p ...

  8. matplotlib 坑

    1 archlinux里安装好matplotlib之后一定要安装python-cario pacman -S python-cairo

  9. oracle 新增并返回新增的主键

    oracle 的insert into 语句需要返回新增的主键的时候,可以使用一下insert 语法: insert into ims.t_bank_inquire_results (t_date,l ...

  10. swift 分组tableview 设置分区投或者尾部,隐藏默认间隔高度

    1.隐藏尾部或者头部,配套使用 //注册头部id tv.register(JYWithdrawalRecordSectionView.self, forHeaderFooterViewReuseIde ...