Mapreduce atop Apache Phoenix (ScanPlan 初探)
利用Mapreduce/hive查询Phoenix数据时如何划分partition?
PhoenixInputFormat的源码一看便知:
public List<InputSplit> getSplits(JobContext context) throws IOException, InterruptedException {
Configuration configuration = context.getConfiguration();
QueryPlan queryPlan = this.getQueryPlan(context, configuration);
List allSplits = queryPlan.getSplits();
List splits = this.generateSplits(queryPlan, allSplits);
return splits;
}
根据select查询语句创建查询计划,QueryPlan,实际是子类ScanPlan。getQueryPlan函数有一个特殊操作:
queryPlan.iterator(MapReduceParallelScanGrouper.getInstance());
如果HBase表有多个Region,则会将一个Scan划分为多个,每个Region对应一个Split。这个逻辑跟MR on HBase类似。只是这边的实现过程不同,这边调用的是Phoenix的QueryPlan,而不是HBase API。
以下是一个示例,加深这一过程的理解。
Phoenix 建表
将表presplit为4个region:[-∞,CS), [CS, EU), [EU, NA), [NA, +∞)
CREATE TABLE TEST (HOST VARCHAR NOT NULL PRIMARY KEY, DESCRIPTION VARCHAR) SPLIT ON ('CS','EU','NA');
upsert into test(host, description) values ('CS11', 'cccccccc');
upsert into test(host, description) values ('EU11', 'eeeddddddddd')
upsert into test(host, description) values ('NA11', 'nnnnneeeddddddddd');
0: jdbc:phoenix:localhost> select * from test;
+-------+--------------------+
| HOST | DESCRIPTION |
+-------+--------------------+
| CS11 | cccccccc |
| EU11 | eeeddddddddd |
| NA11 | nnnnneeeddddddddd |
+-------+--------------------+
窥探ScanPlan
import org.apache.hadoop.hbase.client.Scan;
import org.apache.log4j.BasicConfigurator;
import org.apache.phoenix.compile.QueryPlan;
import org.apache.phoenix.iterate.MapReduceParallelScanGrouper;
import org.apache.phoenix.jdbc.PhoenixStatement;
import java.io.IOException;
import java.sql.*;
import java.util.List;
public class LocalPhoenix {
public static void main(String[] args) throws SQLException, IOException {
BasicConfigurator.configure();
Statement stmt = null;
ResultSet rs = null;
Connection con = DriverManager.getConnection("jdbc:phoenix:localhost:2181:/hbase");
stmt = con.createStatement();
PhoenixStatement pstmt = (PhoenixStatement)stmt;
QueryPlan queryPlan = pstmt.optimizeQuery("select * from TEST");
queryPlan.iterator(MapReduceParallelScanGrouper.getInstance());
Scan scan = queryPlan.getContext().getScan();
List<List<Scan>> scans = queryPlan.getScans();
for (List<Scan> sl : scans) {
System.out.println();
for (Scan s : sl) {
System.out.print(s);
}
}
con.close();
}
}
4个scan如下:
{"loadColumnFamiliesOnDemand":null,"startRow":"","stopRow":"CS","batch":-1,"cacheBlocks":true,"totalColumns":1,"maxResultSize":-1,"families":{"0":["ALL"]},"caching":100,"maxVersions":1,"timeRange":[0,1523338217847]}
{"loadColumnFamiliesOnDemand":null,"startRow":"CS","stopRow":"EU","batch":-1,"cacheBlocks":true,"totalColumns":1,"maxResultSize":-1,"families":{"0":["ALL"]},"caching":100,"maxVersions":1,"timeRange":[0,1523338217847]}
{"loadColumnFamiliesOnDemand":null,"startRow":"EU","stopRow":"NA","batch":-1,"cacheBlocks":true,"totalColumns":1,"maxResultSize":-1,"families":{"0":["ALL"]},"caching":100,"maxVersions":1,"timeRange":[0,1523338217847]}
{"loadColumnFamiliesOnDemand":null,"startRow":"NA","stopRow":"","batch":-1,"cacheBlocks":true,"totalColumns":1,"maxResultSize":-1,"families":{"0":["ALL"]},"caching":100,"maxVersions":1,"timeRange":[0,1523338217847]}Disconnected from the target VM, address: '127.0.0.1:63406', transport: 'socket'
Mapreduce atop Apache Phoenix (ScanPlan 初探)的更多相关文章
- Apache Phoenix基本操作-1
本篇我们将介绍phoenix的一些基本操作. 1. 如何使用Phoenix输出Hello World? 1.1 使用sqlline终端命令 sqlline.py SZB-L0023780:2181:/ ...
- Apache Phoenix系列 | 从入门到精通(转载)
原文地址:https://cloud.tencent.com/developer/article/1498057 来源: 云栖社区 作者: 瑾谦 By 大数据技术与架构 文章简介:Phoenix是一个 ...
- [saiku] 使用 Apache Phoenix and HBase 结合 saiku 做大数据查询分析
saiku不仅可以对传统的RDBMS里面的数据做OLAP分析,还可以对Nosql数据库如Hbase做统计分析. 本文简单介绍下一个使用saiku去查询分析hbase数据的例子. 1.phoenix和h ...
- Apache Phoenix JDBC 驱动和Spring JDBCTemplate的集成
介绍:Phoenix查询引擎会将SQL查询转换为一个或多个HBase scan,并编排运行以生成标准的JDBC结果集. 直接使用HBase API.协同处理器与自己定义过滤器.对于简单查询来说,其性能 ...
- phoenix 报错:type org.apache.phoenix.schema.types.PhoenixArray is not supported
今天用phoenix报如下错误: 主要原因: hbase的表中某字段类型是array,phoenix目前不支持此类型 解决方法: 复制替换phoenix包的cursor文件 # Copyright 2 ...
- org.apache.phoenix.exception.PhoenixIOException: SYSTEM:CATALOG
Error: SYSTEM:CATALOG (state=08000,code=101)org.apache.phoenix.exception.PhoenixIOException: SYSTEM: ...
- phoenix连接hbase数据库,创建二级索引报错:Error: org.apache.phoenix.exception.PhoenixIOException: Failed after attempts=36, exceptions: Tue Mar 06 10:32:02 CST 2018, null, java.net.SocketTimeoutException: callTimeou
v\:* {behavior:url(#default#VML);} o\:* {behavior:url(#default#VML);} w\:* {behavior:url(#default#VM ...
- apache phoenix 安装试用
备注: 本次安装是在hbase docker 镜像的基础上配置的,主要是为了方便学习,而hbase搭建有觉得 有点费事,用镜像简单. 1. hbase 镜像 docker pull har ...
- How to use DBVisualizer to connect to Hbase using Apache Phoenix
How to use DBVisualizer to connect to Hbase using Apache Phoenix Article DB Visualizer is a popular ...
随机推荐
- windows密码策略
windows域环境配置密码策略: http://www.cnblogs.com/danzhang/p/3693024.html windows配置密码策略: https://jingyan.baid ...
- Python正则替换字符串函数re.sub用法示例(1)
本文实例讲述了Python正则替换字符串函数re.sub用法.分享给大家供大家参考,具体如下: python re.sub属于python正则的标准库,主要是的功能是用正则匹配要替换的字符串然后把它替 ...
- html:class名命名规范
1 前端开发命名规范 1.1 为什么要制定CSS命名规范 统一的命名规范,便于多人开发维护时代码统一,减少项目沟通和交接的成本,增加代码的语义化. 1.2 CSS命名规则 样式类名全部用小写,首字符必 ...
- jschDemo
jsch是java的sftp实现 import com.jcraft.jsch.*; import java.io.OutputStream; public class JschStart { pub ...
- js 菜单收起和展开
- 动态加载、移除js、css
本文简单介绍动态加载.移除.替换js/css文件 .有时候我们在写前端的时候,会有出现需要动态加载一些东如css js 这样能减轻用户加载负担,从而提高响应效率.下面贴出代码.//JS写法 <s ...
- thu-learn-lib 开发小记(转)
原创:https://harrychen.xyz/2019/02/09/thu-learn-lib/ 今天是大年初五,原本计划出门玩,但是天气比较糟糕就放弃了.想到第一篇博客里面预告了要给thu-le ...
- 关于java中分割字符串
例子:String path = "123.456.789"; 如果要使用“.”将path分割成String[], path.split("//."); or ...
- HTML中关于class内容空格多类名的问题详解
之所以想谈谈这个,不明所以.所以转载下来方便自己看看. 问:像 class="info fl" 这种class定义是何意思? 答:这里的空格隔开后,它们所代表的是两个类名,分别为i ...
- 关闭浏览器时提示的javascript事件
onbeforeunload事件 它是这样用的: <script language="javascript"> g_blnCheckUnload = true; fun ...