Mapreduce atop Apache Phoenix (ScanPlan 初探)
利用Mapreduce/hive查询Phoenix数据时如何划分partition?
PhoenixInputFormat的源码一看便知:
public List<InputSplit> getSplits(JobContext context) throws IOException, InterruptedException {
Configuration configuration = context.getConfiguration();
QueryPlan queryPlan = this.getQueryPlan(context, configuration);
List allSplits = queryPlan.getSplits();
List splits = this.generateSplits(queryPlan, allSplits);
return splits;
}
根据select查询语句创建查询计划,QueryPlan,实际是子类ScanPlan。getQueryPlan函数有一个特殊操作:
queryPlan.iterator(MapReduceParallelScanGrouper.getInstance());
如果HBase表有多个Region,则会将一个Scan划分为多个,每个Region对应一个Split。这个逻辑跟MR on HBase类似。只是这边的实现过程不同,这边调用的是Phoenix的QueryPlan,而不是HBase API。
以下是一个示例,加深这一过程的理解。
Phoenix 建表
将表presplit为4个region:[-∞,CS), [CS, EU), [EU, NA), [NA, +∞)
CREATE TABLE TEST (HOST VARCHAR NOT NULL PRIMARY KEY, DESCRIPTION VARCHAR) SPLIT ON ('CS','EU','NA');
upsert into test(host, description) values ('CS11', 'cccccccc');
upsert into test(host, description) values ('EU11', 'eeeddddddddd')
upsert into test(host, description) values ('NA11', 'nnnnneeeddddddddd');
0: jdbc:phoenix:localhost> select * from test;
+-------+--------------------+
| HOST | DESCRIPTION |
+-------+--------------------+
| CS11 | cccccccc |
| EU11 | eeeddddddddd |
| NA11 | nnnnneeeddddddddd |
+-------+--------------------+
窥探ScanPlan
import org.apache.hadoop.hbase.client.Scan;
import org.apache.log4j.BasicConfigurator;
import org.apache.phoenix.compile.QueryPlan;
import org.apache.phoenix.iterate.MapReduceParallelScanGrouper;
import org.apache.phoenix.jdbc.PhoenixStatement;
import java.io.IOException;
import java.sql.*;
import java.util.List;
public class LocalPhoenix {
public static void main(String[] args) throws SQLException, IOException {
BasicConfigurator.configure();
Statement stmt = null;
ResultSet rs = null;
Connection con = DriverManager.getConnection("jdbc:phoenix:localhost:2181:/hbase");
stmt = con.createStatement();
PhoenixStatement pstmt = (PhoenixStatement)stmt;
QueryPlan queryPlan = pstmt.optimizeQuery("select * from TEST");
queryPlan.iterator(MapReduceParallelScanGrouper.getInstance());
Scan scan = queryPlan.getContext().getScan();
List<List<Scan>> scans = queryPlan.getScans();
for (List<Scan> sl : scans) {
System.out.println();
for (Scan s : sl) {
System.out.print(s);
}
}
con.close();
}
}
4个scan如下:
{"loadColumnFamiliesOnDemand":null,"startRow":"","stopRow":"CS","batch":-1,"cacheBlocks":true,"totalColumns":1,"maxResultSize":-1,"families":{"0":["ALL"]},"caching":100,"maxVersions":1,"timeRange":[0,1523338217847]}
{"loadColumnFamiliesOnDemand":null,"startRow":"CS","stopRow":"EU","batch":-1,"cacheBlocks":true,"totalColumns":1,"maxResultSize":-1,"families":{"0":["ALL"]},"caching":100,"maxVersions":1,"timeRange":[0,1523338217847]}
{"loadColumnFamiliesOnDemand":null,"startRow":"EU","stopRow":"NA","batch":-1,"cacheBlocks":true,"totalColumns":1,"maxResultSize":-1,"families":{"0":["ALL"]},"caching":100,"maxVersions":1,"timeRange":[0,1523338217847]}
{"loadColumnFamiliesOnDemand":null,"startRow":"NA","stopRow":"","batch":-1,"cacheBlocks":true,"totalColumns":1,"maxResultSize":-1,"families":{"0":["ALL"]},"caching":100,"maxVersions":1,"timeRange":[0,1523338217847]}Disconnected from the target VM, address: '127.0.0.1:63406', transport: 'socket'
Mapreduce atop Apache Phoenix (ScanPlan 初探)的更多相关文章
- Apache Phoenix基本操作-1
本篇我们将介绍phoenix的一些基本操作. 1. 如何使用Phoenix输出Hello World? 1.1 使用sqlline终端命令 sqlline.py SZB-L0023780:2181:/ ...
- Apache Phoenix系列 | 从入门到精通(转载)
原文地址:https://cloud.tencent.com/developer/article/1498057 来源: 云栖社区 作者: 瑾谦 By 大数据技术与架构 文章简介:Phoenix是一个 ...
- [saiku] 使用 Apache Phoenix and HBase 结合 saiku 做大数据查询分析
saiku不仅可以对传统的RDBMS里面的数据做OLAP分析,还可以对Nosql数据库如Hbase做统计分析. 本文简单介绍下一个使用saiku去查询分析hbase数据的例子. 1.phoenix和h ...
- Apache Phoenix JDBC 驱动和Spring JDBCTemplate的集成
介绍:Phoenix查询引擎会将SQL查询转换为一个或多个HBase scan,并编排运行以生成标准的JDBC结果集. 直接使用HBase API.协同处理器与自己定义过滤器.对于简单查询来说,其性能 ...
- phoenix 报错:type org.apache.phoenix.schema.types.PhoenixArray is not supported
今天用phoenix报如下错误: 主要原因: hbase的表中某字段类型是array,phoenix目前不支持此类型 解决方法: 复制替换phoenix包的cursor文件 # Copyright 2 ...
- org.apache.phoenix.exception.PhoenixIOException: SYSTEM:CATALOG
Error: SYSTEM:CATALOG (state=08000,code=101)org.apache.phoenix.exception.PhoenixIOException: SYSTEM: ...
- phoenix连接hbase数据库,创建二级索引报错:Error: org.apache.phoenix.exception.PhoenixIOException: Failed after attempts=36, exceptions: Tue Mar 06 10:32:02 CST 2018, null, java.net.SocketTimeoutException: callTimeou
v\:* {behavior:url(#default#VML);} o\:* {behavior:url(#default#VML);} w\:* {behavior:url(#default#VM ...
- apache phoenix 安装试用
备注: 本次安装是在hbase docker 镜像的基础上配置的,主要是为了方便学习,而hbase搭建有觉得 有点费事,用镜像简单. 1. hbase 镜像 docker pull har ...
- How to use DBVisualizer to connect to Hbase using Apache Phoenix
How to use DBVisualizer to connect to Hbase using Apache Phoenix Article DB Visualizer is a popular ...
随机推荐
- saltstack的封装和内网使用
0.客户端使用 linux:把linux的ragent文件夹拷贝到内网linux /opt目录下,运行初始化脚本 salt服务端:# @Master:"/opt/ragent/python/ ...
- Kettle安装Kafka Consumer和Kafka Producer插件
1.从github上下载kettle的kafka插件,地址如下 Kafka Consumer地址: https://github.com/RuckusWirelessIL/pentaho-kafka- ...
- 编织织物的knit course direction and knit wale direction
来自:http://www.definetextile.com/2013/04/course-wale.html
- 尚硅谷springboot学习27-使用外置servlet容器
嵌入式Servlet容器:应用打成可执行的jar 优点:简单.便携: 缺点:默认不支持JSP.优化定制比较复杂(使用定制器[ServerProperties.自定义EmbeddedServle ...
- board_led.h/board_led.c
/******************************************************************************* Filename: board_led ...
- 用element-ui 时,报value.getTime is not a function错误:
在用element-ui 时,报value.getTime is not a function错误:错误分析:date-picker 的时间是格林威时间,如果Thu Jun 22 2017 19:07 ...
- java学习笔记(四):import语法
Import 语法是给编译器寻找特定类的适当位置的一种方法. 创建一个Employee 类,包括四个实体变量姓名(name),年龄(age),职位(designation)和薪水(salary). p ...
- matplotlib 坑
1 archlinux里安装好matplotlib之后一定要安装python-cario pacman -S python-cairo
- oracle 新增并返回新增的主键
oracle 的insert into 语句需要返回新增的主键的时候,可以使用一下insert 语法: insert into ims.t_bank_inquire_results (t_date,l ...
- swift 分组tableview 设置分区投或者尾部,隐藏默认间隔高度
1.隐藏尾部或者头部,配套使用 //注册头部id tv.register(JYWithdrawalRecordSectionView.self, forHeaderFooterViewReuseIde ...