Calcite分析 - Rule
Calcite源码分析,参考:
http://matt33.com/2019/03/07/apache-calcite-process-flow/
https://matt33.com/2019/03/17/apache-calcite-planner/
Rule作为Calcite查询优化的核心,
具体看几个有代表性的Rule,看看是如何实现的
最简单的例子,Join结合律,JoinAssociateRule
首先所有的Rule都继承RelOptRule类
/**
* Planner rule that changes a join based on the associativity rule.
*
* <p>((a JOIN b) JOIN c) → (a JOIN (b JOIN c))</p>
*
* <p>We do not need a rule to convert (a JOIN (b JOIN c)) →
* ((a JOIN b) JOIN c) because we have
* {@link JoinCommuteRule}.
*
* @see JoinCommuteRule
*/
public class JoinAssociateRule extends RelOptRule {
RelOptRule用于transform expression
它维护的Operand tree表明,该rule可以适用于何种树结构,具体看下面的例子
/**
* A <code>RelOptRule</code> transforms an expression into another. It has a
* list of {@link RelOptRuleOperand}s, which determine whether the rule can be
* applied to a particular section of the tree.
*
* <p>The optimizer figures out which rules are applicable, then calls
* {@link #onMatch} on each of them.</p>
*/
public abstract class RelOptRule { /**
* Root of operand tree.
*/
private final RelOptRuleOperand operand; /** Factory for a builder for relational expressions.
*
* <p>The actual builder is available via {@link RelOptRuleCall#builder()}. */
public final RelBuilderFactory relBuilderFactory; /**
* Flattened list of operands.
*/
public final List<RelOptRuleOperand> operands; //~ Constructors ----------------------------------------------------------- /**
* Creates a rule with an explicit description.
*
* @param operand root operand, must not be null
* @param description Description, or null to guess description
* @param relBuilderFactory Builder for relational expressions
*/
public RelOptRule(RelOptRuleOperand operand,
RelBuilderFactory relBuilderFactory, String description) {
this.operand = Objects.requireNonNull(operand);
this.relBuilderFactory = Objects.requireNonNull(relBuilderFactory);this.description = description;
this.operands = flattenOperands(operand);
assignSolveOrder();
}
比如对于Join结合律,调用super,即RelOptRule的构造函数
/**
* Creates a JoinAssociateRule.
*/
public JoinAssociateRule(RelBuilderFactory relBuilderFactory) {
super(
operand(Join.class,
operand(Join.class, any()),
operand(RelSubset.class, any())),
relBuilderFactory, null);
}
operand也是一个树形结构,Top的Operand的类型是Join,他有两个children,其中一个也是join,另一个是RelSubset
他们的children是any()
/**
* Creates a list of child operands that signifies that the operand matches
* any number of child relational expressions.
*
* @return List of child operands that signifies that the operand matches
* any number of child relational expressions
*/
public static RelOptRuleOperandChildren any() {
return RelOptRuleOperandChildren.ANY_CHILDREN;
}
然后要理解Rule如果被使用的,来看看HepPlaner里面的代码
因为HepPlaner逻辑比较简单,就是遍历所有的HepRelVertex,看看RelOptRule是否可以匹配,如果匹配就应用Rule
所以会调用到applyRule,输入包含Rule和Vertex
private HepRelVertex applyRule(
RelOptRule rule,
HepRelVertex vertex,
boolean forceConversions) { final List<RelNode> bindings = new ArrayList<>();
final Map<RelNode, List<RelNode>> nodeChildren = new HashMap<>();
boolean match =
matchOperands(
rule.getOperand(),
vertex.getCurrentRel(),
bindings,
nodeChildren); if (!match) {
return null;
} HepRuleCall call =
new HepRuleCall(
this,
rule.getOperand(),
bindings.toArray(new RelNode[0]),
nodeChildren,
parents); // Allow the rule to apply its own side-conditions.
if (!rule.matches(call)) {
return null;
} fireRule(call); if (!call.getResults().isEmpty()) {
return applyTransformationResults(
vertex,
call,
parentTrait);
}
return null;
}
1. 首先调用,matchOperands,看看是否match,
private boolean matchOperands(
RelOptRuleOperand operand, //Rule中的Operand
RelNode rel, //vertex中的RelNode
List<RelNode> bindings,
Map<RelNode, List<RelNode>> nodeChildren) {
if (!operand.matches(rel)) { //先比较top node和operand是否match
return false;
}
bindings.add(rel); //如果match,把top relnode加入bindings
//接着来比较child是否match
//child有几种类型:
//Any,随意,直接返回true
//Unordered,无序,对于每个operand只要有任何一个child RelNode可满足即可
//Default,Some,Operand和RelNode严格有序匹配
List<HepRelVertex> childRels = (List) rel.getInputs();
switch (operand.childPolicy) {
case ANY:
return true;
case UNORDERED:
// For each operand, at least one child must match. If
// matchAnyChildren, usually there's just one operand.
for (RelOptRuleOperand childOperand : operand.getChildOperands()) {
boolean match = false;
for (HepRelVertex childRel : childRels) {
match =
matchOperands( //递归调用matchOperands
childOperand,
childRel.getCurrentRel(),
bindings,
nodeChildren);
if (match) {
break;
}
}
if (!match) {
return false;
}
}
final List<RelNode> children = new ArrayList<>(childRels.size());
for (HepRelVertex childRel : childRels) {
children.add(childRel.getCurrentRel());
}
nodeChildren.put(rel, children);
return true;
default:
int n = operand.getChildOperands().size();
if (childRels.size() < n) {
return false;
}
//一一按顺序对应match
for (Pair<HepRelVertex, RelOptRuleOperand> pair
: Pair.zip(childRels, operand.getChildOperands())) {
boolean match =
matchOperands(
pair.right,
pair.left.getCurrentRel(),
bindings,
nodeChildren);
if (!match) {
return false;
}
}
return true;
}
}
逻辑是先比较自身,然后再递归比较children
比较函数,
可以看出,从类型,Trait,Predicate上来比较,是否匹配
/**
* Returns whether a relational expression matches this operand. It must be
* of the right class and trait.
*/
public boolean matches(RelNode rel) {
if (!clazz.isInstance(rel)) {
return false;
}
if ((trait != null) && !rel.getTraitSet().contains(trait)) {
return false;
}
return predicate.test(rel);
}
child的类型分为,
/**
* Policy by which operands will be matched by relational expressions with
* any number of children.
*/
public enum RelOptRuleOperandChildPolicy {
/**
* Signifies that operand can have any number of children.
*/
ANY, /**
* Signifies that operand has no children. Therefore it matches a
* leaf node, such as a table scan or VALUES operator.
*
* <p>{@code RelOptRuleOperand(Foo.class, NONE)} is equivalent to
* {@code RelOptRuleOperand(Foo.class)} but we prefer the former because
* it is more explicit.</p>
*/
LEAF, /**
* Signifies that the operand's children must precisely match its
* child operands, in order.
*/
SOME, /**
* Signifies that the rule matches any one of its parents' children.
* The parent may have one or more children.
*/
UNORDERED,
}
不同的类型按照注释中的规则进行match
2. 如果match,继续会把当前结果封装成HepRuleCall,继承自RelOptRuleCall
是RelOptRule的一次invocation,即会记录调用中用到的上下文数据和结果
/**
* A <code>RelOptRuleCall</code> is an invocation of a {@link RelOptRule} with a
* set of {@link RelNode relational expression}s as arguments.
*/
public abstract class RelOptRuleCall { /**
* Generator for {@link #id} values.
*/
private static int nextId = 0; //~ Instance fields -------------------------------------------------------- public final int id;
protected final RelOptRuleOperand operand0; //Rule的Operand树的root
protected Map<RelNode, List<RelNode>> nodeInputs; //所有的RelNode和他们的inputs
public final RelOptRule rule;
public final RelNode[] rels; //match到的所有RelNodes
private final RelOptPlanner planner;
private final List<RelNode> parents; //对于Top RelNodes而言的parents
3. 调用rule.matches(call)
默认实现是,return true,即不检查,这里允许Rule加一下特定的match检测
4. 调用fireRule(call)
而其中主要是调用,ruleCall.getRule().onMatch(ruleCall);
所以做的事情是需要在Rule的onMatch里面定义的
下面看下JoinAssociateRule的实现,
public void onMatch(final RelOptRuleCall call) {
final Join topJoin = call.rel(0);
final Join bottomJoin = call.rel(1);
final RelNode relA = bottomJoin.getLeft();
final RelNode relB = bottomJoin.getRight();
final RelSubset relC = call.rel(2);
final RelOptCluster cluster = topJoin.getCluster();
final RexBuilder rexBuilder = cluster.getRexBuilder();
if (relC.getConvention() != relA.getConvention()) {
// relC could have any trait-set. But if we're matching say
// EnumerableConvention, we're only interested in enumerable subsets.
return;
}
// topJoin
// / \
// bottomJoin C
// / \
// A B
final int aCount = relA.getRowType().getFieldCount();
final int bCount = relB.getRowType().getFieldCount();
final int cCount = relC.getRowType().getFieldCount();
final ImmutableBitSet aBitSet = ImmutableBitSet.range(0, aCount);
final ImmutableBitSet bBitSet =
ImmutableBitSet.range(aCount, aCount + bCount);
这部分都是在初始化,上面好理解,下面算这些count是干啥?
其实是在算RexInputRef,
RexNode和RelNode的共同点是,他们都代表一个树
差别是他们代表的东西不同,RelNode代表关系代数算子组成的数,而RexNode表示表达式树
比如RexLiteral表示常量,RexVariable表示变量,RexCall表示操作来连接Literal和Variable
RexVariable往往表示输入的某个field,为了效率,这里只会记录field的id,即这个RexInputRef
/**
* Variable which references a field of an input relational expression.
*
* <p>Fields of the input are 0-based. If there is more than one input, they are
* numbered consecutively. For example, if the inputs to a join are</p>
*
* <ul>
* <li>Input #0: EMP(EMPNO, ENAME, DEPTNO) and</li>
* <li>Input #1: DEPT(DEPTNO AS DEPTNO2, DNAME)</li>
* </ul>
*
* <p>then the fields are:</p>
*
* <ul>
* <li>Field #0: EMPNO</li>
* <li>Field #1: ENAME</li>
* <li>Field #2: DEPTNO (from EMP)</li>
* <li>Field #3: DEPTNO2 (from DEPT)</li>
* <li>Field #4: DNAME</li>
* </ul>
*
* <p>So <code>RexInputRef(3, Integer)</code> is the correct reference for the
* field DEPTNO2.</p>
*/
public class RexInputRef extends RexSlot {
而RexInputRef的index不是固定的,是按顺序排的,
所以按上面把fieldcount都算出来,就可以排出对应的index
接着做的一堆都是为了调整conditions的位置和相对应的index
// Goal is to transform to
//
// newTopJoin
// / \
// A newBottomJoin
// / \
// B C // Split the condition of topJoin and bottomJoin into a conjunctions. A
// condition can be pushed down if it does not use columns from A.
final List<RexNode> top = new ArrayList<>();
final List<RexNode> bottom = new ArrayList<>();
JoinPushThroughJoinRule.split(topJoin.getCondition(), aBitSet, top, bottom);
JoinPushThroughJoinRule.split(bottomJoin.getCondition(), aBitSet, top,
bottom); // Mapping for moving conditions from topJoin or bottomJoin to
// newBottomJoin.
// target: | B | C |
// source: | A | B | C |
final Mappings.TargetMapping bottomMapping =
Mappings.createShiftMapping(
aCount + bCount + cCount,
0, aCount, bCount,
bCount, aCount + bCount, cCount);
final List<RexNode> newBottomList = new ArrayList<>();
new RexPermuteInputsShuttle(bottomMapping, relB, relC)
.visitList(bottom, newBottomList);
RexNode newBottomCondition =
RexUtil.composeConjunction(rexBuilder, newBottomList);
1. 把原先TopJoin和BottomJoin里面的conditions,分成和A相关的和无关的
因为调整完以后,和A相关的conditions要放到TopJoin,而和A无关的需要放到BottomJoin
aBitSet里面记录了所有A的fields的index,
/**
* Splits a condition into conjunctions that do or do not intersect with
* a given bit set.
*/
static void split(
RexNode condition,
ImmutableBitSet bitSet,
List<RexNode> intersecting,
List<RexNode> nonIntersecting) {
for (RexNode node : RelOptUtil.conjunctions(condition)) {//把conjunction的条件拆分开
ImmutableBitSet inputBitSet = RelOptUtil.InputFinder.bits(node); //找出condition所用到的fields
if (bitSet.intersects(inputBitSet)) {
intersecting.add(node);
} else {
nonIntersecting.add(node);
}
}
}
如何找出condition所用到的fields?
如下,在递归的过程做回把所有碰到的inputRef记录下来
/**
* Returns a bit set describing the inputs used by an expression.
*/
public static ImmutableBitSet bits(RexNode node) {
return analyze(node).inputBitSet.build();
} /** Returns an input finder that has analyzed a given expression. */
public static InputFinder analyze(RexNode node) {
final InputFinder inputFinder = new InputFinder();
node.accept(inputFinder); //Visitor模式
return inputFinder;
} //如果RexNode是InputRef,就记录下该Ref的index
public Void visitInputRef(RexInputRef inputRef) {
inputBitSet.set(inputRef.getIndex());
return null;
} //如果RexNode是Call,继续递归accept该Call的相关Operands
public R visitCall(RexCall call) {
if (!deep) {
return null;
} R r = null;
for (RexNode operand : call.operands) {
r = operand.accept(this);
}
return r;
}
2. 由于树结构变了,会导致整个field的index发生改变
对于原先的,bottomJoin,A的field从0开始,B是从aCount开始
而对于变换后的,BottomJoin,B的field从0开始,C是从bCount开始
所以需要完成Ref的修正,
createShiftMapping,生成的mapping,包含原先的index,和当前新的index
第一个参数是index的范围,一共aCount + bCount + cCount,所以index需要小于这个值
然后,只有B,C的index需要调整,A的index不需要调整
其中,B的原先是从aCount开始,当前从0开始,一共bCount个fields;C的原先是从aCount + bCount开始,当前从bCount开始,一共cCount个fields
接着,RexPermuteInputsShuttle做具体的更新
/**
* Shuttle which applies a permutation to its input fields.
*
* @see RexPermutationShuttle
* @see RexUtil#apply(org.apache.calcite.util.mapping.Mappings.TargetMapping, RexNode)
*/
public class RexPermuteInputsShuttle extends RexShuttle {
//~ Instance fields -------------------------------------------------------- private final Mappings.TargetMapping mapping;
private final ImmutableList<RelDataTypeField> fields; //对于每个InputRef,根据Mapping更新index
@Override public RexNode visitInputRef(RexInputRef local) {
final int index = local.getIndex();
int target = mapping.getTarget(index);
return new RexInputRef(
target,
local.getType());
}
最后,构建新的join,
RexNode newBottomCondition =
RexUtil.composeConjunction(rexBuilder, newBottomList); final Join newBottomJoin =
bottomJoin.copy(bottomJoin.getTraitSet(), newBottomCondition, relB,
relC, JoinRelType.INNER, false); // Condition for newTopJoin consists of pieces from bottomJoin and topJoin.
// Field ordinals do not need to be changed.
RexNode newTopCondition = RexUtil.composeConjunction(rexBuilder, top);
@SuppressWarnings("SuspiciousNameCombination")
final Join newTopJoin =
topJoin.copy(topJoin.getTraitSet(), newTopCondition, relA,
newBottomJoin, JoinRelType.INNER, false); call.transformTo(newTopJoin);
Calcite分析 - Rule的更多相关文章
- Calcite分析 - RelTrait
RelTrait 表示RelNode的物理属性 由RelTraitDef代表RelTrait的类型 /** * RelTrait represents the manifestation of a r ...
- fxjwind Calcite分析 - Volcano模型
参考,https://matt33.com/2019/03/17/apache-calcite-planner/ Volcano模型使用,分为下面几个步骤, //1. 初始化 VolcanoPlann ...
- JUnit源码分析 - 扩展 - 自定义Rule
JUnit Rule简述 Rule是JUnit 4.7之后新加入的特性,有点类似于拦截器,可以在测试类或测试方法执行前后添加额外的处理,本质上是对@BeforeClass, @AfterClass, ...
- [源码分析]从"UDF不应有状态" 切入来剖析Flink SQL代码生成 (修订版)
[源码分析]从"UDF不应有状态" 切入来剖析Flink SQL代码生成 (修订版) 目录 [源码分析]从"UDF不应有状态" 切入来剖析Flink SQL代码 ...
- Flink sql 之 两阶段聚合与 TwoStageOptimizedAggregateRule(源码分析)
本文源码基于flink1.14 上一篇文章分析了<flink的minibatch微批处理>的源码 乘热打铁分析一下两阶段聚合的源码,因为使用两阶段要先开启minibatch,至于为什么后面 ...
- flask/app.py-add_url_rule源码分析
之前分析route方法的时候,可以看到中间会调用add_url_rule方法,add_url_rule方法和route方法一样属于Flask这个类的. add_url_rule方法主要用来连接url规 ...
- Hive使用Calcite CBO优化流程及SQL优化实战
目录 Hive SQL执行流程 Hive debug简单介绍 Hive SQL执行流程 Hive 使用Calcite优化 Hive Calcite优化流程 Hive Calcite使用细则 Hive向 ...
- Flink sql 之 微批处理与MiniBatchIntervalInferRule (源码分析)
本文源码基于flink1.14 平台用户在使用我们的flinkSql时经常会开启minaBatch来优化状态读写 所以从源码的角度具体解读一下miniBatch的原理 先看一下flinksql是如何触 ...
- calcite物化视图详解
概述 物化视图和视图类似,反映的是某个查询的结果,但是和视图仅保存SQL定义不同,物化视图本身会存储数据,因此是物化了的视图. 当用户查询的时候,原先创建的物化视图会注册到优化器中,用户的查询命中物化 ...
随机推荐
- linux技能点七 shell
shell脚本:定义,连接符,输入输出流,消息重定向,命令的退出状态,申明变量,运算符,控制语句 定义:linux下的多命令操作文件 连接符: ::用于命令的分隔符,命令会从左往右依次执行 & ...
- CspParameters 对象已存在异常 解决多应用对同一容器的访问问题
CspParameters cspParams; cspParams = new CspParameters(PROVIDER_RSA_FULL); cspParams.KeyContainerNam ...
- .net core中使用Quartz任务调度
使用xml配置Quartz任务调度程序 1.Nuget Install-Package Quartz Install-Package Quartz.Plugins 2.站点根目录下加入文件quartz ...
- Linux环境变量设置declare/typeset
形而上,质在内!形形色色,追寻本质! declare/typeset declare 或 typeset 是一样的功能,就是在宣告变数的属性 declare 后面并没有接任何参数,那么bash 就会主 ...
- [FreeRTOS]FreeRTOS使用
转自:https://blog.csdn.net/zhzht19861011/article/details/49819109 FreeRTOS系列第1篇---为什么选择FreeRTOS? FreeR ...
- 网络空间安全基础篇(2)----wireshark
wireshrak是一款通过本地端口,来抓取通过以太网获取的文件包,包括SMB HTTP FTP TELNET 等包 在网络安全比赛中最常用的就是HTTP协议,TELNET协议,FTP协议,SMB协议 ...
- c++实现按行读取文本文件
包含头文件fstream既可以读又可以写(我的理解是头文件fstream中包含ifstream和ofstream),可以同时创建ifstream对象和ofstream对象,分别实现读写:也可以直接创建 ...
- MyBatis框架之注解开发
MyBatis注解开发 @Insert注解注解属性value:写入SQL语句 @Options注解实现添加新数据的主键封装注解属性useGeneratedKeys:使用生成的主键,配置为truekey ...
- BZOJ-1975: 魔法猪学院 (K短路:A*+SPFA)
题意:有N种化学元素,有M种转化关系,(u,v,L)表示化学物质由u变为v需要L能量,现在你有E能量,问最多有多少种不同的途径,使得1转为为N,且总能量不超过E. 思路:可以转为为带权有向图,即是求前 ...
- Top 20 IoT Platforms in 2018
https://internetofthingswiki.com/top-20-iot-platforms/634/ After learning what is the internet of th ...