Calcite分析

Calcite源码分析，参考：

http://matt33.com/2019/03/07/apache-calcite-process-flow/

https://matt33.com/2019/03/17/apache-calcite-planner/

Rule作为Calcite查询优化的核心，

具体看几个有代表性的Rule，看看是如何实现的

最简单的例子，Join结合律，JoinAssociateRule

首先所有的Rule都继承RelOptRule类

/**

 * Planner rule that changes a join based on the associativity rule.

 *

 * <p>((a JOIN b) JOIN c) &rarr; (a JOIN (b JOIN c))</p>

 *

 * <p>We do not need a rule to convert (a JOIN (b JOIN c)) &rarr;

 * ((a JOIN b) JOIN c) because we have

 * {@link JoinCommuteRule}.

 *

 * @see JoinCommuteRule

 */

public class JoinAssociateRule extends RelOptRule {

RelOptRule用于transform expression

它维护的Operand tree表明，该rule可以适用于何种树结构，具体看下面的例子

/**

 * A <code>RelOptRule</code> transforms an expression into another. It has a

 * list of {@link RelOptRuleOperand}s, which determine whether the rule can be

 * applied to a particular section of the tree.

 *

 * <p>The optimizer figures out which rules are applicable, then calls

 * {@link #onMatch} on each of them.</p>

 */

public abstract class RelOptRule {

  /**

   * Root of operand tree.

   */

  private final RelOptRuleOperand operand;

  /** Factory for a builder for relational expressions.

   *

   * <p>The actual builder is available via {@link RelOptRuleCall#builder()}. */

  public final RelBuilderFactory relBuilderFactory;

  /**

   * Flattened list of operands.

   */

  public final List<RelOptRuleOperand> operands;

  //~ Constructors -----------------------------------------------------------

  /**

   * Creates a rule with an explicit description.

   *

   * @param operand     root operand, must not be null

   * @param description Description, or null to guess description

   * @param relBuilderFactory Builder for relational expressions

   */

  public RelOptRule(RelOptRuleOperand operand,

      RelBuilderFactory relBuilderFactory, String description) {

    this.operand = Objects.requireNonNull(operand);

    this.relBuilderFactory = Objects.requireNonNull(relBuilderFactory);this.description = description;

    this.operands = flattenOperands(operand);

    assignSolveOrder();

  }

比如对于Join结合律，调用super，即RelOptRule的构造函数

  /**

   * Creates a JoinAssociateRule.

   */

  public JoinAssociateRule(RelBuilderFactory relBuilderFactory) {

    super(

        operand(Join.class,

            operand(Join.class, any()),

            operand(RelSubset.class, any())),

        relBuilderFactory, null);

  }

operand也是一个树形结构，Top的Operand的类型是Join，他有两个children，其中一个也是join，另一个是RelSubset

他们的children是any()

  /**

   * Creates a list of child operands that signifies that the operand matches

   * any number of child relational expressions.

   *

   * @return List of child operands that signifies that the operand matches

   *   any number of child relational expressions

   */

  public static RelOptRuleOperandChildren any() {

    return RelOptRuleOperandChildren.ANY_CHILDREN;

  }

然后要理解Rule如果被使用的，来看看HepPlaner里面的代码

因为HepPlaner逻辑比较简单，就是遍历所有的HepRelVertex，看看RelOptRule是否可以匹配，如果匹配就应用Rule

所以会调用到applyRule，输入包含Rule和Vertex

  private HepRelVertex applyRule(

      RelOptRule rule,

      HepRelVertex vertex,

      boolean forceConversions) {

    final List<RelNode> bindings = new ArrayList<>();

    final Map<RelNode, List<RelNode>> nodeChildren = new HashMap<>();

    boolean match =

        matchOperands(

            rule.getOperand(),

            vertex.getCurrentRel(),

            bindings,

            nodeChildren);

    if (!match) {

      return null;

    }

    HepRuleCall call =

        new HepRuleCall(

            this,

            rule.getOperand(),

            bindings.toArray(new RelNode[0]),

            nodeChildren,

            parents);

    // Allow the rule to apply its own side-conditions.

    if (!rule.matches(call)) {

      return null;

    }

    fireRule(call);

    if (!call.getResults().isEmpty()) {

      return applyTransformationResults(

          vertex,

          call,

          parentTrait);

    }

    return null;

  }

1. 首先调用，matchOperands，看看是否match，

  private boolean matchOperands(

      RelOptRuleOperand operand,  //Rule中的Operand

      RelNode rel,  //vertex中的RelNode

      List<RelNode> bindings,

      Map<RelNode, List<RelNode>> nodeChildren) {

    if (!operand.matches(rel)) { //先比较top node和operand是否match

      return false;

    }

    bindings.add(rel); //如果match，把top relnode加入bindings

    //接着来比较child是否match

    //child有几种类型：

    //Any，随意，直接返回true

    //Unordered，无序，对于每个operand只要有任何一个child RelNode可满足即可

    //Default，Some，Operand和RelNode严格有序匹配

    List<HepRelVertex> childRels = (List) rel.getInputs();

    switch (operand.childPolicy) {

    case ANY:

      return true;

    case UNORDERED:

      // For each operand, at least one child must match. If

      // matchAnyChildren, usually there's just one operand.

      for (RelOptRuleOperand childOperand : operand.getChildOperands()) {

        boolean match = false;

        for (HepRelVertex childRel : childRels) {

          match =

              matchOperands( //递归调用matchOperands

                  childOperand,

                  childRel.getCurrentRel(),

                  bindings,

                  nodeChildren);

          if (match) {

            break;

          }

        }

        if (!match) {

          return false;

        }

      }

      final List<RelNode> children = new ArrayList<>(childRels.size());

      for (HepRelVertex childRel : childRels) {

        children.add(childRel.getCurrentRel());

      }

      nodeChildren.put(rel, children);

      return true;

    default:

      int n = operand.getChildOperands().size();

      if (childRels.size() < n) {

        return false;

      }

      //一一按顺序对应match

      for (Pair<HepRelVertex, RelOptRuleOperand> pair

          : Pair.zip(childRels, operand.getChildOperands())) {

        boolean match =

            matchOperands(

                pair.right,

                pair.left.getCurrentRel(),

                bindings,

                nodeChildren);

        if (!match) {

          return false;

        }

      }

      return true;

    }

  }

逻辑是先比较自身，然后再递归比较children

比较函数，

可以看出，从类型，Trait，Predicate上来比较，是否匹配

 /**

   * Returns whether a relational expression matches this operand. It must be

   * of the right class and trait.

   */

  public boolean matches(RelNode rel) {

    if (!clazz.isInstance(rel)) {

      return false;

    }

    if ((trait != null) && !rel.getTraitSet().contains(trait)) {

      return false;

    }

    return predicate.test(rel);

  }

child的类型分为，

/**

 * Policy by which operands will be matched by relational expressions with

 * any number of children.

 */

public enum RelOptRuleOperandChildPolicy {

  /**

   * Signifies that operand can have any number of children.

   */

  ANY,

  /**

   * Signifies that operand has no children. Therefore it matches a

   * leaf node, such as a table scan or VALUES operator.

   *

   * <p>{@code RelOptRuleOperand(Foo.class, NONE)} is equivalent to

   * {@code RelOptRuleOperand(Foo.class)} but we prefer the former because

   * it is more explicit.</p>

   */

  LEAF,

  /**

   * Signifies that the operand's children must precisely match its

   * child operands, in order.

   */

  SOME,

  /**

   * Signifies that the rule matches any one of its parents' children.

   * The parent may have one or more children.

   */

  UNORDERED,

}

不同的类型按照注释中的规则进行match

2. 如果match，继续会把当前结果封装成HepRuleCall，继承自RelOptRuleCall

是RelOptRule的一次invocation，即会记录调用中用到的上下文数据和结果

/**

 * A <code>RelOptRuleCall</code> is an invocation of a {@link RelOptRule} with a

 * set of {@link RelNode relational expression}s as arguments.

 */

public abstract class RelOptRuleCall {

  /**

   * Generator for {@link #id} values.

   */

  private static int nextId = 0;

  //~ Instance fields --------------------------------------------------------

  public final int id;

  protected final RelOptRuleOperand operand0; //Rule的Operand树的root

  protected Map<RelNode, List<RelNode>> nodeInputs; //所有的RelNode和他们的inputs

  public final RelOptRule rule;

  public final RelNode[] rels; //match到的所有RelNodes

  private final RelOptPlanner planner;

  private final List<RelNode> parents; //对于Top RelNodes而言的parents

3. 调用rule.matches(call)

默认实现是，return true，即不检查，这里允许Rule加一下特定的match检测

4. 调用fireRule(call)

而其中主要是调用，ruleCall.getRule().onMatch(ruleCall);

所以做的事情是需要在Rule的onMatch里面定义的

下面看下JoinAssociateRule的实现，

  public void onMatch(final RelOptRuleCall call) {

    final Join topJoin = call.rel(0);

    final Join bottomJoin = call.rel(1);

    final RelNode relA = bottomJoin.getLeft();

    final RelNode relB = bottomJoin.getRight();

    final RelSubset relC = call.rel(2);

    final RelOptCluster cluster = topJoin.getCluster();

    final RexBuilder rexBuilder = cluster.getRexBuilder();

    if (relC.getConvention() != relA.getConvention()) {

      // relC could have any trait-set. But if we're matching say

      // EnumerableConvention, we're only interested in enumerable subsets.

      return;

    }

    //        topJoin

    //        /     \

    //   bottomJoin  C

    //    /    \

    //   A      B

    final int aCount = relA.getRowType().getFieldCount();

    final int bCount = relB.getRowType().getFieldCount();

    final int cCount = relC.getRowType().getFieldCount();

    final ImmutableBitSet aBitSet = ImmutableBitSet.range(0, aCount);

    final ImmutableBitSet bBitSet =

        ImmutableBitSet.range(aCount, aCount + bCount);

这部分都是在初始化，上面好理解，下面算这些count是干啥？

其实是在算RexInputRef，

RexNode和RelNode的共同点是，他们都代表一个树

差别是他们代表的东西不同，RelNode代表关系代数算子组成的数，而RexNode表示表达式树

比如RexLiteral表示常量，RexVariable表示变量，RexCall表示操作来连接Literal和Variable

RexVariable往往表示输入的某个field，为了效率，这里只会记录field的id，即这个RexInputRef

/**

 * Variable which references a field of an input relational expression.

 *

 * <p>Fields of the input are 0-based. If there is more than one input, they are

 * numbered consecutively. For example, if the inputs to a join are</p>

 *

 * <ul>

 * <li>Input #0: EMP(EMPNO, ENAME, DEPTNO) and</li>

 * <li>Input #1: DEPT(DEPTNO AS DEPTNO2, DNAME)</li>

 * </ul>

 *

 * <p>then the fields are:</p>

 *

 * <ul>

 * <li>Field #0: EMPNO</li>

 * <li>Field #1: ENAME</li>

 * <li>Field #2: DEPTNO (from EMP)</li>

 * <li>Field #3: DEPTNO2 (from DEPT)</li>

 * <li>Field #4: DNAME</li>

 * </ul>

 *

 * <p>So <code>RexInputRef(3, Integer)</code> is the correct reference for the

 * field DEPTNO2.</p>

 */

public class RexInputRef extends RexSlot {

而RexInputRef的index不是固定的，是按顺序排的，

所以按上面把fieldcount都算出来，就可以排出对应的index

接着做的一堆都是为了调整conditions的位置和相对应的index

    // Goal is to transform to

    //

    //       newTopJoin

    //        /     \

    //       A   newBottomJoin

    //               /    \

    //              B      C

    // Split the condition of topJoin and bottomJoin into a conjunctions. A

    // condition can be pushed down if it does not use columns from A.

    final List<RexNode> top = new ArrayList<>();

    final List<RexNode> bottom = new ArrayList<>();

    JoinPushThroughJoinRule.split(topJoin.getCondition(), aBitSet, top, bottom);

    JoinPushThroughJoinRule.split(bottomJoin.getCondition(), aBitSet, top,

        bottom);

    // Mapping for moving conditions from topJoin or bottomJoin to

    // newBottomJoin.

    // target: | B | C      |

    // source: | A       | B | C      |

    final Mappings.TargetMapping bottomMapping =

        Mappings.createShiftMapping(

            aCount + bCount + cCount,

            0, aCount, bCount,

            bCount, aCount + bCount, cCount);

    final List<RexNode> newBottomList = new ArrayList<>();

    new RexPermuteInputsShuttle(bottomMapping, relB, relC)

        .visitList(bottom, newBottomList);

    RexNode newBottomCondition =

        RexUtil.composeConjunction(rexBuilder, newBottomList);

1. 把原先TopJoin和BottomJoin里面的conditions，分成和A相关的和无关的

因为调整完以后，和A相关的conditions要放到TopJoin，而和A无关的需要放到BottomJoin

aBitSet里面记录了所有A的fields的index，

  /**

   * Splits a condition into conjunctions that do or do not intersect with

   * a given bit set.

   */

  static void split(

      RexNode condition,

      ImmutableBitSet bitSet,

      List<RexNode> intersecting,

      List<RexNode> nonIntersecting) {

    for (RexNode node : RelOptUtil.conjunctions(condition)) {//把conjunction的条件拆分开

      ImmutableBitSet inputBitSet = RelOptUtil.InputFinder.bits(node); //找出condition所用到的fields

      if (bitSet.intersects(inputBitSet)) {

        intersecting.add(node);

      } else {

        nonIntersecting.add(node);

      }

    }

  }

如何找出condition所用到的fields？

如下，在递归的过程做回把所有碰到的inputRef记录下来

    /**

     * Returns a bit set describing the inputs used by an expression.

     */

    public static ImmutableBitSet bits(RexNode node) {

      return analyze(node).inputBitSet.build();

    }

  /** Returns an input finder that has analyzed a given expression. */

    public static InputFinder analyze(RexNode node) {

      final InputFinder inputFinder = new InputFinder();

      node.accept(inputFinder); //Visitor模式

      return inputFinder;

    }

   //如果RexNode是InputRef，就记录下该Ref的index

    public Void visitInputRef(RexInputRef inputRef) {

      inputBitSet.set(inputRef.getIndex());

      return null;

    }

  //如果RexNode是Call，继续递归accept该Call的相关Operands

  public R visitCall(RexCall call) {

    if (!deep) {

      return null;

    }

    R r = null;

    for (RexNode operand : call.operands) {

      r = operand.accept(this);

    }

    return r;

  }

2. 由于树结构变了，会导致整个field的index发生改变

对于原先的，bottomJoin，A的field从0开始，B是从aCount开始
而对于变换后的，BottomJoin，B的field从0开始，C是从bCount开始

所以需要完成Ref的修正，

createShiftMapping，生成的mapping，包含原先的index，和当前新的index

第一个参数是index的范围，一共aCount + bCount + cCount，所以index需要小于这个值

然后，只有B，C的index需要调整，A的index不需要调整

其中，B的原先是从aCount开始，当前从0开始，一共bCount个fields；C的原先是从aCount + bCount开始，当前从bCount开始，一共cCount个fields

接着，RexPermuteInputsShuttle做具体的更新

/**

 * Shuttle which applies a permutation to its input fields.

 *

 * @see RexPermutationShuttle

 * @see RexUtil#apply(org.apache.calcite.util.mapping.Mappings.TargetMapping, RexNode)

 */

public class RexPermuteInputsShuttle extends RexShuttle {

  //~ Instance fields --------------------------------------------------------

  private final Mappings.TargetMapping mapping;

  private final ImmutableList<RelDataTypeField> fields;

  //对于每个InputRef，根据Mapping更新index

  @Override public RexNode visitInputRef(RexInputRef local) {

    final int index = local.getIndex();

    int target = mapping.getTarget(index);

    return new RexInputRef(

        target,

        local.getType());

  }

最后，构建新的join，

    RexNode newBottomCondition =

        RexUtil.composeConjunction(rexBuilder, newBottomList);

    final Join newBottomJoin =

        bottomJoin.copy(bottomJoin.getTraitSet(), newBottomCondition, relB,

            relC, JoinRelType.INNER, false);

    // Condition for newTopJoin consists of pieces from bottomJoin and topJoin.

    // Field ordinals do not need to be changed.

    RexNode newTopCondition = RexUtil.composeConjunction(rexBuilder, top);

    @SuppressWarnings("SuspiciousNameCombination")

    final Join newTopJoin =

        topJoin.copy(topJoin.getTraitSet(), newTopCondition, relA,

            newBottomJoin, JoinRelType.INNER, false);

    call.transformTo(newTopJoin);

Calcite分析 - Rule的更多相关文章

Calcite分析 - RelTrait
RelTrait 表示RelNode的物理属性由RelTraitDef代表RelTrait的类型 /** * RelTrait represents the manifestation of a r ...
fxjwind Calcite分析 - Volcano模型
参考,https://matt33.com/2019/03/17/apache-calcite-planner/ Volcano模型使用,分为下面几个步骤, //1. 初始化 VolcanoPlann ...
JUnit源码分析 - 扩展 - 自定义Rule
JUnit Rule简述 Rule是JUnit 4.7之后新加入的特性,有点类似于拦截器,可以在测试类或测试方法执行前后添加额外的处理,本质上是对@BeforeClass, @AfterClass, ...
[源码分析]从"UDF不应有状态" 切入来剖析Flink SQL代码生成 (修订版)
[源码分析]从"UDF不应有状态" 切入来剖析Flink SQL代码生成 (修订版) 目录 [源码分析]从"UDF不应有状态" 切入来剖析Flink SQL代码 ...
Flink sql 之两阶段聚合与 TwoStageOptimizedAggregateRule（源码分析）
本文源码基于flink1.14 上一篇文章分析了<flink的minibatch微批处理>的源码乘热打铁分析一下两阶段聚合的源码,因为使用两阶段要先开启minibatch,至于为什么后面 ...
flask/app.py-add_url_rule源码分析
之前分析route方法的时候,可以看到中间会调用add_url_rule方法,add_url_rule方法和route方法一样属于Flask这个类的. add_url_rule方法主要用来连接url规 ...
Hive使用Calcite CBO优化流程及SQL优化实战
目录 Hive SQL执行流程 Hive debug简单介绍 Hive SQL执行流程 Hive 使用Calcite优化 Hive Calcite优化流程 Hive Calcite使用细则 Hive向 ...
Flink sql 之微批处理与MiniBatchIntervalInferRule (源码分析)
本文源码基于flink1.14 平台用户在使用我们的flinkSql时经常会开启minaBatch来优化状态读写所以从源码的角度具体解读一下miniBatch的原理先看一下flinksql是如何触 ...
calcite物化视图详解
概述物化视图和视图类似,反映的是某个查询的结果,但是和视图仅保存SQL定义不同,物化视图本身会存储数据,因此是物化了的视图. 当用户查询的时候,原先创建的物化视图会注册到优化器中,用户的查询命中物化 ...

随机推荐

Spring+Velocity+Mybatis入门（step by step）
一.开发工具开发过程中使用的操作系统是OS X,关于软件安装的问题请大家移步高效的Mac环境设置. 本文是我对自己学习过程的一个回顾,应该还有不少问题待改进,例如目录的设置.编码习惯和配置文件的处理 ...
CspParameters 对象已存在异常解决多应用对同一容器的访问问题
CspParameters cspParams; cspParams = new CspParameters(PROVIDER_RSA_FULL); cspParams.KeyContainerNam ...
【Oracle RAC】Linux系统Oracle18c RAC安装配置详细记录过程（图文并茂）
本文Oracle 18c GI/RAC on Oracle Linux step-by-step 的安装配置步骤,同时也包含dbca 创建数据库的过程. 1. 关闭SELINUX,防火墙vi /etc ...
进程中join方法的使用
在进程中:join方法是让主进程等待子进程运行完毕后再执行主进程的.(即主进程阻塞) 示例 # -*- coding: utf-8 -*- from multiprocessing import P ...
Linux操作系统的计划任务
Linux操作系统的计划任务作者:尹正杰版权声明:原创作品,谢绝转载!否则将追究法律责任. 一.任务计划概述 Linux任务计划.周期性任务执行未来的某时间点执行一次任务: at: 指定时间点, ...
mybatis配置打印sql
mybatis配置打印sql: <settings> <setting name="logImpl" value="STDOUT_LOGGING&quo ...
ElementUI——动态表单验证
前言版本更新迭代的时候,需要用到一个动态表单的功能,ElementUI刚好有教程就改改用咯步骤代码  <el-form-item v-for ...
janusgraph-mgmt中的一些操作
关闭事务 mgmt = graph.openManagement(); ids = mgmt.getOpenInstances(); for(String id : ids){if(!id.conta ...
使用postman上传excel文件测试导入excel
今日思语:城市的生活很快,有时学会让自己慢下来,慢慢来对于做一些文件上传操作时,一般我们是直接在前端页面加入类型为file的input标签,也可以使用postman来进行文件的上传测试,如下: po ...
git工具免密拉取、推送
很苦恼每次都要配置明文密码才能正常工作其实也可以配置成非明文打开控制面板 →用户账号管理 Windows凭证对应修改响应网址即可

Calcite分析 - Rule

Calcite分析 - Rule的更多相关文章

随机推荐

热门专题