Joins https://docs.oracle.com/database/121/TGSQL/tgsql_join.htm#TGSQL242

tidb/index_lookup_hash_join.go at master · pingcap/tidb https://github.com/pingcap/tidb/blob/master/executor/index_lookup_hash_join.go

Hash Join: Basic Steps

The optimizer uses the smaller data source to build a hash table on the join key in memory, and then scans the larger table to find the joined rows.

The basic steps are as follows:

  1. The database performs a full scan of the smaller data set, called the build table, and then applies a hash function to the join key in each row to build a hash table in the PGA.

    In pseudocode, the algorithm might look as follows:

    FOR small_table_row IN (SELECT * FROM small_table)
    LOOP
    slot_number := HASH(small_table_row.join_key);
    INSERT_HASH_TABLE(slot_number,small_table_row);
    END LOOP;
  2. The database probes the second data set, called the probe table, using whichever access mechanism has the lowest cost.

    Typically, the database performs a full scan of both the smaller and larger data set. The algorithm in pseudocode might look as follows:

    FOR large_table_row IN (SELECT * FROM large_table)
    LOOP
    slot_number := HASH(large_table_row.join_key);
    small_table_row = LOOKUP_HASH_TABLE(slot_number,large_table_row.join_key);
    IF small_table_row FOUND
    THEN
    output small_table_row + large_table_row;
    END IF;
    END LOOP;

    For each row retrieved from the larger data set, the database does the following:

    1. Applies the same hash function to the join column or columns to calculate the number of the relevant slot in the hash table.

      For example, to probe the hash table for department ID 30, the database applies the hash function to 30, which generates the hash value 4.

    2. Probes the hash table to determine whether rows exists in the slot.

      If no rows exist, then the database processes the next row in the larger data set. If rows exist, then the database proceeds to the next step.

    3. Checks the join column or columns for a match. If a match occurs, then the database either reports the rows or passes them to the next step in the plan, and then processes the next row in the larger data set.

      If multiple rows exist in the hash table slot, the database walks through the linked list of rows, checking each one. For example, if department 30 hashes to slot 4, then the database checks each row until it finds 30.

Example 9-4 Hash Joins

An application queries the oe.orders and oe.order_items tables, joining on the order_id column.

SELECT o.customer_id, l.unit_price * l.quantity
FROM orders o, order_items l
WHERE l.order_id = o.order_id;

The execution plan is as follows:

--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 665 | 13300 | 8 (25)|
|* 1 | HASH JOIN | | 665 | 13300 | 8 (25)|
| 2 | TABLE ACCESS FULL | ORDERS | 105 | 840 | 4 (25)|
| 3 | TABLE ACCESS FULL | ORDER_ITEMS | 665 | 7980 | 4 (25)|
-------------------------------------------------------------------------- Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("L"."ORDER_ID"="O"."ORDER_ID")

Because the orders table is small relative to the order_items table, which is 6 times larger, the database hashes orders. In a hash join, the data set for the build table always appears first in the list of operations (Step 2). In Step 3, the database performs a full scan of the larger order_items later, probing the hash table for each row.

How Hash Joins Work When the Hash Table Does Not Fit in the PGA

The database must use a different technique when the hash table does not fit entirely in the PGA. In this case, the database uses a temporary space to hold portions (called partitions) of the hash table, and sometimes portions of the larger table that probes the hash table.

The basic process is as follows:

  1. The database performs a full scan of the smaller data set, and then builds an array of hash buckets in both the PGA and on disk.

    When the PGA hash area fills up, the database finds the largest partition within the hash table and writes it to temporary space on disk. The database stores any new row that belongs to this on-disk partition on disk, and all other rows in the PGA. Thus, part of the hash table is in memory and part of it on disk.

  2. The database takes a first pass at reading the other data set.

    For each row, the database does the following:

    1. Applies the same hash function to the join column or columns to calculate the number of the relevant hash bucket.

    2. Probes the hash table to determine whether rows exist in the bucket in memory.

      If the hashed value points to a row in memory, then the database completes the join and returns the row. If the value points to a hash partition on disk, however, then the database stores this row in the temporary tablespace, using the same partitioning scheme used for the original data set.

  3. The database reads each on-disk temporary partition one by one

  4. The database joins each partition row to the row in the corresponding on-disk temporary partition.

Hash Join Controls

The USE_HASH hint instructs the optimizer to use a hash join when joining two tables together.

Hash Join: Basic Steps的更多相关文章

  1. SQL Tuning 基础概述06 - 表的关联方式:Nested Loops Join,Merge Sort Join & Hash Join

    nested loops join(嵌套循环)   驱动表返回几条结果集,被驱动表访问多少次,有驱动顺序,无须排序,无任何限制. 驱动表限制条件有索引,被驱动表连接条件有索引. hints:use_n ...

  2. Sort merge join、Nested loops、Hash join(三种连接类型)

    目前为止,典型的连接类型有3种: Sort merge join(SMJ排序-合并连接):首先生产driving table需要的数据,然后对这些数据按照连接操作关联列进行排序:然后生产probed ...

  3. 视图合并、hash join连接列数据分布不均匀引发的惨案

    表大小 SQL> select count(*) from agent.TB_AGENT_INFO; COUNT(*) ---------- 1751 SQL> select count( ...

  4. 最新电Call记录统计-full hash join用法

    declare @time datetime set @time='2016-07-01' --最新的电Call记录统计查询--SELECT t.zuoxi1,t.PhoneCount,t.Phone ...

  5. Sql优化(一) Merge Join vs. Hash Join vs. Nested Loop

    原创文章,首发自本人个人博客站点,转载请务必注明出自http://www.jasongj.com Nested Loop,Hash Join,Merge Join介绍 Nested Loop: 对于被 ...

  6. Oracle 表的连接方式(2)-----HASH JOIN的基本机制3

    HASH JOIN的模式 hash join有三种工作模式,分别是optimal模式,onepass模式和multipass模式,分别在v$sysstat里面有对应的统计信息: SQL> sel ...

  7. Oracle 表的连接方式(2)-----HASH JOIN的基本机制2

    Hash算法原理 对于什么是Hash算法原理?这个问题有点难度,不是很好说清楚,来做一个比喻吧:我们有很多的小猪,每个的体重都不一样,假设体重分布比较平均(我们考虑到公斤级别),我们按照体重来分,划分 ...

  8. Oracle 表的连接方式(2)-----HASH JOIN的基本机制1

    我们对hash join的常见误解,一般包括两个: 第一个误解:是我们经常以为hash join需要对两个做join的表都做全表扫描 第二个误解:是经常以为hash join会选择比较小的表做buil ...

  9. SQL Server的三种物理连接之Hash Join(三)

    简介 在 SQL Server 2012 在一些特殊的例子下会看到下面的图标: Hash Join分为两个阶段,分别为生成和探测阶段. 首先是生成阶段,将输入源中的每一个条目经过散列函数的计算都放到不 ...

随机推荐

  1. [Python] iupdatable包:Status 模块使用介绍

    常用状态做的一个集合,方便用在函数返回值中区分不同状态结果. 简单举例: from iupdatable import Status def fun(): print("do somethi ...

  2. mysql数据安全之利用二进制日志mysqlbinlog恢复数据

    mysql数据安全之利用二进制日志mysqlbinlog恢复数据 简介:如何利用二进制日志来恢复数据 查看二进制日志文件的内容报错: [root@xdclass-public log_bin]# my ...

  3. centos7中redis安装配置

    1.官网下载对应版本,本例以5.0.5为例 2.tar -zxvf xxxxx 并mv到安装目录 3.进入redis-5.0.5目录下,执行编译命令 make 4.编译完成后,经redis安装到指定目 ...

  4. java内部类对象使用.this,.new

    public class InnerClass { class Content { private int i; public int getVlaue() { return i; } } class ...

  5. 多线程,线程类三种方式,线程调度,线程同步,死锁,线程间的通信,阻塞队列,wait和sleep区别?

    重难点梳理 知识点梳理 学习目标 1.能够知道什么是进程什么是线程(进程和线程的概述,多进程和多线程的意义) 2.能够掌握线程常见API的使用 3.能够理解什么是线程安全问题 4.能够知道什么是锁 5 ...

  6. 如何在Spring Boot项目中集成微信支付V3

    Payment Spring Boot 是微信支付V3的Java实现,仅仅依赖Spring内置的一些类库.配置简单方便,可以让开发者快速为Spring Boot应用接入微信支付. 演示例子: paym ...

  7. 使用sublime text3搭建Python编辑环境

    最近在工作遇到一个难题. 我所在的测试组有一套PC软件前端自动化工程,在进行自动化测试时,需要在一台古老的xp机器上运行,但这台古老的xp机器带给我诸多烦恼,特别是使用Pycharm编辑器时,我遇到了 ...

  8. P3714 [BJOI2017]树的难题 点分治+线段树合并

    题目描述 题目传送门 分析 路径问题考虑点分治 对于一个分治中心,我们可以很容易地得到从它开始的一条路径的价值和长度 问题就是如何将不同的路径合并 很显然,对于同一个子树中的所有路径,它们起始的颜色是 ...

  9. LeetCode704 二分查找

    给定一个 n 个元素有序的(升序)整型数组 nums 和一个目标值 target  ,写一个函数搜索 nums 中的 target,如果目标值存在返回下标,否则返回 -1. 示例 1: 输入: num ...

  10. 剑指offer 查找和排序的基本操作:查找排序算法大集合

    重点 查找算法着重掌握:顺序查找.二分查找.哈希表查找.二叉排序树查找. 排序算法着重掌握:冒泡排序.插入排序.归并排序.快速排序. 顺序查找 算法说明 顺序查找适合于存储结构为顺序存储或链接存储的线 ...