Redundant data in update statements

Hibernate generates UPDATE statements that include all columns, regardless of whether I'm changing the values in those columns, e.g.:

tx.begin();
Item i = em.find(Item.class, 12345);
i.setA("a-value");
tx.commit();

issues this UPDATE statement:

update Item set A = $1, B = $2, C = $3, D = $4 where id = $5

So columns B, C and D are updated even though I didn't change them.
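For illustration, a "changed columns only" statement could be derived by comparing the values loaded from the database with the entity's current values. The Java sketch below is a hypothetical helper (DirtyColumnUpdate is not a Hibernate class). It only shows the idea of per-column dirty checking and does not reflect how Hibernate implements it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.StringJoiner;

/** Hypothetical helper: builds an UPDATE that lists only columns whose value changed. */
final class DirtyColumnUpdate {

    private DirtyColumnUpdate() {}

    /**
     * @param table    table name (not quoted or validated in this sketch)
     * @param snapshot column values as originally loaded from the database
     * @param current  column values after the application modified the entity
     * @param idColumn primary key column used in the WHERE clause
     * @return parameterized SQL, or null if no column changed
     */
    static String buildUpdate(String table, Map<String, Object> snapshot,
                              Map<String, Object> current, String idColumn) {
        List<String> dirty = new ArrayList<>();
        for (Map.Entry<String, Object> e : current.entrySet()) {
            String column = e.getKey();
            if (column.equals(idColumn)) {
                continue; // the key itself is never part of the SET list
            }
            if (!Objects.equals(snapshot.get(column), e.getValue())) {
                dirty.add(column); // value differs from the loaded snapshot
            }
        }
        if (dirty.isEmpty()) {
            return null; // nothing changed, so no statement at all
        }
        StringJoiner set = new StringJoiner(", ");
        int param = 1;
        for (String column : dirty) {
            set.add(column + " = $" + param++);
        }
        return "update " + table + " set " + set + " where " + idColumn + " = $" + param;
    }
}
```

With only A modified, this yields `update Item set A = $1 where id = $2`. Hibernate itself offers this behavior per entity through the @DynamicUpdate annotation (org.hibernate.annotations.DynamicUpdate); the answer below argues that on the PostgreSQL side it makes little difference.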

Say Items are updated frequently and all columns are indexed. The question is: does it make sense to optimize the Hibernate part to something like this:

tx.begin();
em.createQuery("update Item i set i.a = :a where i.id = :id")
    .setParameter("a", "a-value")
    .setParameter("id", 12345)
    .executeUpdate();
tx.commit();

What confuses me most is that the EXPLAIN plans of the 'unoptimized' and the 'optimized' query versions are identical!

Due to PostgreSQL's MVCC, an UPDATE is effectively a DELETE plus an INSERT. (To be precise, the "deleted" row version is merely invisible to transactions whose snapshot is taken after the updating transaction has committed, and it is removed by VACUUM later.) Therefore, on the database side, including index manipulation, there is in effect no difference between the two statements. Sending all columns only increases network traffic a bit (depending on your data) and requires a bit more parsing.
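The "DELETE plus INSERT" view can be illustrated with a toy model of one logical row: every UPDATE stamps the current version with the updating transaction's id (xmax) and appends a new version carrying that id as xmin, whether or not any value actually changed. This is a simplified sketch under stated assumptions (every transaction id below the snapshot bound counts as committed, no aborts, no concurrency, no VACUUM); MvccRowToy and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of all versions of ONE logical row, PostgreSQL-style (hypothetical, simplified). */
final class MvccRowToy {

    static final class Version {
        final String value;
        final long xmin; // transaction that created this version
        long xmax;       // transaction that replaced it; 0 = still current

        Version(String value, long xmin) {
            this.value = value;
            this.xmin = xmin;
        }
    }

    final List<Version> versions = new ArrayList<>();

    void insert(String value, long txid) {
        versions.add(new Version(value, txid));
    }

    /** UPDATE = "delete" the current version (set xmax) + "insert" a new one. */
    void update(String newValue, long txid) {
        for (Version v : versions) {
            if (v.xmax == 0) {
                v.xmax = txid; // old version stays on disk, just marked as replaced
            }
        }
        versions.add(new Version(newValue, txid)); // written even if newValue is unchanged
    }

    /**
     * Value seen by a snapshot that treats every txid below snapshotXmax as committed
     * and every other txid as not yet visible.
     */
    String visibleValue(long snapshotXmax) {
        for (Version v : versions) {
            boolean created = v.xmin < snapshotXmax;
            boolean replaced = v.xmax != 0 && v.xmax < snapshotXmax;
            if (created && !replaced) {
                return v.value;
            }
        }
        return null;
    }
}
```

Updating the row to the value it already holds still appends a complete new version, which is why listing unchanged columns adds essentially no extra work on the heap side.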

[I studied HOT updates after araqnid's input and ran some tests. Updating columns with the values they already hold makes no difference whatsoever as far as HOT updates are concerned. My answer holds. See details below.]

However, if you use per-column triggers (introduced with v9.0), this may have undesired side effects!

I quote the manual on triggers:

... a command such as UPDATE ... SET x = x ... will fire a trigger on column x, even though the column's value did not change.
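The rule in the quoted sentence can be modeled like this: a trigger declared with UPDATE OF x fires when x appears in the SET list, regardless of values. PostgreSQL's documented way to skip such no-op changes is a WHEN condition on the trigger, for example WHEN (OLD.x IS DISTINCT FROM NEW.x); the second method below mimics that check. Both methods are a hypothetical Java sketch, not PostgreSQL code.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;

/** Hypothetical model of when a column-specific UPDATE trigger would run. */
final class ColumnTriggerToy {

    private ColumnTriggerToy() {}

    /** UPDATE OF c1, c2, ...: fires if any listed column is a target of the SET list. */
    static boolean firesOnSetList(Set<String> triggerColumns, Set<String> setColumns) {
        for (String column : triggerColumns) {
            if (setColumns.contains(column)) {
                return true; // merely being mentioned is enough, the value is irrelevant
            }
        }
        return false;
    }

    /** Mimics WHEN (OLD.c IS DISTINCT FROM NEW.c OR ...); two nulls compare as equal. */
    static boolean changedValue(Set<String> triggerColumns,
                                Map<String, ?> oldRow, Map<String, ?> newRow) {
        for (String column : triggerColumns) {
            if (!Objects.equals(oldRow.get(column), newRow.get(column))) {
                return true;
            }
        }
        return false;
    }
}
```

In the question's full-column statement, B, C and D are all SET targets, so an UPDATE OF B trigger would fire although B did not change; a WHEN condition of the kind modeled by changedValue would suppress it.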

Abstraction layers are for convenience. They are useful for SQL-illiterate developers or if the application needs to be portable between different RDBMS. On the downside, they can butcher performance and introduce additional points of failure. I avoid them wherever possible.

Concerning HOT (Heap-only tuple) updates

Heap-Only Tuples were introduced with Postgres 8.3, with important improvements in 8.3.4 and 8.4.9.
The release notes for Postgres 8.3:

UPDATEs and DELETEs leave dead tuples behind, as do failed INSERTs. Previously only VACUUM could reclaim space taken by dead tuples. With HOT dead tuple space can be automatically reclaimed at the time of INSERT or UPDATE if no changes are made to indexed columns. This allows for more consistent performance. Also, HOT avoids adding duplicate index entries.

The key condition is "if no changes are made to indexed columns". And "no changes" includes cases where columns are set to the values they already hold. I actually tested that just now, as I wasn't sure.
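The condition can be sketched as a predicate: an update is a HOT candidate only if every indexed column keeps exactly the same stored value and the new version fits on the same page. PostgreSQL compares the old and new values of indexed columns by their binary representation, not with the data type's equality operator, so writing the same value back does not count as a change; the byte[] maps below stand in for those stored values. HotEligibility is a hypothetical name, and the model ignores every other detail of the real check.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.Set;

/** Simplified model of the HOT precondition quoted above (hypothetical names). */
final class HotEligibility {

    private HotEligibility() {}

    /** True if every indexed column keeps byte-for-byte the same stored value. */
    static boolean indexedColumnsUnchanged(Set<String> indexedColumns,
                                           Map<String, byte[]> oldRow,
                                           Map<String, byte[]> newRow) {
        for (String column : indexedColumns) {
            // Arrays.equals also treats two nulls as equal
            if (!Arrays.equals(oldRow.get(column), newRow.get(column))) {
                return false;
            }
        }
        return true;
    }

    /** HOT is possible only if no indexed value changed AND the page has room. */
    static boolean isHotCandidate(Set<String> indexedColumns, Map<String, byte[]> oldRow,
                                  Map<String, byte[]> newRow, boolean fitsOnSamePage) {
        return fitsOnSamePage && indexedColumnsUnchanged(indexedColumns, oldRow, newRow);
    }
}
```

Whether a column appears in the SET list plays no role here; only the before and after values of indexed columns matter.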

You don't have to take my word for it. See for yourself: Postgres provides a couple of functions to check statistics. Run your UPDATE with and without the unchanged columns and check whether the HOT counters differ.

-- Number of rows HOT-updated in table:
SELECT pg_stat_get_tuples_hot_updated('table_name'::regclass::oid);

-- Number of rows HOT-updated in table, in the current transaction:
SELECT pg_stat_get_xact_tuples_hot_updated('table_name'::regclass::oid);

Or use pgAdmin. Select your table and inspect the "Statistics" tab in the main window.

Be aware that HOT updates are only possible when there is room for the new tuple version on the same page. One simple way to ensure that condition is to test with a small table that holds only a few rows. The page size is typically 8 kB, and the new row version has to fit into the free space left on that page.
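To get a feel for how much room an 8 kB page offers, here is a rough back-of-the-envelope estimator. It assumes the default 8192-byte block, a 24-byte page header, a 4-byte line pointer per tuple version, a 23-byte tuple header padded to 24 bytes, and 8-byte MAXALIGN as on typical 64-bit platforms. It ignores the null bitmap, TOAST, line-pointer reuse and page pruning, so treat the numbers as approximations; PageSpaceEstimate is a hypothetical name.

```java
/** Rough estimate of heap page capacity (hypothetical helper, simplified model). */
final class PageSpaceEstimate {

    static final int BLOCK_SIZE = 8192;  // default page size
    static final int PAGE_HEADER = 24;   // fixed page header
    static final int LINE_POINTER = 4;   // one line pointer per tuple version
    static final int TUPLE_HEADER = 24;  // 23-byte heap tuple header, padded
    static final int MAXALIGN = 8;       // typical alignment on 64-bit platforms

    private PageSpaceEstimate() {}

    /** Round n up to the next multiple of MAXALIGN. */
    static int align(int n) {
        return (n + MAXALIGN - 1) / MAXALIGN * MAXALIGN;
    }

    /** Bytes one tuple version with dataWidth bytes of column data occupies, line pointer included. */
    static int bytesPerVersion(int dataWidth) {
        return LINE_POINTER + align(TUPLE_HEADER + dataWidth);
    }

    /** Upper bound on versions of this width that fit on one empty page. */
    static int versionsPerPage(int dataWidth) {
        return (BLOCK_SIZE - PAGE_HEADER) / bytesPerVersion(dataWidth);
    }

    /** True if a page already holding `versions` versions has room for one more. */
    static boolean roomForAnotherVersion(int versions, int dataWidth) {
        int used = PAGE_HEADER + versions * bytesPerVersion(dataWidth);
        return BLOCK_SIZE - used >= bytesPerVersion(dataWidth);
    }
}
```

With 16 bytes of column data, each version takes about 44 bytes, so roughly 185 versions fit on a page. A test table with three such rows leaves plenty of space for HOT updates, while a full page leaves none until old versions are pruned.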

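araqnid demonstrated this with the following psql script. It uses functions from the pageinspect contrib module (get_raw_page, heap_page_items, bt_page_items), which by default require superuser rights; on 9.1 and later the module is installed with CREATE EXTENSION pageinspect;. The script was written for the PostgreSQL releases current at the time, so some details differ on recent versions (for example, the row-lock bits in t_infomask were redefined in 9.3).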

create temp table t1(t1_id serial primary key, reference varchar(16) not null unique, value varchar(16) not null);

-- three rows, stored at ctids (0,1), (0,2) and (0,3)
insert into t1(reference, value) values ('FOO', 'foo'), ('BAR', 'bar'), ('QUUX', 'quux');

create temp view t1_combined as
    select t1_id, reference, value, ctid, lp_flags, lp_off, case when t_ctid <> ctid then t_ctid end as t_ctid,
           t_xmin, xmin_visible, case when t_xmax::text <> '' then t_xmax end as t_xmax, xmax_visible,
           xmin_visible and (xmax_visible is null or not xmax_visible or t_locked <> '') as visible, t_hot_updated, t_heap_only
    from (select *,
                 t_xmin_valid and txid_visible_in_snapshot(t_xmin::text::bigint, txid_current_snapshot()) as xmin_visible,
                 t_xmax_valid and txid_visible_in_snapshot(t_xmax::text::bigint, txid_current_snapshot()) as xmax_visible
          from (select ('(' || 0 || ',' || lp || ')')::tid as ctid,
                       lp, lp_off, case lp_flags when 0 then 'UNUSED' when 1 then 'NORMAL' when 2 then 'REDIRECT' when 3 then 'DEAD' end as lp_flags,
                       lp_len, t_xmin, t_xmax, t_field3, t_ctid, (t_infomask&1)<>0 as t_hasnull, (t_infomask&2)<>0 as t_hasvarwidth,
                       (t_infomask&4)<>0 as t_hasexternal, (t_infomask&8)<>0 as t_hasoid, (t_infomask&32)<>0 as t_combocid,
                       case t_infomask & 192 when 64 then 'EXCL' when 128 then 'SHARE' when 0 then '' when 192 then 'INVALID' end as t_locked,
                       (t_infomask&256)<>0 as t_xmin_committed, (t_infomask&512)=0 as t_xmin_valid,
                       (t_infomask&1024)<>0 as t_xmax_committed, (t_infomask&2048)=0 as t_xmax_valid,
                       (t_infomask&4096)<>0 as t_xmax_is_multi, (t_infomask&8192)<>0 as t_updated,
                       (t_infomask&16384)<>0 as t_moved_off, (t_infomask&32768)<>0 as t_moved_in,
                       t_infomask2&2047 as t_natts, (t_infomask2&16384)<>0 as t_hot_updated,
                       (t_infomask2&32768)<>0 as t_heap_only,
                       t_hoff, t_bits, t_oid
                from heap_page_items(get_raw_page('t1', 0))) format_heap_page_items
         ) heap
         full outer join (select ctid, * from t1) t1 using (ctid);

create temp view t1_indices as
    select ctid, pkey_content.itemoffset as pkey_itemoffset, pkey_content.data as pkey_data, auxkey_content.itemoffset as auxkey_itemoffset, auxkey_content.data as auxkey_data
    from bt_page_items('t1_pkey', 1) pkey_content
         full outer join bt_page_items('t1_reference_key', 1) auxkey_content using (ctid);

\echo ********************************************************************************
\echo * Initial table
\echo
select * from t1_combined;
select * from t1_indices;

\echo ********************************************************************************
\echo * Update non-indexed column
\echo * - index entries untouched
\echo * - old tuple at ctid (0,1) has t_hot_updated set
\echo * - new tuple at ctid (0,4) has t_heap_only set
\echo * - t_ctid of (0,1) points to (0,4)
\echo
begin;
update t1 set value = 'mumble' where t1_id = 1;
end;
select * from t1_combined;
select * from t1_indices;

\echo ********************************************************************************
\echo * Update non-indexed column again
\echo * - tuple at ctid (0,4) now just points to ctid (0,5) and is redundant
\echo
begin;
update t1 set value = 'womble' where t1_id = 1;
end;
select * from t1_combined;
select * from t1_indices;

\echo ********************************************************************************
\echo * Vacuum table
\echo * - line pointer ctid (0,1) converted to REDIRECT since index entries still point to it
\echo * - redundant tuple at ctid (0,4) reclaimed for reuse
\echo
vacuum t1;
select * from t1_combined;
select * from t1_indices;

\echo ********************************************************************************
\echo * Update indexed column
\echo * - New index entries written for new tuple at ctid (0,4) which is now reused
\echo
update t1 set reference = 'WOMBLE' where t1_id = 1;
select * from t1_combined;
select * from t1_indices;

\echo ********************************************************************************
\echo * Update indexed column to contain same value
\echo * - even though indexed column is mentioned in update, this makes a heap-only change
\echo * - current version is now (0,6) but indices still indicate (0,4)
\echo
update t1 set reference = 'WOMBLE', value = 'womble2' where t1_id = 1;
select * from t1_combined;
select * from t1_indices;

\echo ********************************************************************************
\echo * Vacuum table
\echo * - ctid (0,1) now reclaimed, index entries pointing to it removed
\echo * - ctid (0,5) reclaimed too, it never had index entries pointing to it
\echo
vacuum t1;
select * from t1_combined;
select * from t1_indices;

You can run the script yourself to see the results; they are not listed here.
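To help read the flag columns in that output, here is the t1_combined view's bit arithmetic translated into masks. The constant names follow PostgreSQL's source headers, but the class itself is a hypothetical Java sketch covering only the flags relevant to HOT and visibility, not the row-lock bits.

```java
/** Decodes heap_page_items() flag fields the same way the t1_combined view does (hypothetical helper). */
final class HeapTupleFlags {

    static final int HEAP_XMIN_COMMITTED = 0x0100; // 256
    static final int HEAP_XMIN_INVALID = 0x0200;   // 512
    static final int HEAP_XMAX_COMMITTED = 0x0400; // 1024
    static final int HEAP_XMAX_INVALID = 0x0800;   // 2048
    static final int HEAP_UPDATED = 0x2000;        // 8192
    static final int HEAP_NATTS_MASK = 0x07FF;     // 2047, in t_infomask2
    static final int HEAP_HOT_UPDATED = 0x4000;    // 16384, in t_infomask2
    static final int HEAP_ONLY_TUPLE = 0x8000;     // 32768, in t_infomask2

    private HeapTupleFlags() {}

    /** Same mapping as the view's CASE on lp_flags. */
    static String lpFlags(int lpFlags) {
        switch (lpFlags) {
            case 0: return "UNUSED";
            case 1: return "NORMAL";
            case 2: return "REDIRECT";
            case 3: return "DEAD";
            default: throw new IllegalArgumentException("lp_flags out of range: " + lpFlags);
        }
    }

    static int natts(int infomask2) { return infomask2 & HEAP_NATTS_MASK; }

    /** Old version whose successor is a heap-only tuple (view column t_hot_updated). */
    static boolean hotUpdated(int infomask2) { return (infomask2 & HEAP_HOT_UPDATED) != 0; }

    /** Version that has no index entries of its own (view column t_heap_only). */
    static boolean heapOnly(int infomask2) { return (infomask2 & HEAP_ONLY_TUPLE) != 0; }

    /** "Valid" in the view's sense: not flagged invalid (hint bits may still be unset). */
    static boolean xminValid(int infomask) { return (infomask & HEAP_XMIN_INVALID) == 0; }

    static boolean xmaxValid(int infomask) { return (infomask & HEAP_XMAX_INVALID) == 0; }

    static boolean xminCommitted(int infomask) { return (infomask & HEAP_XMIN_COMMITTED) != 0; }
}
```

For a three-column row, a t_infomask2 of 0x4003 decodes as three attributes with t_hot_updated set, matching the old version at (0,1) after the first update.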

Note:

With HOT, even if an indexed column is updated, no new index entry is created as long as its value does not change.
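In araqnid's script this shows up after the first VACUUM: the index entries for row 1 still point at (0,1), which has become a REDIRECT line pointer, while the live version sits further along the chain (presumably at (0,5)). A lookup therefore starts at the indexed line pointer and follows redirects and t_ctid links. The hypothetical HotChainToy below models that walk on a single page; unlike a real index scan, it skips visibility checks and simply returns the newest version.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy heap page: follows a HOT chain from the line pointer an index entry refers to. */
final class HotChainToy {

    /** Either a REDIRECT line pointer or a tuple whose t_ctid names the next version. */
    private static final class Slot {
        final int redirectTo; // > 0 for REDIRECT line pointers, 0 otherwise
        final int tCtid;      // offset of the next version; own offset if newest
        final String value;

        Slot(int redirectTo, int tCtid, String value) {
            this.redirectTo = redirectTo;
            this.tCtid = tCtid;
            this.value = value;
        }
    }

    private final Map<Integer, Slot> page = new HashMap<>();

    void tuple(int offset, String value, int tCtid) {
        page.put(offset, new Slot(0, tCtid, value));
    }

    void redirect(int offset, int target) {
        page.put(offset, new Slot(target, 0, null));
    }

    /** Walks REDIRECTs and t_ctid links until a version points to itself. */
    String newestValue(int indexedOffset) {
        int offset = indexedOffset;
        for (int hops = 0; hops <= page.size(); hops++) {
            Slot slot = page.get(offset);
            if (slot == null) {
                throw new IllegalStateException("no line pointer at " + offset);
            }
            if (slot.redirectTo > 0) {
                offset = slot.redirectTo;   // REDIRECT left behind by pruning/VACUUM
            } else if (slot.tCtid == offset) {
                return slot.value;          // end of the chain
            } else {
                offset = slot.tCtid;        // hop to the next heap-only version
            }
        }
        throw new IllegalStateException("chain does not terminate");
    }
}
```

Both before and after the first VACUUM, a lookup through the index entry for (0,1) ends at the version holding 'womble', without any new index entry having been written.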

Reference: https://stackoverflow.com/questions/7806058/redundant-data-in-update-statements/7806610#7806610
