Q:
 

I have a PostgreSQL table. select * is very slow whereas select id is nice and quick. I think it may be that the size of the row is very large and it's taking a while to transport, or it may be some other factor.

I need all of the fields (or nearly all of them), so selecting just a subset isn't a quick fix. Selecting the fields that I want is still slow.

Here's my table schema minus the names:

integer                  | not null default nextval('core_page_id_seq'::regclass)
character varying(255) | not null
character varying(64) | not null
text | default '{}'::text
character varying(255) |
integer | not null default 0
text | default '{}'::text
text |
timestamp with time zone |
integer |
timestamp with time zone |
integer |

The size of the text field may be any size. But still, no more than a few kilobytes in the worst case.

Questions

  1. Is there anything about this that screams 'crazy inefficient'?
  2. Is there a way to measure page size at the Postgres command-line to help me debug this?
 
 
A1:
 
 
 
 

Q2: way to measure page size

PostgreSQL provides a number of Database Object Size Functions, you can use. I packed the most interesting ones in this query and added some Statistics Access Functions.

This is going to demonstrate that the various methods to measure the "size of a row" can lead to very different results. It all depends what you want to measure exactly.

Replace public.tbl with your (optionally schema-qualified) table name to get a compact view of collected statistics about the size of your rows.

WITH x AS (
SELECT count(*) AS ct
, sum(length(t::text)) AS txt_len -- length in characters
, 'public.tbl'::regclass AS tbl -- provide (qualified) table name here
FROM public.tbl t -- ... and here
)
, y AS (
SELECT ARRAY [pg_relation_size(tbl)
, pg_relation_size(tbl, 'vm')
, pg_relation_size(tbl, 'fsm')
, pg_table_size(tbl)
, pg_indexes_size(tbl)
, pg_total_relation_size(tbl)
, txt_len
] AS val
, ARRAY ['core_relation_size'
, 'visibility_map'
, 'free_space_map'
, 'table_size_incl_toast'
, 'indexes_size'
, 'total_size_incl_toast_and_indexes'
, 'live_rows_in_text_representation'
] AS name
FROM x
)
SELECT unnest(name) AS what
, unnest(val) AS "bytes/ct"
, pg_size_pretty(unnest(val)) AS bytes_pretty
, unnest(val) / ct AS bytes_per_row
FROM x, y UNION ALL SELECT '------------------------------', NULL, NULL, NULL
UNION ALL SELECT 'row_count', ct, NULL, NULL FROM x
UNION ALL SELECT 'live_tuples', pg_stat_get_live_tuples(tbl), NULL, NULL FROM x
UNION ALL SELECT 'dead_tuples', pg_stat_get_dead_tuples(tbl), NULL, NULL FROM x;

I only pack the values in arrays and unnest() again, so I don't have to spell out calculations for every single row repeatedly.

General row count statistics are appended at the end with unconventional SQL-foo to get everything in one query. You could wrap it into a plpgsql function for repeated use, hand in the table name as parameter and use EXECUTE.

Result:

              what                | bytes/ct | bytes_pretty | bytes_per_row
-----------------------------------+----------+--------------+---------------
core_relation_size | 44138496 | 42 MB | 91
visibility_map | 0 | 0 bytes | 0
free_space_map | 32768 | 32 kB | 0
table_size_incl_toast | 44179456 | 42 MB | 91
indexes_size | 33128448 | 32 MB | 68
total_size_incl_toast_and_indexes | 77307904 | 74 MB | 159
live_rows_in_text_representation | 29987360 | 29 MB | 62
------------------------------ | | |
row_count | 483424 | |
live_tuples | 483424 | |
dead_tuples | 2677 | |

The additional module pgstattuple provides more useful functions.

Update for Postgres 9.3+

We could use the new form of unnest() in pg 9.4 taking multiple parameters to unnest arrays in parallel.
But using LATERAL and a VALUES expression, this can be simplified further. Plus some other improvements:

SELECT l.what, l.nr AS "bytes/ct"
, CASE WHEN is_size THEN pg_size_pretty(nr) END AS bytes_pretty
, CASE WHEN is_size THEN nr / x.ct END AS bytes_per_row
FROM (
SELECT min(tableoid) AS tbl -- same as 'public.tbl'::regclass::oid
, count(*) AS ct
, sum(length(t::text)) AS txt_len -- length in characters
FROM public.tbl t -- provide table name *once*
) x
, LATERAL (
VALUES
(true , 'core_relation_size' , pg_relation_size(tbl))
, (true , 'visibility_map' , pg_relation_size(tbl, 'vm'))
, (true , 'free_space_map' , pg_relation_size(tbl, 'fsm'))
, (true , 'table_size_incl_toast' , pg_table_size(tbl))
, (true , 'indexes_size' , pg_indexes_size(tbl))
, (true , 'total_size_incl_toast_and_indexes', pg_total_relation_size(tbl))
, (true , 'live_rows_in_text_representation' , txt_len)
, (false, '------------------------------' , NULL)
, (false, 'row_count' , ct)
, (false, 'live_tuples' , pg_stat_get_live_tuples(tbl))
, (false, 'dead_tuples' , pg_stat_get_dead_tuples(tbl))
) l(is_size, what, nr);

Same result.

Q1: anything inefficient?

You could optimize column order to save some bytes per row, currently wasted to alignment padding:

integer                  | not null default nextval('core_page_id_seq'::regclass)
integer | not null default 0
character varying(255) | not null
character varying(64) | not null
text | default '{}'::text
character varying(255) |
text | default '{}'::text
text |
timestamp with time zone |
timestamp with time zone |
integer |
integer |

This saves between 8 and 18 bytes per row. I call it "column tetris". Details:

Also consider:

A2:
 
An approximation of the size of a row, including the TOAST'ed contents, is easy to get by querying the length of the TEXT representation of the entire row:
SELECT octet_length(t.*::text) FROM tablename AS t WHERE primary_key=:value;

This is a close approximation to the number of bytes that will be retrieved client-side when executing:

SELECT * FROM tablename WHERE primary_key=:value;

...assuming that the caller of the query is requesting results in text format, which is what most programs do (binary format is possible, but it's not worth the trouble in most cases).

The same technique could be applied to locate the N "biggest-in-text" rows of tablename:

SELECT primary_key, octet_length(t.*::text) FROM tablename AS t
ORDER BY 2 DESC LIMIT :N;
注:
1、octet_length(string)函数表示的是Number of bytes in binary string,而length则表示的字符个数。
 
 
 
 
 

Measure the size of a PostgreSQL table row的更多相关文章

  1. Limits on Table Column Count and Row Size Databases and Tables Table Size 最大行数

    MySQL :: MySQL 8.0 Reference Manual :: C.10.4 Limits on Table Column Count and Row Size https://dev. ...

  2. SSMS查看表行数以及使用空间 How to show table row count and space used in SSMS - SSMS Tutorials

    原文:How to show table row count and space used in SSMS - SSMS Tutorials There's a quick and convenien ...

  3. Mysql [Err] 1118 - Row size too large. The maximum row size for the used table type, not counting BLOBs, is 65535.

    对于越来越多的数据,数据库的容量越来越大,压缩也就越来越常见了.在我的实际工作中进行过多次压缩工作,也遇到多次问题,在此和大家分享一下. 首先,我们先说说怎么使用innodb的压缩. 第一,mysql ...

  4. PostgreSQL Table Partitioning<转>

    原创文章,转载请务必将下面这段话置于文章开头处(保留超链接).本文转发自Jason’s Blog,原文链接 http://www.jasongj.com/2015/12/13/SQL3_partiti ...

  5. Fill Table Row(it’s an IQ test question)

    Here is a table include the 2 rows. And the cells in the first row have been filled with 0~4. Now yo ...

  6. datable中table.row() not a funtion 解决方法

    解决办法一: 改为.DataTable({ (初始化时候) 解决办法二: 或者改为var data = myTable.api().row( this ).data();(获取值的时候)

  7. diff函数的实现——LCS的变种问题

    昨天去去哪儿笔试,碰到了一个我们一直很熟悉的命令(diff——ubuntu下面),可以比较字符串,即根据最长公共子串问题,如果A中有B中没有的字符输出形式如下(-ch),如果A中没有,B中有可以输出如 ...

  8. (算法)AA制

    题目: A.B.C.D四个人去吃大餐,吃饭去说好,付钱时AA制,但最后结账时,因为4个人带的钱不一样多,最后A付了112元,B付了86元,C付了10元,D没带钱,所以没有付: 但AA制需要平摊餐费,所 ...

  9. mysql 报Row size too large 65535 原因与解决方法

    报错信息:Row size too large. The maximum row size for the used table type, not counting BLOBs, is 65535 ...

随机推荐

  1. 十几行代码带你用Python批量实现txt转xls,方便快捷

    前天看到后台有一兄弟发消息说目前自己有很多txt 文件,领导要转成xls文件,问用python怎么实现,我在后台简单回复了下,其实完成这个需求方法有很多,因为具体的txt格式不清楚,当然如果是有明确分 ...

  2. POJ 3258(二分求最大化最小值)

    题目链接:http://poj.org/problem?id=3258 题目大意是求删除哪M块石头之后似的石头之间的最短距离最大. 这道题目感觉大致代码写起来不算困难,难点在于边界处理上.我思考边界思 ...

  3. DataTable转List<T>集合

    #region DataTable转List集合 +static IList<T> DataTableToList<T>(DataTable dt) where T : cla ...

  4. lintcode-65-两个排序数组的中位数

    65-两个排序数组的中位数 两个排序的数组A和B分别含有m和n个数,找到两个排序数组的中位数,要求时间复杂度应为O(log (m+n)). 样例 给出数组A = [1,2,3,4,5,6] B = [ ...

  5. git向github提交时不输入账号密码

    缘由:每次向github提交代码时都要输入用户名密码,太麻烦了,影响效率 解决方案: 方案一: 在你的用户目录下新建一个文本文件.git-credentials Windows:C:/Users/us ...

  6. 4th 课堂SCRM会议旁观记录

    项目名称:基于C#的连连看设计 小组名称:待定 小组成员:张政.张金生.武志远.李权 Master:张政 项目已完成部分: 现阶段已经实现了一定的功能,可以运行使用,进行第一关的游戏. 今天计划要完成 ...

  7. web.py 中文模版报错

    1. 作为模板的html文件,必须是utf-8编码; 2. html文件内容中的charset必须为utf-8,也就是必须包含 <meta http-equiv="Content-Ty ...

  8. hdu-题目:1058 Humble Numbes

    http://acm.hdu.edu.cn/showproblem.php?pid=1058 Problem Description A number whose only prime factors ...

  9. Spring Autowired原理

    今天来整理一下Spring的自动装配 autowire一节,在这里我们要解决以下问题: 什么是自动装配? 自动装配的意义? 自动装配有几种类型? 如何启用自动装配? 自动装配将引发的问题? 一.什么是 ...

  10. 如何更好的使用JAVA线程池

    这篇文章结合Doug Lea大神在JDK1.5提供的JCU包,分别从线程池大小参数的设置.工作线程的创建.空闲线程的回收.阻塞队列的使用.任务拒绝策略.线程池Hook等方面来了解线程池的使用,其中涉及 ...