Measure the size of a PostgreSQL table row
I have a PostgreSQL table. select * is very slow whereas select id is nice and quick. I think it may be that the size of the row is very large and it's taking a while to transport, or it may be some other factor.
I need all of the fields (or nearly all of them), so selecting just a subset isn't a quick fix. Selecting the fields that I want is still slow.
Here's my table schema minus the names:
integer                  | not null default nextval('core_page_id_seq'::regclass)
character varying(255)   | not null
character varying(64)    | not null
text                     | default '{}'::text
character varying(255)   |
integer                  | not null default 0
text                     | default '{}'::text
text                     |
timestamp with time zone |
integer                  |
timestamp with time zone |
integer                  | 
The text fields may be of any size, but no more than a few kilobytes in the worst case.
Questions
- Is there anything about this that screams 'crazy inefficient'?
 - Is there a way to measure page size at the Postgres command-line to help me debug this?
 
Q2: way to measure page size
PostgreSQL provides a number of Database Object Size Functions you can use. I packed the most interesting ones into this query and added some Statistics Access Functions.
This is going to demonstrate that the various methods to measure the "size of a row" can lead to very different results. It all depends on what exactly you want to measure.
Replace public.tbl with your (optionally schema-qualified) table name to get a compact view of collected statistics about the size of your rows.
WITH x AS (
   SELECT count(*)               AS ct
        , sum(length(t::text))   AS txt_len  -- length in characters
        , 'public.tbl'::regclass AS tbl      -- provide (qualified) table name here
   FROM   public.tbl t                       -- ... and here
   )
, y AS (
   SELECT ARRAY [pg_relation_size(tbl)
               , pg_relation_size(tbl, 'vm')
               , pg_relation_size(tbl, 'fsm')
               , pg_table_size(tbl)
               , pg_indexes_size(tbl)
               , pg_total_relation_size(tbl)
               , txt_len
               ] AS val
        , ARRAY ['core_relation_size'
               , 'visibility_map'
               , 'free_space_map'
               , 'table_size_incl_toast'
               , 'indexes_size'
               , 'total_size_incl_toast_and_indexes'
               , 'live_rows_in_text_representation'
               ] AS name
   FROM   x
   )
SELECT unnest(name)                AS what
     , unnest(val)                 AS "bytes/ct"
     , pg_size_pretty(unnest(val)) AS bytes_pretty
     , unnest(val) / ct            AS bytes_per_row
FROM   x, y

UNION ALL SELECT '------------------------------', NULL, NULL, NULL
UNION ALL SELECT 'row_count', ct, NULL, NULL FROM x
UNION ALL SELECT 'live_tuples', pg_stat_get_live_tuples(tbl), NULL, NULL FROM x
UNION ALL SELECT 'dead_tuples', pg_stat_get_dead_tuples(tbl), NULL, NULL FROM x;
I only pack the values in arrays and unnest() again, so I don't have to spell out calculations for every single row repeatedly.
General row count statistics are appended at the end with unconventional SQL-foo to get everything in one query. You could wrap it into a plpgsql function for repeated use, hand in the table name as parameter and use EXECUTE.
Result:
               what                | bytes/ct | bytes_pretty | bytes_per_row
-----------------------------------+----------+--------------+---------------
 core_relation_size                | 44138496 | 42 MB        |            91
 visibility_map                    |        0 | 0 bytes      |             0
 free_space_map                    |    32768 | 32 kB        |             0
 table_size_incl_toast             | 44179456 | 42 MB        |            91
 indexes_size                      | 33128448 | 32 MB        |            68
 total_size_incl_toast_and_indexes | 77307904 | 74 MB        |           159
 live_rows_in_text_representation  | 29987360 | 29 MB        |            62
 ------------------------------    |          |              |
 row_count                         |   483424 |              |
 live_tuples                       |   483424 |              |
 dead_tuples                       |     2677 |              |
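The plpgsql wrapper mentioned above could look roughly like the following sketch. The function name f_tbl_size and the output column names (metric, nr, pretty, per_row) are placeholders invented for illustration. The table is passed as regclass, so format() interpolates a properly quoted name, and EXECUTE is only needed for the part that reads from the table itself.

```sql
-- Hypothetical wrapper; name and output columns are illustrative only.
CREATE OR REPLACE FUNCTION f_tbl_size(_tbl regclass)
  RETURNS TABLE (metric text, nr bigint, pretty text, per_row bigint)
  LANGUAGE plpgsql AS
$func$
DECLARE
   _ct      bigint;   -- exact row count
   _txt_len bigint;   -- summed length of all rows in text representation
BEGIN
   -- Dynamic SQL, because the table name is only known at call time.
   -- A regclass value is rendered as a safely quoted identifier by %s.
   EXECUTE format('SELECT count(*), sum(length(t::text)) FROM %s t', _tbl)
   INTO _ct, _txt_len;

   RETURN QUERY
   SELECT l.what, l.nr
        , CASE WHEN l.is_size THEN pg_size_pretty(l.nr) END
        , CASE WHEN l.is_size THEN l.nr / NULLIF(_ct, 0) END  -- NULL for empty tables
   FROM  (
      VALUES
        (true , 'core_relation_size'               , pg_relation_size(_tbl))
      , (true , 'visibility_map'                   , pg_relation_size(_tbl, 'vm'))
      , (true , 'free_space_map'                   , pg_relation_size(_tbl, 'fsm'))
      , (true , 'table_size_incl_toast'            , pg_table_size(_tbl))
      , (true , 'indexes_size'                     , pg_indexes_size(_tbl))
      , (true , 'total_size_incl_toast_and_indexes', pg_total_relation_size(_tbl))
      , (true , 'live_rows_in_text_representation' , _txt_len)
      , (false, 'row_count'                        , _ct)
      , (false, 'live_tuples'                      , pg_stat_get_live_tuples(_tbl))
      , (false, 'dead_tuples'                      , pg_stat_get_dead_tuples(_tbl))
      ) l(is_size, what, nr);
END
$func$;

-- Usage:
SELECT * FROM f_tbl_size('public.tbl');
```

Unlike the plain query, this sketch guards against division by zero for empty tables with NULLIF.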
The additional module pgstattuple provides more useful functions.
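A minimal sketch of how pgstattuple might be used, assuming the contrib module is available on the server and you have sufficient privileges to install and call it. public.tbl is again a placeholder table name. Note that pgstattuple() reads the whole table, so it can be slow on big tables.

```sql
-- Install once per database (requires the contrib package on the server)
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- Physical statistics for the main relation, plus an average tuple length
SELECT table_len                            -- physical size of the relation in bytes
     , tuple_count                          -- number of live tuples
     , tuple_len                            -- total length of live tuples in bytes
     , tuple_len / NULLIF(tuple_count, 0)   AS avg_tuple_len
     , dead_tuple_count
     , dead_tuple_len
     , free_space                           -- total free space in bytes
FROM   pgstattuple('public.tbl');
```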
Update for Postgres 9.3+
We could use the new form of unnest() in pg 9.4 taking multiple parameters to unnest arrays in parallel.
But using LATERAL and a VALUES expression, this can be simplified further. Plus some other improvements:
SELECT l.what, l.nr AS "bytes/ct"
     , CASE WHEN is_size THEN pg_size_pretty(nr) END AS bytes_pretty
     , CASE WHEN is_size THEN nr / x.ct END          AS bytes_per_row
FROM  (
   SELECT min(tableoid)        AS tbl      -- same as 'public.tbl'::regclass::oid
        , count(*)             AS ct
        , sum(length(t::text)) AS txt_len  -- length in characters
   FROM   public.tbl t                     -- provide table name *once*
   ) x
 , LATERAL (
   VALUES
      (true , 'core_relation_size'               , pg_relation_size(tbl))
    , (true , 'visibility_map'                   , pg_relation_size(tbl, 'vm'))
    , (true , 'free_space_map'                   , pg_relation_size(tbl, 'fsm'))
    , (true , 'table_size_incl_toast'            , pg_table_size(tbl))
    , (true , 'indexes_size'                     , pg_indexes_size(tbl))
    , (true , 'total_size_incl_toast_and_indexes', pg_total_relation_size(tbl))
    , (true , 'live_rows_in_text_representation' , txt_len)
    , (false, '------------------------------'   , NULL)
    , (false, 'row_count'                        , ct)
    , (false, 'live_tuples'                      , pg_stat_get_live_tuples(tbl))
    , (false, 'dead_tuples'                      , pg_stat_get_dead_tuples(tbl))
   ) l(is_size, what, nr);
Same result.
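For comparison, the multi-argument form of unnest() mentioned above could be used as sketched here (Postgres 9.4 or later, and only in the FROM clause). This is a reduced illustration with just two metrics, not a replacement for the full query. public.tbl remains a placeholder.

```sql
-- Two arrays are unnested in parallel: one row per array position
SELECT u.what
     , u.nr                 AS bytes
     , pg_size_pretty(u.nr) AS bytes_pretty
FROM   unnest(ARRAY['core_relation_size', 'indexes_size']
            , ARRAY[pg_relation_size('public.tbl'::regclass)
                  , pg_indexes_size('public.tbl'::regclass)]
       ) AS u(what, nr);
```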
Q1: anything inefficient?
You could optimize column order to save some bytes per row, currently lost to alignment padding:
integer                  | not null default nextval('core_page_id_seq'::regclass)
integer                  | not null default 0
character varying(255)   | not null
character varying(64)    | not null
text                     | default '{}'::text
character varying(255)   |
text                     | default '{}'::text
text                     |
timestamp with time zone |
timestamp with time zone |
integer                  |
integer                  |
This saves between 8 and 18 bytes per row. I call it "column tetris".
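To get a feel for the effect of alignment padding without touching the real table, you can compare ad-hoc row values, as in this small sketch. It assumes a typical build where timestamptz needs 8-byte alignment and integer needs 4-byte alignment. The exact numbers depend on the platform, so none are quoted here.

```sql
-- Same four values, different order: the interleaved row needs padding
-- before each 8-byte-aligned timestamptz, the grouped row does not.
SELECT pg_column_size(ROW(1::int, now(), 1::int, now())) AS interleaved
     , pg_column_size(ROW(now(), now(), 1::int, 1::int)) AS grouped;
```

On such a build, interleaved should come out larger than grouped, although both rows hold identical data.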
Also consider:
SELECT octet_length(t.*::text) FROM tablename AS t WHERE primary_key=:value;
This is a close approximation to the number of bytes that will be retrieved client-side when executing:
SELECT * FROM tablename WHERE primary_key=:value;
...assuming that the caller of the query is requesting results in text format, which is what most programs do (binary format is possible, but it's not worth the trouble in most cases).
The same technique could be applied to locate the N "biggest-in-text" rows of tablename:
SELECT primary_key, octet_length(t.*::text) FROM tablename AS t
ORDER BY 2 DESC LIMIT :N;
Note that octet_length(string) returns the number of bytes in the string, whereas length(string) returns the number of characters.
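To see the difference between octet_length() and length() in isolation, here is a small sketch. It assumes a database with UTF8 server encoding, where non-ASCII characters occupy more than one byte. The sample string is arbitrary.

```sql
-- length() counts characters, octet_length() counts bytes
SELECT length('héllo'::text)       AS chars
     , octet_length('héllo'::text) AS bytes;
```

With a multibyte encoding the byte count exceeds the character count, which is why octet_length() is the better proxy for the amount of data sent to the client.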