Following a Select Statement Through Postgres Internals
This is the third of a series of posts based on a presentation I did at the Barcelona Ruby Conference called “20,000 Leagues Under ActiveRecord.” (posts: one two and video).
Preparing for this presentation over the Summer, I decided to read through parts of the PostgreSQL C source code. I executed a very simple select statement and watched what Postgres did with it using LLDB, a C debugger. How did Postgres understand my query? How did it actually find the data I was looking for?
This post is an informal journal of my trip through the guts of Postgres. I’ll describe the path I took and what I saw along the way. I’ll use a series of simple, conceptual diagrams to explain how Postgres executed my query. In case you understand C, I’ll also leave you a few landmarks and signposts you can look for if you ever decide to hack on Postgres internals.
In the end, the Postgres source code delighted me. It was clean, well documented and easy to follow. Find out for yourself how Postgres works internally by following me on a journey deep inside a tool you use everyday.
Finding Captain Nemo
Here’s the example query from the first half of my presentation; we’ll follow Postgres as it searches for Captain Nemo:


Professor Aronnax and Captain Nemo
plot the course of the Nautilus.
Finding a single name in string column like this should be straightforward, shouldn’t it? We’ll hold tightly onto this select statement while we explore Postgres internals, like a rope deep sea divers use to find their way back to the surface.
The Big Picture
What does Postgres do with this SQL string? How does it understand what we meant? How does it know what data we are looking for?
Postgres processes each SQL command we send it using a four step process.

In the first step, Postgres parses our SQL statement and converts it into a series of C memory structures, a parse tree. Next Postgres analyzes and rewrites our query, optimizing and simplifying it using a series of complex algorithms. After that, Postgres generates a plan for finding our data. Like an obsessive compulsive person who won’t leave home without every suitcase packed perfectly, Postgres doesn’t run our query until it has a plan. Finally, Postgres actually executes our query. In this presentation I’ll briefly touch on the first three topics, and then focus more on the last step: Execute.
The C function inside of Postgres that implements this 4 step process is called exec_simple_query. You can find a link to it below, along with an LLDB backtrace which gives some context about exactly when and how Postgres callsexec_simple_query.
Parse
How does Postgres understand the SQL string we sent it? How does it make sense of the SQL keywords and expressions in our select statement? Through a process called parsing, Postgres converts our SQL string into an internal data structure it understands, the parse tree.
It turns out that Postgres uses the same parsing technology that Ruby does, a parser generator called Bison. Bison runs during the Postgres C build process and generates parser code based on a series of grammar rules. The generated parser code is what runs inside of Postgres when we send it SQL commands. Each grammar rule is triggered when the generated parser finds a corresponding pattern or syntax in the SQL string, and inserts a new C memory structure into the parse tree data structure.
I won’t take the time today to explain how parsing algorithms work in detail. If you’re interested in that sort of thing, I’d suggest taking a look at my book Ruby Under a Microscope. In Chapter One I go through a detailed example of the LALR parse algorithm used by Bison and Ruby. Postgres parses SQL statements in exactly the same way.
Using LLDB and enabling some C logging code, I observed the Postgres parser produce this parse tree for our Captain Nemo query:

At the top is a node representing the entire SQL statement, and below that are child nodes or branches that represent the different portions of the SQL statement syntax: the target list (a list of columns), the from clause (a list of tables), the where clause, the sort order and a limit count.
If you want to learn more about how Postgres parses SQL statements, follow the flow of control down fromexec_simple_query through another C function called pg_parse_query.
As you can see there are many helpful and detailed comments in the Postgres source code that not only explain what is happening but also point out important design decisions.
All That Hard Work For Nothing
The parse tree above should look familiar – it’s almost precisely the same as the abstract syntax tree (AST) we saw ActiveRecord create earlier. Recall from the first half of the presentation ActiveRecord generated our Captain Nemo select statement when we executed this Ruby query:

We saw that ActiveRecord internally created an AST when we called methods such as where and first. Later (see thesecond post), we watched as the Arel gem converted the AST into our example select statement using an algorithm based on the visitor pattern.
Thinking about this, it’s ironic that the first thing Postgres does with your SQL statement is convert it from a string back into an AST. Postgres’s parse process reverses everything ActiveRecord did earlier; all of that hard work the Arel gem did was for nothing! The only reason for creating the SQL string at all was to communicate with Postgres over a network connection. Once Postgres has the string, it converts it back into an AST, which is a much more convenient and useful way of representing queries.
Learning this you might ask: Is there a better way? Is there some way of conceptually specifying the data we want to Postgres without writing a SQL statement? Without learning the complex SQL language or paying the performance overhead of using ActiveRecord and Arel? It seems like a waste of time to go to such lengths to generate a SQL string from an AST, just to convert it back to an AST again. Maybe we should be using a NoSQL database solution instead?
Of course, the AST Postgres uses is much different from the AST used by ActiveRecord. ActiveRecord’s AST was comprised of Ruby objects, while Postgres’s AST is formed of a series of C memory structures. Same idea but very different implementations.
Analyze and Rewrite
Once Postgres has generated a parse tree, it then converts it into a another tree structure using a different set of nodes. This is known as the query tree. Returning to the exec_simple_query C function, you can see it next calls another C functionpg_analyze_and_rewrite.
Waving my hands a bit and glossing over many important details, the analyze and rewrite process applies a series of sophisticated algorithms and heuristics to try to optimize and simplify your SQL statement. If you had executed a complex select statement with sub-selects and multiple inner and outer joins, then there is a lot of room for optimization. It’s quite possible that Postgres could reduce the number of sub-select clauses or joins to produce a simpler query that runs faster.
For our simple select statement, here’s the query tree that pg_analyze_and_rewrite produces:

I don’t pretend to understand the detailed algorithms behind pg_analyze_and_rewrite. I simply observed that for our example the query tree largely resembled the parse tree. This means the select statement was so straightforward Postgres wasn’t able to simplify it further.
Plan
The last step Postgres takes before starting to execute our query is to create a plan. This involves generating a third tree of nodes that form a list of instructions for Postgres to follow. Here’s the plan tree for our select statement.

Imagine that each node in the plan tree is a machine or worker of some kind. The plan tree resembles a pipeline of data or a conveyor belt in a factory. In my simple example there is only one branch in the tree. Each node in the plan tree takes some the output data from the node below, processes it, and returns results as input to the node above. We’ll follow Postgres as it executes the plan in the next section.
The C function that starts the query planning process is called pg_plan_queries.
Note the startup_cost and total_cost values in each plan node. Postgres uses these values to estimate how long the plan will take to complete. You don’t have to use a C debugger to see the execution plan for your query. Just prepend the SQLEXPLAIN command to your query, like this:

This is a powerful way to understand what Postgres is doing internally with one of your queries, and why it might be slow or inefficient – despite the sophisticated planning algorithms in pg_plan_queries.
Executing a Limit Plan Node
By now, Postgres has parsed your SQL statement and converted it back into an AST. Then it optimized and rewrote your query, possibly in a simpler way. Third, Postgres wrote a plan which it will follow to find and return the data you are looking for. Finally it’s time for Postgres to actually execute your query. How does it do this? It follows the plan, of course!
Let’s start at the top of the plan tree and move down. Skipping the root node, the first worker that Postgres uses for our Captain Nemo query is called Limit. The Limit node, as you might guess, implements the LIMIT SQL command, which limits the result set to the specified number of records. The same plan node also implements the OFFSET command, which starts the result set window at the specified row.

The first time Postgres calls the Limit node, it calculates what the limit and offset values should be, because they might be set to the result of some dynamic calculation. In our example, offset is 0 and limit is 1.
Next, the Limit plan node repeatedly calls the subplan, in our case Sort, counting until it reaches the offset value:

In our example the offset value is zero, so this loop will load the first data value and stop iterating. Then Postgres returns the last data value loaded from the subplan to the calling or upper plan. For us, this will be that first value from the subplan.
Finally when Postgres continues to call the Limit node, it will pass the data values through from the subplan one at a time:

In our example, because the limit value is 1 Limit will immediately return NULL indicating to the upper plan there is no more data available.
Postgres implements the Limit node using code in a file called nodeLimit.c
You can see the Postgres source code uses words such as tuple (a set a values, one from each column) and subplan. The subplan in this example is the Sort node, which appears below Limit in the plan.
Executing a Sort Plan Node
Where do the data values Limit filters come from? From the Sort plan node that appears under Limit in the plan tree. Sort loads data values from its subplan and returns them to its calling plan, Limit. Here’s what Sort does when the Limit node calls it for the first time, to get the first data value:

You can see that Sort functions very differently from Limit. It immediately loads all of the available data from the subplan into a buffer, before returning anything. Then it sorts the buffer using the Quicksort algorithm, and finally returns the first sorted value.
For the second and subsequent calls, Sort simply returns additional values from the sorted buffer, and never needs to call the subplan again:

The Sort plan node is implemented by a C function called ExecSort:
Executing a SeqScan Plan Node
Where does ExecSort get its values? From its subplan, or the SeqScan node that appears at the bottom of the plan tree. SeqScan stands for sequence scan, which means to look through the values in a table, returning values that match a given filter. To understand how the scan works with our filter, let’s step through an imaginary users table filled with fake names, looking for Captain Nemo.

Postgres starts at the first record in a table (known as a relation in the Postgres source code) and executes the boolean expression from the plan tree. In simple terms, Postgres asks the question: “Is this Captain Nemo?” Because Laurianne Goodwin is not Captain Nemo, Postgres steps down to the next record.

No, Candace is also not Captain Nemo. Postgres continues:

… and eventually finds Captain Nemo!

Postgres implements the SeqScan node using a C function called ExecSeqScan.
What Are We Doing Wrong?
Now we’re done! We’ve followed a simple select statement all the way through the guts of Postgres, and have seen how it was parsed, rewritten, planned and finally executed. After executing many thousands of lines of C code, Postgres has found the data we are looking for! Now all Postgres has to do is return the Captain Nemo string back to our Rails application and ActiveRecord can create a Ruby object. We can finally return to the surface of our application.
But Postgres doesn’t stop! Instead of simply returning, Postgres continues to scan through the users table, even though we’ve already found Captain Nemo:


While returning from the South Pole, the air
supply inside the Nautilus began to run out.
What’s going on here? Why is Postgres wasting its time, continuing to search even though it’s already found the data we’re looking for?
The answer lies farther up the plan tree in the Sort node. Recall that in order to sort all of the users, ExecSort first loads all of the values into a buffer, by calling the subplan repeatedly until there are no values left. That means that ExecSeqScan will continue to scan to the end of the table, until it has all of the matching users. If our users table contained thousands or even millions of records (imagine we work at Facebook or Twitter), ExecSeqScan will have to loop over every single user record and execute the string comparison for each one. This is obviously inefficient and slow, and will get slower as more and more user records are added.
If we have only one Captain Nemo record, then ExecSort will “sort” just that single matching record, and ExecLimit will pass that single record through its offset/limit filter… but only after ExecSeqScan has iterated over all of the names.
Next Time
How do we fix this problem? What should we do if our SQL queries on the users table take more and more time to execute? The answer is simple: we create an index.
In the next and final post in this series we’ll learn how to create a Postgres index and to avoid the use of ExecSeqScan. More importantly, I’ll show you what a Postgres index looks like: howit works and why it speeds up queries like this one.
注:非常赞的文章,讲解了pg中执行select操作时优化器的所做。
关于lldb具体的操作步骤参考:
https://gist.github.com/patshaughnessy/70519495343412504686
参考:
http://patshaughnessy.net/2014/10/13/following-a-select-statement-through-postgres-internals
Following a Select Statement Through Postgres Internals的更多相关文章
- Select Statement Syntax [AX 2012]
Applies To: Microsoft Dynamics AX 2012 R3, Microsoft Dynamics AX 2012 R2, Microsoft Dynamics AX 2012 ...
- SQL Fundamentals: Basic SELECT statement基本的select语句(控制操作的现实列)(FROM-SELECT)
SQL Fundamentals || Oracle SQL语言 Capabilities of the SELECT Statement(SELECT语句的功能) Data retrieval fr ...
- A SELECT statement that assigns a value to a variable must ... (向变量赋值的 SELECT 语句不能与数据检索操作结合使用 )
A SELECT statement that assigns a value to a variable must ... (向变量赋值的 SELECT 语句不能与数据检索操作结合使用 ) 总结一句 ...
- Cannot resolve collation conflict between "Chinese_Taiwan_Stroke_CI_AS" and "Chinese_PRC_CI_AS" in UNION ALL operator occurring in SELECT statement column 1.
Cannot resolve collation conflict between . 解决方案: COLLATE Chinese_PRC_CI_AS 例子: SELECT A.Name FROM A ...
- Part 5 Select statement in sql server
Select specific or all columns select * from 表名 select * from Student select 列名,列名... from 表名 select ...
- pg 资料大全1
https://github.com/ty4z2008/Qix/blob/master/pg.md?from=timeline&isappinstalled=0 PostgreSQL(数据库) ...
- using the library to generate a dynamic SELECT or DELETE statement mysqlbaits xml配置文件 与 sql构造器 对比
https://github.com/mybatis/mybatis-dynamic-sql MyBatis Dynamic SQL What Is This? This library is ...
- Discovering the Computer Science Behind Postgres Indexes
This is the last in a series of Postgres posts that Pat Shaughnessy wrote based on his presentation ...
- MySQL and Postgres command equivalents (mysql vs psql)
MySQL and Postgres command equivalents (mysql vs psql) 博客分类: Database From: http://blog.endpoint.c ...
随机推荐
- JDBC连接sql server数据库及其它
JDBC连接sql server数据库的步骤如下: 1.加载JDBC驱动程序: 在连接数据库之前,首先要加载想要连接的数据库的驱动到JVM(Java虚拟机), 这通过java.lang.Class类的 ...
- Android Studio - HPROF文件查看和分析工具
Android Studio 翻译的官方文章 原文链接 当你在Android Studio中使用Android Monitor里的Memory Monitor工具监视内存使用情况时,可以把Java堆快 ...
- div水平居中与垂直居中的方法【摘自美浩工作室官方博客 】
大家往往在写页面中会遇到不固定宽和高的div如果水平和垂直都居中呢?在写css的时候经常遇到的一个问题,当div没有固定的宽度或者高度的时候,如何才能让div水平或者垂直居中显示.如果div有固定宽度 ...
- 读<jquery 权威指南>[2]-事件
1. 事件冒泡 阻止事件冒泡的两种方式: event.stopPropagation(); return false ; 2. 绑定事件——bind(type,[data],function) ty ...
- iframe自适应方法
ifram高度自适应,两种方法 方法一:父页面获取子页面高度,改变父页面高度以下代码加到父页面 <script language="javascript"> //获取i ...
- .net中大数据的处理
前提:sqlserver数据库 问题:需要从数据量达千万级或是更大的数据库中找出一条需要的数据. 1,先分析大数据的结构,以及数据的流向.以及此过程中那些地方会有瓶颈. 2,针对瓶颈制定方案: 1.1 ...
- python讲一个列表写入excel表中
连接为http://blog.csdn.net/a491057947/article/details/47614263 http://www.crifan.com/export_data_to_exc ...
- Halcon 10.0:Sample 分割边缘拟合圆Circles.hdev
处理流程:快速二值化(区域)->获取区域边缘->截取边缘->膨胀边缘区域(定位)->定位区域进行边缘检测->边缘分割:线和圆->选择属性为圆的弧->拟合圆 * ...
- SRM 657 DIV2
-------一直想打SRM,但是感觉Topcoder用起来太麻烦了.题目还是英文,不过没什么事干还是来打一打好了.但是刚注册的号只能打DIV2,反正我这么弱也只适合DIV2了.. T1: 题目大意: ...
- jquery事件代理
在jQuery中,事件代理是指:把事件绑定到父级元素,然后等待事件通过DOM冒泡到该元素时再执行. 在事件侦听过程中有两种触发事件的方式:事件捕获和事件冒泡.事件冒泡更快,效率更高. 事件捕获:事件在 ...













