This issue's Ask Tom column is a little different from the typical column. I receive many questions about how to perform top-N and pagination queries in Oracle Database, so I decided to provide an excerpt from the book Effective Oracle by Design (Oracle Press, 2003) in hopes of answering a lot of these questions with this one column. Note that the content here has been modified from the original to fit the space and format.

Limiting Result Sets

ROWNUM is a magic column in Oracle Database that gets many people into trouble. When you learn what it is and how it works, however, it can be very useful. I use it for two main things:

  • To perform top-N processing. This is similar to using the LIMIT clause, available in some other databases.

  • To paginate through a query, typically in a stateless environment such as the Web. I use this technique on the asktom.oracle.com Web site.

I'll take a look at each of these uses after I review how ROWNUM works.

How ROWNUM Works

ROWNUM is a pseudocolumn (not a real column) that is available in a query. ROWNUM will be assigned the numbers 1, 2, 3, 4, ... N, where N is the number of rows in the set ROWNUM is used with. A ROWNUM value is not assigned permanently to a row (this is a common misconception). A row in a table does not have a number; you cannot ask for row 5 from a table, because there is no such thing.

Also confusing to many people is when a ROWNUM value is actually assigned. A ROWNUM value is assigned to a row after it passes the predicate phase of the query but before the query does any sorting or aggregation. Also, a ROWNUM value
is incremented only after it is assigned, which is why the following query will never return a row:

select *
from t
where ROWNUM > 1;

Because ROWNUM > 1 is not true for the first row, ROWNUM does not advance to 2. Hence, no ROWNUM value ever gets to be greater than 1. Consider a query with this structure:

select ..., ROWNUM
from t
where <where clause>
group by <columns>
having <having clause>
order by <columns>;

Think of it as being processed in this order:

1. The FROM/WHERE clause goes first. 

2. ROWNUM is assigned and incremented to each output row from the FROM/WHERE clause. 

3. SELECT is applied. 

4. GROUP BY is applied. 

5. HAVING is applied. 

6. ORDER BY is applied.

That is why a query in the following form is almost certainly an error:

select *
from emp
where ROWNUM <= 5
order by sal desc;

The intention was most likely to get the five highest-paid people (a top-N query). What the query will return is five arbitrary records (the first five the query happens to hit), sorted by salary.
The procedural pseudocode for this query is as follows:

ROWNUM = 1
for x in
( select * from emp )
loop
    exit when NOT(ROWNUM <= 5)
    OUTPUT record to temp
    ROWNUM = ROWNUM+1
end loop
SORT TEMP

It gets the first five records and then sorts them. A query with WHERE ROWNUM = 5 or WHERE ROWNUM > 5 doesn't make sense. This is because a ROWNUM value is assigned to a row during the predicate evaluation and gets incremented only
after a row passes the WHERE clause.
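
The assignment rule can be modeled outside the database. The following Python sketch is only an illustration of the rule described above, not a description of how Oracle Database is implemented; the function name rownum_filter and the sample rows are hypothetical. It keeps a counter that starts at 1 and advances only after a row has passed the predicate:

```python
from typing import Callable, Iterable, List, Tuple

Row = Tuple[str, int]  # (ename, sal): hypothetical sample shape


def rownum_filter(rows: Iterable[Row],
                  predicate: Callable[[int], bool]) -> List[Tuple[Row, int]]:
    """Model of a WHERE clause on ROWNUM.

    The candidate ROWNUM value is tested against the predicate; it is
    assigned to the row and incremented only if the row passes.
    """
    rownum = 1
    out = []
    for row in rows:
        if predicate(rownum):
            out.append((row, rownum))
            rownum += 1  # advances only after a row has been accepted
    return out


# Hypothetical sample data, in the order the scan happens to find it.
emp = [("SMITH", 800), ("ALLEN", 1600), ("WARD", 1250),
       ("JONES", 2975), ("ADAMS", 1100), ("BLAKE", 2850),
       ("KING", 5000)]

print(len(rownum_filter(emp, lambda rn: rn > 1)))    # 0: the counter never leaves 1
print(len(rownum_filter(emp, lambda rn: rn == 5)))   # 0: same reason
print([row[0] for row, _ in rownum_filter(emp, lambda rn: rn <= 5)])
```

The last call returns the first five names in scan order, which is why ROWNUM <= 5 works and ROWNUM > 1 or ROWNUM = 5 does not.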

Here is the correct version of this query:

select *
from
( select *
from emp
order by sal desc )
where ROWNUM <= 5;

This version will sort EMP by salary descending and then return the first five records it encounters (the top-five records). As you'll see in the top-N discussion coming up shortly, Oracle Database doesn't really sort the entire result set (it is smarter than that), but conceptually that is what takes place.
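
To make the difference between the two forms concrete, here is a small Python sketch under stated assumptions: the rows are hypothetical, list order stands in for the order in which the database happens to find the rows, and first_n is a made-up helper that models WHERE ROWNUM <= n. It is a conceptual model only.

```python
from typing import List, Tuple

Row = Tuple[str, int]  # (ename, sal): hypothetical sample shape

# Hypothetical rows, in the order a scan happens to return them.
emp: List[Row] = [("SMITH", 800), ("ALLEN", 1600), ("WARD", 1250),
                  ("JONES", 2975), ("ADAMS", 1100), ("BLAKE", 2850),
                  ("KING", 5000)]


def first_n(rows: List[Row], n: int) -> List[Row]:
    """Model of WHERE ROWNUM <= n: keep the first n rows encountered."""
    return rows[:n]


def sal_desc(row: Row) -> int:
    """Sort key for ORDER BY sal DESC."""
    return -row[1]


# Incorrect form: ROWNUM <= 5 is applied first, ORDER BY afterward.
limit_then_sort = sorted(first_n(emp, 5), key=sal_desc)

# Correct form: the inline view sorts first, ROWNUM <= 5 is applied afterward.
sort_then_limit = first_n(sorted(emp, key=sal_desc), 5)

print([name for name, _ in limit_then_sort])
# ['JONES', 'ALLEN', 'WARD', 'ADAMS', 'SMITH']
print([name for name, _ in sort_then_limit])
# ['KING', 'JONES', 'BLAKE', 'ALLEN', 'WARD']
```

In this sample the highest-paid row (KING) is the seventh one scanned, so the limit-then-sort form never sees it.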

Top-N Query Processing with ROWNUM

In a top-N query, you are generally interested in taking some complex query, sorting it, and then retrieving just the first N rows (the top N rows).
ROWNUM has a top-N optimization that facilitates this type of query. You can use ROWNUM to avoid a massive sort of large sets. I'll discuss how it does this conceptually and then look at an example.

Suppose you have a query in this form:

select ...
from ...
where ...
order by columns;

Assume that this query returns a lot of data: thousands, hundreds of thousands, or more rows. However, you are interested only in the top N, say the top 10 or top 100. There are two ways to approach this:

  • Have the client application run that query and fetch just the first N rows.

  • Use that query as an inline view, and use ROWNUM to limit the results, as in SELECT * FROM ( your_query_here ) WHERE ROWNUM <= N.

The second approach is by far superior to the first, for two reasons. The lesser of the two reasons is that it requires less work by the client, because the database takes care of limiting the result set. The more important reason is the special processing the database can do to give you just the top N rows. Using the top-N query means that you have given the database extra information. You have told it, "I'm interested only in getting N rows; I'll never consider the rest." Now, that doesn't sound too earth-shattering until you think about sorting: how sorts work and what the server would need to do. Let's walk through the two approaches with a sample query:

select *
from t
order by unindexed_column;

Now, assume that T is a big table, with more than one million records, and each record is "fat"—say, 100 or more bytes. Also assume that UNINDEXED_COLUMN is, as its name implies, a column that is not indexed. And assume that you are
interested in getting just the first 10 rows. Oracle Database would do the following:

1. Run a full-table scan on T. 

2. Sort T by UNINDEXED_COLUMN. This is a full sort. 

3. Presumably run out of sort area memory and need to swap temporary extents to disk. 

4. Merge the temporary extents back to get the first 10 records when they are requested. 

5. Clean up (release) the temporary extents as you are finished with them.

Now, that is a lot of I/O. Oracle Database has most likely copied the entire table into TEMP and written it out, just to get the first 10 rows.

Next, let's look at what Oracle Database can do conceptually with a top-N query:

select *
from
(select *
from t
order by unindexed_column)
where ROWNUM <= :N;

In this case, Oracle Database will take these steps:

1. Run a full-table scan on T, as before (you cannot avoid this step). 

2. In an array of :N elements (presumably in memory this time), sort only :N rows.

The first N rows will populate this array of rows in sorted order. When the (N+1)th row is fetched, it will be compared to the last row in the array. If it would go into slot N+1 in the array, it gets thrown out. Otherwise, it is added to this array and sorted, and one of the existing rows is discarded. Your sort area holds N rows maximum, so instead of sorting one million rows, you sort N rows.

This seemingly small detail of using an array concept and sorting just N rows can lead to huge gains in performance and resource usage. It takes a lot less RAM to sort 10 rows than it does
to sort one million rows (not to mention TEMP space usage).
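
The array concept can be sketched in a few lines of Python. This is a conceptual model under stated assumptions, not Oracle Database's actual sort code: the function name top_n_scan, the (id, data) row shape, and the use of the standard bisect module are all choices made for illustration. The list kept never holds more than n entries, no matter how many rows are scanned.

```python
import bisect
import random
from typing import Any, Callable, Iterable, List


def top_n_scan(rows: Iterable[Any], n: int,
               key: Callable[[Any], Any]) -> List[Any]:
    """Return the first n rows in key order, holding at most n rows at a time."""
    if n <= 0:
        return []
    # Sorted list of (key value, arrival number, row). The arrival number
    # breaks ties, so the rows themselves are never compared.
    kept: List[tuple] = []
    for arrival_no, row in enumerate(rows):
        entry = (key(row), arrival_no, row)
        if len(kept) < n:
            bisect.insort(kept, entry)   # the first n rows populate the array
        elif entry < kept[-1]:
            bisect.insort(kept, entry)   # the row belongs among the top n
            kept.pop()                   # discard the row pushed to slot n+1
        # Otherwise the row would land in slot n+1 and is thrown out.
    return [row for _, _, row in kept]


# Hypothetical data shaped like table T: a random id plus 40 bytes of filler.
random.seed(42)
t = [(random.randint(1, 1_000_000), "*" * 40) for _ in range(10_000)]

top_10 = top_n_scan(t, 10, key=lambda r: r[0])
print(top_10 == sorted(t, key=lambda r: r[0])[:10])  # True
```

The full sort on the last line is there only to check the result; the scan itself compares each incoming row against the last kept row and keeps a working set of ten.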

Using the following table T, you can see that although both approaches get the same results, they use radically different amounts of resources:

create table t
as
select dbms_random.value(1,1000000) id,
       rpad('*',40,'*' ) data
  from dual
connect by level <= 100000;

begin
   dbms_stats.gather_table_stats
   ( user, 'T');
end;
/

Now enable tracing, via

exec dbms_monitor.session_trace_enable(waits=>true);

And then run your top-N query with ROWNUM:

select *
from
(select *
from t
order by id)
where rownum <= 10;

And finally run a "do-it-yourself" query that fetches just the first 10 records:

declare
cursor c is
select *
from t
order by id;
l_rec c%rowtype;
begin
open c;
for i in 1 .. 10
loop
fetch c into l_rec;
exit when c%notfound;
end loop;
close c;
end;
/

After running both, you can use TKPROF to format the resulting trace file and review what happened. First examine the top-N query, as shown in Listing 1.

Code Listing 1: Top-N query using ROWNUM

select *
from
(select *
from t
order by id)
where rownum <= 10

call        count     cpu elapsed    disk    query  current   rows
-------- -------- ------- ------- ------- -------- -------- ------
Parse           1    0.00    0.00       0        0        0      0
Execute         1    0.00    0.00       0        0        0      0
Fetch           2    0.04    0.04       0      949        0     10
-------- -------- ------- ------- ------- -------- -------- ------
total           4    0.04    0.04       0      949        0     10

Rows              Row Source Operation
----------------- ---------------------------------------------------
               10 COUNT STOPKEY (cr=949 pr=0 pw=0 time=46997 us)
               10  VIEW (cr=949 pr=0 pw=0 time=46979 us)
               10   SORT ORDER BY STOPKEY (cr=949 pr=0 pw=0 time=46961 us)
           100000    TABLE ACCESS FULL T (cr=949 pr=0 pw=0 time=400066 us)

The query read the entire table (because it had to), but by using the SORT ORDER BY STOPKEY step, it was able to limit its use of temporary space to just 10 rows. Note the final Row Source Operation line: it shows that the query did 949 logical I/Os in total (cr=949), performed no physical reads or writes (pr=0 and pw=0), and reported time=400066 us, that is, 400066 millionths of a second (about 0.4 seconds). The call summary above it reports 0.04 seconds elapsed for the statement as a whole. Compare that with the do-it-yourself approach shown in Listing 2.

Code Listing 2: Do-it-yourself query without ROWNUM

SELECT * FROM T ORDER BY ID

call        count     cpu elapsed    disk    query  current   rows
-------- -------- ------- ------- ------- -------- -------- ------
Parse           1    0.00    0.00       0        0        0      0
Execute         2    0.00    0.00       0        0        0      0
Fetch          10    0.35    0.40     155      949        6     10
-------- -------- ------- ------- ------- -------- -------- ------
total          13    0.36    0.40     155      949        6     10

Rows              Row Source Operation
----------------- ---------------------------------------------------
               10 SORT ORDER BY (cr=949 pr=155 pw=891 time=401610 us)
           100000  TABLE ACCESS FULL T (cr=949 pr=0 pw=0 time=400060 us)

Elapsed times include waiting for the following events:

Event waited on                Times
------------------------------ ------------
direct path write temp                   33
direct path read temp                     5

As you can see, this result is very different. Notably, the elapsed/CPU times are significantly higher, and the final Row Source Operation lines provide insight into why this is. You had to perform a sort to disk, which you can see
with the pw=891 (physical writes). Your query performed some direct path reads and writes—the sort of 100,000 records (instead of just the 10 we are ultimately interested in) took place on disk—adding considerably to the runtime/resource usage of your query.

Pagination with ROWNUM

My all-time-favorite use of ROWNUM is pagination. In this case, I use ROWNUM to get rows N through M of a result set. The general form is as follows:

select *
  from ( select /*+ FIRST_ROWS(n) */
                a.*, ROWNUM rnum
           from ( your_query_goes_here,
                  with order by ) a
          where ROWNUM <= :MAX_ROW_TO_FETCH )
 where rnum >= :MIN_ROW_TO_FETCH;

where
  • FIRST_ROWS(N) tells the optimizer, "Hey, I'm interested in getting the first rows, and I'll get N of them as fast as possible."

  • :MAX_ROW_TO_FETCH is set to the last row of the result set to fetch—if you wanted rows 50 to 60 of the result set, you would set this to 60.

  • :MIN_ROW_TO_FETCH is set to the first row of the result set to fetch, so to get rows 50 to 60, you would set this to 50.

The concept behind this scenario is that an end user with a Web browser has done a search and is waiting for the results. It is imperative to return the first result page (and second page, and so on) as fast as possible. If you look at that query closely, you'll notice that it incorporates a top-N query (get the first :MAX_ROW_TO_FETCH rows from your query) and hence benefits from the top-N query optimization I just described. Further, it returns over the network to the client only the specific rows of interest: it removes any leading rows from the result set that are not of interest.
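
The three nested levels of the pagination query can be traced with a small Python model. This is a sketch under stated assumptions: fetch_page and the sample data are hypothetical, and the model sorts the whole set for simplicity, whereas the database can apply the top-N optimization to the middle level. It shows only which rows come back and which rnum values they carry.

```python
from typing import Any, Callable, List, Sequence, Tuple


def fetch_page(rows: Sequence[Any], min_row: int, max_row: int,
               key: Callable[[Any], Any]) -> List[Tuple[Any, int]]:
    """Return (row, rnum) pairs for rows min_row..max_row of the sorted set."""
    # Innermost level: your_query_goes_here, with order by.
    ordered = sorted(rows, key=key)
    # Middle level: WHERE ROWNUM <= :MAX_ROW_TO_FETCH, with ROWNUM kept as rnum.
    numbered = [(row, rnum)
                for rnum, row in enumerate(ordered[:max_row], start=1)]
    # Outer level: WHERE rnum >= :MIN_ROW_TO_FETCH.
    return [(row, rnum) for row, rnum in numbered if rnum >= min_row]


scores = list(range(100, 0, -1))  # hypothetical data: 100, 99, ..., 1
page = fetch_page(scores, min_row=50, max_row=60, key=lambda s: s)
print(page[0], page[-1], len(page))  # (50, 50) (60, 60) 11
```

Note that rows 50 to 60 inclusive make 11 rows, and that rnum is captured in the middle level precisely so the outer level can filter on a value that no longer changes.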

One important thing about using this pagination query is that the ORDER BY clause should order by something unique. If what you are ordering by is not unique, you should add something to the end of the ORDER BY to make it so. If you sort 100 records by SALARY, for example, and they all have the same SALARY value, then specifying rows 20 to 25 does not really have any meaning. In order to see this, use a small table with lots of duplicated ID values:

SQL> create table t
     as
     select mod(level,5) id,
            trunc(dbms_random.value(1,100)) data
       from dual
     connect by level <= 10000;

Table created.

And then query rows 148 to 150, and then rows 148 to 151, after sorting by the ID column:

SQL> select *
       from
       (select a.*, rownum rnum
          from
          (select id, data
             from t
            order by id) a
         where rownum <= 150
       )
      where rnum >= 148;

     ID       DATA        RNUM
------- ---------- -----------
      0         38         148
      0         64         149
      0         53         150

SQL> select *
       from
       (select a.*, rownum rnum
          from
          (select id, data
             from t
            order by id) a
         where rownum <= 151
       )
      where rnum >= 148;

     ID       DATA        RNUM
------- ---------- -----------
      0         59         148
      0         38         149
      0         64         150
      0         53         151

Note in this case that one time for row 148, the result returned DATA=38, and that the next time, the result returned DATA=59. Both queries are returning exactly the right answer, given what you've requested: Sort the data by ID, throw out the first 147 rows, and return the next 3 or 4 rows. Both of them do that, but because ID has so many duplicate values, the query cannot do it deterministically: the same sort order is not assured from run to run of the query.
In order to correct this, you need to add something unique to the ORDER BY. In this case, just use ROWID:

SQL> select *
       from
       (select a.*, rownum rnum
          from
          (select id, data
             from t
            order by id, rowid) a
         where rownum <= 150
       )
      where rnum >= 148;

     ID       DATA        RNUM
------- ---------- -----------
      0         45         148
      0         99         149
      0         41         150

SQL> select *
       from
       (select a.*, rownum rnum
          from
          (select id, data
             from t
            order by id, rowid) a
         where rownum <= 151
       )
      where rnum >= 148;

     ID       DATA        RNUM
------- ---------- -----------
      0         45         148
      0         99         149
      0         41         150
      0         45         151

Now the query is deterministic. ROWID is unique within a table, so if you use ORDER BY ID and then within ID you use ORDER BY ROWID, the rows will have a definite, deterministic order and the pagination query will deterministically return the rows as expected.
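
The effect of the tie-breaker can be reproduced with a small Python model. The assumptions are labeled in the code: rowid is a stand-in integer rather than a real Oracle ROWID, the data values are made up, and Python's stable sort (ties keep their arrival order) stands in for whatever order the database happens to produce among equal keys. The same rows are fed in two different arrival orders.

```python
from typing import Callable, List, Sequence, Tuple

Row = Tuple[int, int, int]  # (rowid, id, data); rowid is a stand-in integer


def fetch_rows(rows: Sequence[Row], min_row: int, max_row: int,
               key: Callable[[Row], object]) -> List[Row]:
    """Sort, then return rows min_row..max_row (1-based, inclusive)."""
    return sorted(rows, key=key)[min_row - 1:max_row]


def by_id(row: Row) -> int:
    """ORDER BY id: not unique, so ties are left to arrival order."""
    return row[1]


def by_id_rowid(row: Row) -> Tuple[int, int]:
    """ORDER BY id, rowid: unique, so the order is fully determined."""
    return (row[1], row[0])


# 10,000 hypothetical rows with only five distinct id values.
table = [(rowid, rowid % 5, (rowid * 37) % 100) for rowid in range(1, 10001)]
# The same rows, arriving in a different physical order.
other_arrival = list(reversed(table))

print(fetch_rows(table, 148, 150, by_id) ==
      fetch_rows(other_arrival, 148, 150, by_id))         # False
print(fetch_rows(table, 148, 150, by_id_rowid) ==
      fetch_rows(other_arrival, 148, 150, by_id_rowid))   # True
```

With the non-unique key, the page changes when the arrival order changes; with the unique key, it does not.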

ROWNUM Wrap-Up

I'll hazard a guess that you and many other readers now have a newfound respect for ROWNUM and understand these aspects:

  • How ROWNUM is assigned, so you can write bug-free queries that use it

  • How it affects the processing of your query, so you can use it to paginate a query on the Web

  • How it can reduce the work performed by your query, so that top-N queries that once consumed a lot of TEMP space now use none and return results much faster.


Referenced from: http://www.oracle.com/technetwork/issue-archive/2006/06-sep/o56asktom-086197.html
