Balancing CPU and I/O throughput is essential to achieve good overall performance and to maximize hardware utilization.

SQL Server includes two asynchronous I/O mechanisms - sequential read ahead and random prefetching - that are designed to address this challenge.

To understand why asynchronous I/O is so important, consider the CPU to I/O performance gap. The memory subsystem on a modern CPU can deliver data sequentially at roughly 5 Gbytes per second per socket (or, on non-NUMA machines, for all sockets sharing the same bus) and, depending on how you measure it, can fetch random memory locations at roughly 10 to 50 million accesses per second. By comparison, a high-end 15K SAS hard drive can read only 125 Mbytes per second sequentially and can perform only 200 random I/Os per second (IOPS). Solid State Disks (SSDs) can reduce the gap between sequential and random I/O performance by eliminating the moving parts from the equation, but a performance gap remains. In an effort to close this performance gap, it is not uncommon for servers to have a ratio of 10 or more drives for every CPU. (It is also important to consider and balance the entire I/O subsystem, including the number and type of disk controllers and not just the drives themselves, but that is not the focus of this post.)
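To make the size of the gap concrete, the figures above can be turned into simple ratios. The sketch below is only arithmetic on the numbers quoted in this post; it assumes decimal units (1 Gbyte = 10^9 bytes), and the constant and function names are illustrative.

```python
# Figures quoted in the text above (orders of magnitude, not measurements).
# Assumption: decimal units, i.e. 1 Gbyte = 10**9 bytes, 1 Mbyte = 10**6 bytes.
MEMORY_SEQ_BYTES_PER_SEC = 5 * 10**9       # ~5 Gbytes/s per socket
MEMORY_RANDOM_PER_SEC_LOW = 10 * 10**6     # 10 million random accesses/s
MEMORY_RANDOM_PER_SEC_HIGH = 50 * 10**6    # 50 million random accesses/s
DISK_SEQ_BYTES_PER_SEC = 125 * 10**6       # ~125 Mbytes/s for a 15K SAS drive
DISK_RANDOM_IOPS = 200                     # ~200 random I/Os per second


def gap(memory_rate, disk_rate):
    """Return how many times faster memory is than a single drive."""
    return memory_rate / disk_rate


sequential_gap = gap(MEMORY_SEQ_BYTES_PER_SEC, DISK_SEQ_BYTES_PER_SEC)
random_gap_low = gap(MEMORY_RANDOM_PER_SEC_LOW, DISK_RANDOM_IOPS)
random_gap_high = gap(MEMORY_RANDOM_PER_SEC_HIGH, DISK_RANDOM_IOPS)

print(f"sequential: memory is {sequential_gap:.0f}x one drive")
print(f"random: memory is {random_gap_low:,.0f}x to {random_gap_high:,.0f}x one drive")
```

With these inputs the sequential gap is 40x and the random gap is 50,000x to 250,000x, which is why even 10 drives per CPU narrows the gap without closing it.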

Unfortunately, a single CPU issuing only synchronous I/Os can keep only one spindle active at a time. For a single CPU to exploit the available bandwidth and IOPS of multiple spindles effectively, the server must issue multiple I/Os asynchronously. Thus, SQL Server includes the aforementioned read ahead and prefetching mechanisms. In this post, I'll take a look at sequential read ahead.
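A very rough model shows why the number of outstanding I/Os matters. The sketch below assumes an idealized case in which every outstanding I/O lands on a different drive and each drive delivers the 200 random IOPS quoted earlier; `aggregate_iops` is a hypothetical helper, not anything SQL Server exposes.

```python
IOPS_PER_SPINDLE = 200  # random IOPS of one 15K drive, from the text above


def aggregate_iops(outstanding_ios, spindles, iops_per_spindle=IOPS_PER_SPINDLE):
    """Idealized random IOPS when each outstanding I/O targets a distinct drive."""
    # A drive is busy only if there is an outstanding request for it.
    busy_spindles = min(outstanding_ios, spindles)
    return busy_spindles * iops_per_spindle


# Synchronous I/O: one request in flight, so one busy spindle out of ten.
synchronous = aggregate_iops(outstanding_ios=1, spindles=10)
# Asynchronous I/O: ten requests in flight, so all ten spindles can work.
asynchronous = aggregate_iops(outstanding_ios=10, spindles=10)

print(synchronous, asynchronous)
```

In this model, issuing more I/Os than there are spindles gains nothing further; real systems queue requests per drive, so the model is only a lower bound on the benefit of asynchrony.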

When SQL Server performs a sequential scan of a large table, the storage engine initiates the read ahead mechanism to ensure that pages are in memory and ready to scan before they are needed by the query processor. The read ahead mechanism tries to stay 500 pages ahead of the scan. SQL Server tries to combine up to 64 contiguous pages (512 Kbytes) into a single scatter (asynchronous) I/O. So, in the best case, it can read ahead 500 pages in just 8 I/Os. However, if the pages in the table are not contiguous (e.g., due to fragmentation), SQL Server cannot combine the I/Os and must issue one I/O per page (8 Kbytes).
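The effect of contiguity on the I/O count can be sketched as follows. This is an illustration of the coalescing rule described above, not SQL Server's actual implementation; the function name and the page numbers are made up for the example.

```python
MAX_PAGES_PER_IO = 64    # up to 64 contiguous 8-Kbyte pages (512 Kbytes) per I/O
READ_AHEAD_WINDOW = 500  # pages the read ahead tries to stay ahead of the scan


def coalesce(page_ids, max_pages=MAX_PAGES_PER_IO):
    """Group page numbers (in read order) into (first_page, page_count) I/Os.

    A page joins the current I/O only if it immediately follows the previous
    page and the I/O has not yet reached max_pages.
    """
    ios = []
    for page in page_ids:
        if ios:
            first, count = ios[-1]
            if page == first + count and count < max_pages:
                ios[-1] = (first, count + 1)
                continue
        ios.append((page, 1))
    return ios


# Best case: 500 physically contiguous pages (hypothetical page numbers).
contiguous = list(range(1000, 1000 + READ_AHEAD_WINDOW))
# Worst case: 500 pages with a gap after every page, so nothing can be combined.
fragmented = list(range(1000, 1000 + 2 * READ_AHEAD_WINDOW, 2))

print(len(coalesce(contiguous)))  # 8 (seven I/Os of 64 pages plus one of 52)
print(len(coalesce(fragmented)))  # 500 (one I/O per page)
```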

We can see the read ahead mechanism in action by checking the output of SET STATISTICS IO ON. For example, I ran the following query on a 1GB scale factor TPC-H database. The LINEITEM table has roughly 6 million rows.

SET STATISTICS IO ON

SELECT COUNT(*) FROM LINEITEM

Table 'LINEITEM'. Scan count 3, logical reads 22328, physical reads 3, read-ahead reads 20331, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Repeating the query a second time shows that the table is now cached in the buffer pool:

SELECT COUNT(*) FROM LINEITEM

Table 'LINEITEM'. Scan count 3, logical reads 22328, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
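The counters in these messages are easier to compare once they are parsed. The helper below is a small sketch that assumes the message format shown above (a table name in quotes followed by comma-separated "name value" pairs); `parse_statistics_io` is a hypothetical name.

```python
PAGE_BYTES = 8 * 1024  # SQL Server pages are 8 Kbytes


def parse_statistics_io(line):
    """Split one STATISTICS IO message into (table_name, {counter: value})."""
    table_part, counter_part = line.split(". ", 1)
    table = table_part.split("'")[1]
    counters = {}
    for item in counter_part.rstrip(". ").split(", "):
        name, value = item.rsplit(" ", 1)
        counters[name] = int(value)
    return table, counters


cold_run = ("Table 'LINEITEM'. Scan count 3, logical reads 22328, "
            "physical reads 3, read-ahead reads 20331, lob logical reads 0, "
            "lob physical reads 0, lob read-ahead reads 0.")

table, counters = parse_statistics_io(cold_run)

# Pages that actually came from disk during the first (cold cache) run.
pages_from_disk = counters["physical reads"] + counters["read-ahead reads"]
read_ahead_share = counters["read-ahead reads"] / pages_from_disk
mbytes_from_disk = pages_from_disk * PAGE_BYTES / 2**20

print(table, pages_from_disk)
print(f"{read_ahead_share:.2%} of disk reads were read ahead")
print(f"{mbytes_from_disk:.1f} Mbytes read from disk")
```

For the first run above, all but 3 of the 20334 pages read from disk were brought in by read ahead rather than by the scan waiting on a synchronous read.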

For sequential I/O performance, it is important to distinguish between allocation ordered and index ordered scans.

An allocation ordered scan tries to read pages in the order in which they are physically stored on disk, while an index ordered scan reads pages according to the order in which the data on those index pages is sorted. (Note that in many cases there are multiple levels of indirection, such as RAID devices or SANs, between the logical volumes that SQL Server sees and the physical disks. Thus, even an allocation ordered scan may not in fact be truly optimally ordered.)

Although SQL Server tries to sort and read pages in allocation order even for an index ordered scan, an allocation ordered scan is generally going to be faster since pages are read in the order in which they are written on disk, with the minimal number of seeks. Heaps have no inherent order and, thus, are always scanned in allocation order. Indexes are scanned in allocation order only if the isolation level is read uncommitted (or the NOLOCK hint is used) and only if the query processor does not request an ordered scan. Defragmenting indexes can help to ensure that index ordered scans perform on par with allocation ordered scans.
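The difference between the two scan orders can be illustrated by counting seeks over a toy page layout. This is a deliberately simplified sketch: it treats every non-adjacent read as a seek and ignores the sorting that read ahead performs for index ordered scans. The page numbers and the `count_seeks` helper are hypothetical.

```python
def count_seeks(physical_pages):
    """Count reads that do not immediately follow the previously read page."""
    return sum(1 for prev, cur in zip(physical_pages, physical_pages[1:])
               if cur != prev + 1)


# Physical page numbers of an index's leaf pages, listed in index key order.
# In this fragmented layout, logically adjacent pages are far apart on disk.
fragmented_index = [10, 14, 11, 15, 12, 16, 13, 17]

# An index ordered scan follows key order; an allocation ordered scan
# follows physical order.
index_ordered_seeks = count_seeks(fragmented_index)
allocation_ordered_seeks = count_seeks(sorted(fragmented_index))

# After defragmenting, key order and physical order coincide.
defragmented_index = sorted(fragmented_index)
defragmented_seeks = count_seeks(defragmented_index)

print(index_ordered_seeks, allocation_ordered_seeks, defragmented_seeks)  # 7 0 0
```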

Correction:

First - "The memory subsystem on a modern CPU can deliver data sequentially at roughly 5 Gbytes per second per core"

There are NUMA and non-NUMA hardware systems. Non-NUMA systems share the FSB for exclusive access to RAM, so it becomes 5 Gbytes for ALL CPUs on the board. For NUMA systems, all cores on the same NUMA node share that 5 Gbytes. Thus it's not per core.

Second is "Solid State Disks (SSDS) can reduce the gap between sequential and random I/O performance by eliminating the moving parts from the equation, but a performance gap remains." Do you mean performance = throughput here?

Thank you, Serge

The non-leaf nodes of the B-tree (specifically those nodes one level above the leaf nodes) have pointers to the leaf pages and can be used to prefetch multiple pages at a time. Of course, this optimization does not work for heaps, which do not have non-leaf nodes.
