A clustered index determines the order in which the rows of a table are stored on disk. If a table has a clustered index, then the rows of that table will be stored on disk in the same exact order as the clustered index. An example will help clarify what we mean by that.

An example of a clustered index

Suppose we have a table named Employee which has a column named EmployeeID. Let’s say we create a clustered index on the EmployeeID column. What happens when we create this clustered index? Well, all of the rows inside the Employee table will bephysically – sorted (on the actual disk) – by the values inside the EmployeeID column. What does this accomplish? Well, it means that whenever a lookup/search for a sequence of EmployeeID’s is done using that clustered index, then the lookup will be much faster because of the fact that the sequence of employee ID’s are physically stored right next to each other on disk – that is the advantage with the clustered index. This is because the rows in the table are sorted in the exact same order as the clustered index, and the actual table data is stored in the leaf nodes of the clustered index.

Remember that an index is usually a tree data structure – and leaf nodes are the nodes that are at the very bottom of that tree. In other words, a clustered index basically contains the actual table level data in the index itself. This is very different from most other types of indexes as you can read about below.

When would using a clustered index make sense?

Let’s go through an example of when and why using a clustered index would actually make sense. Suppose we have a table named Owners and a table named Cars. This is what the simple schema would look like – with the column names in each table:

Owners
Owner_Name
Owner_Age Cars
Car_Type
Owner_Name

Let’s assume that a given owner can have multiple cars – so a single Owner_Name can appear multiple times in the Cars table.
Now, let’s say that we create a clustered index on the Owner_Name column in the Cars table. What does this accomplish for us? Well, because a clustered index is stored physically on the disk in the same order as the index, it would mean that a given Owner_Name would have all his/her car entries stored right next to each other on disk. In other words, if there is an owner named “Joe Smith” or “Raj Gupta”, then each owner would have all of his/her entries in the Cars table stored right next to each other on the disk.

When is using a clustered index an advantage?

What is the advantage of this? Well, suppose that there is a frequently run query which tries to find all of the cars belonging to a specific owner. With the clustered index, since all of the car entries belonging to a single owner would be right next to each other on disk,the query will run much faster than if the rows were being stored in some random order on the disk. And that is the key point to remember!

Why is it called a clustered index?

In our example, all of the car entries belonging to a single owner would be right next to each other on disk. This is the “clustering”, or grouping of similar values, which is referred to in the term “clustered” index.

Note that having an index on the Owner_Name would not necessarily be unique, because there are many people who share the same name. So, you might have to add another column to the clustered index to make sure that it’s unique.

What is a disadvantage to using a clustered index?

A disadvantage to using a clustered index is the fact that if a given row has a value updated in one of it’s (clustered) indexed columns what typically happens is that the database will have to move the entire row so that the table will continue to be sorted in the same order as the clustered index column. Consider our example above to clarify this. Suppose that someone named “Rafael Nadal” buys a car – let’s say it’s a Porsche – from “Roger Federer”. Remember that our clustered index is created on the Owner_Name column. This means that when we do a update to change the name on that row in the Cars table, the Owner_Name will be changed from “Roger Federer” to “Rafael Nadal”.

But, since a clustered index also tells the database in which order to physically store the rows on disk, when the Owner_Name is changed it will have to move an updated row so that it is still in the correct sorted order. So, now the row that used to belong to “Roger Federer” will have to be moved on disk so that it’s grouped (or clustered) with all the car entries that belong to “Rafael Nadal”. Clearly, this is a performance hit. This means that a simple UPDATE has turned into a DELETE and then an INSERT – just to maintain the order of the clustered index. For this exact reason, clustered indexes are usually created on primary keys or foreign keys, because of the fact that those values are less likely to change once they are already a part of a table.

A comparison of a non-clustered index with a clustered index with an example

As an example of a non-clustered index, let’s say that we have a non-clustered index on the EmployeeID column. A non-clustered index will store both the value of the EmployeeID AND a pointer to the row in the Employee table where that value is actually stored. But a clustered index, on the other hand, will actually store the row data for a particular EmployeeID – so if you are running a query that looks for an EmployeeID of 15, the data from other columns in the table like EmployeeName, EmployeeAddress, etc. will all actually be stored in the leaf node of the clustered index itself.

This means that with a non-clustered index extra work is required to follow that pointer to the row in the table to retrieve any other desired values, as opposed to a clustered index which can just access the row directly since it is being stored in the same order as the clustered index itself. So, reading from a clustered index is generally faster than reading from a non-clustered index.

A table can have multiple non-clustered indexes

A table can have multiple non-clustered indexes because they don’t affect the order in which the rows are stored on disk like clustered indexes.

Why can a table have only one clustered index?

Because a clustered index determines the order in which the rows will be stored on disk, having more than one clustered index on one table is impossible. Imagine if we have two clustered indexes on a single table – which index would determine the order in which the rows will be stored? Since the rows of a table can only be sorted to follow just one index, having more than one clustered index is not allowed.

Summary of the differences between clustered and non-clustered indexes

Here’s a summary of the differences:

  • A clustered index determines the order in which the rows of the table will be stored on disk – and it actually stores row level data in the leaf nodes of the index itself. A non-clustered index has no effect on which the order of the rows will be stored.
  • Using a clustered index is an advantage when groups of data that can be clustered are frequently accessed by some queries. This speeds up retrieval because the data lives close to each other on disk. Also, if data is accessed in the same order as the clustered index, the retrieval will be much faster because the physical data stored on disk is sorted in the same order as the index.
  • A clustered index can be a disadvantage because any time a change is made to a value of an indexed column, the subsequent possibility of re-sorting rows to maintain order is a definite performance hit.
  • A table can have multiple non-clustered indexes. But, a table can have only one clustered index.
  • Non clustered indexes store both a value and a pointer to the actual row that holds that value. Clustered indexes don’t need to store a pointer to the actual row because of the fact that the rows in the table are stored on disk in the same exact order as the clustered index – and the clustered index actually stores the row-level data in it’s leaf nodes.

What is the difference between a Clustered and Non Clustered Index?的更多相关文章

  1. 触发器TRIGGER 自增IDENTITY 聚集索引CLUSTERED

    在触发器的“触发”过程中,有两个临时表inserted和deleted发生了作用.这两个特殊的临时表inserted和deleted,仅仅在触发器运行时存在,它们在某一特定时间和某一特定表相关. CR ...

  2. Partitioning & Archiving tables in SQL Server (Part 1: The basics)

    Reference: http://blogs.msdn.com/b/felixmar/archive/2011/02/14/partitioning-amp-archiving-tables-in- ...

  3. Oracle 11g Articles

    发现一个比较有意思的网站,http://www.oracle-base.com/articles/11g/articles-11g.php Oracle 11g Articles Oracle Dat ...

  4. Constraint2:constraint

    一,Constraint 是表定义的一部分,用于实现数据完整性. Data Integrity 由三种类型的constraint实现: Entity Integrity:数据是唯一的.约束: prim ...

  5. SQLSERVER系统视图,系统表,sys.sql_modules视图

    SQLServer中提供了相当丰富的系统视图,能够从宏观到微观,从静态到动态反应数据库对象的存储结果.系统性能.系统等待事件等等.同时 也保留了与早期版本兼容性的视图,主要差别在于SQLServer2 ...

  6. 6SQL SERVER视图/索引

    一.视图 1.视图概念 ①视图是包含由一张或多张表的列组成的数据集.该表中的记录是由一条查询语句执行后所得到的查询结果所构成的. ②视图是一张虚拟表,它表示一张表的部分数据或多张表的综合数 据,其结构 ...

  7. 人人都是 DBA(VIII)SQL Server 页存储结构

    当在 SQL Server 数据库中创建一张表时,会在多张系统基础表中插入所创建表的信息,用于管理该表.通过目录视图 sys.tables, sys.columns, sys.indexes 可以查看 ...

  8. ejb3: message drive bean(MDB)示例

    上一篇已经知道了JMS的基本操作,今天来看一下ejb3中的一种重要bean:Message Drive Bean(mdb) 如果要不断监听一个队列中的消息,通常我们需要写一个监听程序,这需要一定的开发 ...

  9. sql是如何执行一个查询的!

    引用自:http://rusanu.com/2013/08/01/understanding-how-sql-server-executes-a-query/ Understanding how SQ ...

随机推荐

  1. C#遍历enum类型

    对于enum类型: 使用foreach遍历enum类型的元素并填充combox foreach ( HatchStyle hs1 in Enum.GetValues(typeof(HatchStyle ...

  2. Hangover[POJ1003]

    Hangover Time Limit: 1000MS   Memory Limit: 10000K Total Submissions: 121079   Accepted: 59223 Descr ...

  3. BZOJ3692 : 愚蠢的算法

    两个函数相同等价于不存在长度为$3$的下降子序列. 先考虑随意填的部分,设$f[i][j]$表示考虑了$[i,n]$,下降子序列第$2$项的最小值的是这里面第$j$个的方案数,转移则考虑往序列里插数字 ...

  4. 科技部专家王涌天:移动AR头显将“让人类重新站起来”

    目前,业内普遍认为VR和AR技术将是继移动手机之后的下一代计算平台,将给社会的方方面面带来全新的改变.近日,北京理工大学信息与电子学部主任.科技部863信息技术领域专家组成员王涌天教授对头戴式增强现实 ...

  5. change,propertychange,input事件小议

    github上关于mootools一个issue的讨论很有意思,所以就想测试记录下.感兴趣的可以点击原页面看看. 这个问题来自IE(LTE8)中对checkbox和radio change事件的实现问 ...

  6. Android MultiDex

    出现的原因: 当Android系统安装一个应用的时候,有一步是对Dex进行优化,这个过程有一个专门的工具来处理,叫DexOpt.DexOpt的执行过程是在第一次加载Dex文件的时候执行的.这个过程会生 ...

  7. 【BZOJ】3052: [wc2013]糖果公园

    http://www.lydsy.com/JudgeOnline/problem.php?id=3052 题意:n个带颜色的点(m种),q次询问,每次询问x到y的路径上sum{w[次数]*v[颜色]} ...

  8. JavaScript进阶(二)

    什么是事件 JavaScript 创建动态页面.事件是可以被 JavaScript 侦测到的行为. 网页中的每个元素都可以产生某些可以触发 JavaScript 函数或程序的事件. 比如说,当用户单击 ...

  9. 【JAVA】LOG4J使用心得

    一.LOG4J基础: 1.日志定义        简单的Log4j使用只需要导入下面的包就可以了 // import log4j packages import org.apache.log4j.Lo ...

  10. 集成IOS 环信SDK

    集成IOS SDK 在您阅读此文档时,我们假定您已经具备了基础的 iOS 应用开发经验,并能够理解相关基础概念. 下载SDK 通过Cocoapods下载地址 不包含实时语音版本SDK(EaseMobC ...