Partitioning & Archiving tables in SQL Server (Part 2: Split, Merge and Switch partitions)
Reference: http://blogs.msdn.com/b/felixmar/archive/2011/08/29/partitioning-amp-archiving-tables-in-sql-server-part-2-split-merge-and-switch-partitions.aspx
In the first part of this post, I explained how to create a partitioned table using a partition function and a partition scheme. In this part I'll cover how to split or merge partitions by altering the partition function and the partition scheme, and how to move data between partitions with the ALTER TABLE command.
This is where the partition function definition becomes important: the way partitions split or merge depends on whether the function is declared RANGE LEFT or RANGE RIGHT. The easiest way to understand how they work is to look at the following examples. I will use a partition function declared with LEFT; watch the process and the result after splitting and merging the table:
I have already created a database named PartitionDBLeft; this is what the database looks like:
And these are the definitions of the partition scheme, the partition function and the Orders table:
In this case, the Orders table and its index are both stored on a partition scheme (psOrderDateRange). Also note that my primary key contains the column OrderID plus OrderDate, which is the partition key (the column by which the table is partitioned). When you create a primary key, SQL Server creates a unique index to enforce it, by default a clustered index, which sorts the table by the key columns. A unique index created on a partition scheme must include the partition key among its key columns, so if OrderDate were not part of the primary key, you could not build the primary key as a clustered index on the partition scheme; you would have to create it as a non-clustered index outside the scheme instead. Later in this post you'll see why this matters.
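As a rough sketch of such a table definition (not the original script): the column Amount and the constraint name PK_Orders are hypothetical, DATETIME is an assumed type for OrderDate, and the statement assumes the partition scheme psOrderDateRange from Part 1 already exists.

```sql
-- Sketch only: assumes the partition scheme psOrderDateRange already exists.
-- Amount and PK_Orders are hypothetical names used for illustration.
CREATE TABLE dbo.Orders
(
    OrderID   INT IDENTITY(1,1) NOT NULL,
    OrderDate DATETIME          NOT NULL,   -- partition key (type assumed)
    Amount    MONEY             NOT NULL,   -- hypothetical filler column
    -- The partition key is part of the primary key, so the clustered
    -- index that enforces it can live on the partition scheme.
    CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED (OrderID, OrderDate)
) ON psOrderDateRange (OrderDate);
```

Because the primary key is clustered, the index and the table are the same structure, so placing the table on the scheme also places the index there.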
The partition function is defined using LEFT. Now let's insert some orders for 2008:
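A minimal sketch of such an insert loop, assuming 300 rows per year (the count mentioned for the other years) and a table with a hypothetical Amount column. The specific date below is an assumption; the source only says that @OrderDate is changed for each year.

```sql
-- Sketch only: the date, the row values and the Amount column are assumptions.
DECLARE @OrderDate DATETIME = '20080615';  -- hypothetical date inside 2008
DECLARE @i INT = 1;

WHILE @i <= 300
BEGIN
    -- OrderID is an IDENTITY column, so it is not listed here.
    INSERT INTO dbo.Orders (OrderDate, Amount)
    VALUES (@OrderDate, 10.00);

    SET @i += 1;
END;
```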
Now let's query the partitions using the queries explained in Part 1. As you can see, the records are stored in partition 1 (FG1):
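The Part 1 queries are not reproduced here, but a query along the following lines, built on the standard catalog views, returns the partition number, filegroup, row count and boundary value for each partition. The column aliases are my own choice, and dbo.Orders is assumed to be the table name.

```sql
-- Sketch: one row per partition of dbo.Orders with its filegroup and boundary.
SELECT
    p.partition_number AS PartitionNumber,
    fg.name            AS FileGroupName,
    p.rows             AS [Rows],
    prv.value          AS BoundaryValue   -- upper limit for RANGE LEFT; NULL for the last partition
FROM sys.partitions AS p
JOIN sys.indexes AS i
    ON i.object_id = p.object_id
   AND i.index_id  = p.index_id
JOIN sys.partition_schemes AS ps
    ON ps.data_space_id = i.data_space_id
JOIN sys.destination_data_spaces AS dds
    ON dds.partition_scheme_id = ps.data_space_id
   AND dds.destination_id      = p.partition_number
JOIN sys.filegroups AS fg
    ON fg.data_space_id = dds.data_space_id
LEFT JOIN sys.partition_range_values AS prv
    ON prv.function_id = ps.function_id
   AND prv.boundary_id = p.partition_number
WHERE p.object_id = OBJECT_ID(N'dbo.Orders')
  AND i.index_id IN (0, 1)                -- heap or clustered index only
ORDER BY p.partition_number;
```

Matching boundary_id to partition_number gives the upper limit of each partition for a RANGE LEFT function; for RANGE RIGHT the same value would be the lower limit of the next partition.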
Repeat the procedure, inserting 300 records for 2009 and 300 records for 2010 (just change the value of @OrderDate). This is how it looks:
You might expect to see 300 records for each year (2008, 2009 and 2010), but the results are different because of the function definition. When using LEFT, each value defined in the function is the UPPER limit of its partition, which you can see in the Max Value column of the last query. If you had used RIGHT instead of LEFT, the result would have been the expected one. For now, I will drop the table, the scheme and the function, and define the function again, this time using a different value for each range:
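A sketch of the drop and re-create step, using the year-end boundaries and the filegroups (FG1, FG2, FG3, PRIMARY) that the rest of this post refers to. The function name pfOrderDateRange and the DATETIME parameter type are assumptions, since the original definitions are not shown.

```sql
-- Sketch only: pfOrderDateRange and DATETIME are assumed names/types.
-- Objects must be dropped in dependency order: table, scheme, function.
DROP TABLE dbo.Orders;
DROP PARTITION SCHEME psOrderDateRange;
DROP PARTITION FUNCTION pfOrderDateRange;

-- RANGE LEFT: each value is the upper limit of its partition.
-- With DATETIME, '20081231' means midnight, so a row stamped later
-- on 2008-12-31 would fall into the next partition.
CREATE PARTITION FUNCTION pfOrderDateRange (DATETIME)
AS RANGE LEFT FOR VALUES ('20081231', '20091231', '20101231');

-- Three boundaries produce four partitions, so four filegroups are listed.
CREATE PARTITION SCHEME psOrderDateRange
AS PARTITION pfOrderDateRange
TO (FG1, FG2, FG3, [PRIMARY]);
```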
The scheme and the table stay the same, except that they now use the new partition function. After inserting the same records, this is how it looks:
In fact, you could insert rows dated 2008/12/31, 2009/12/31 and 2010/12/31, and those dates would be the upper limit of each partition:
Split a partition
According to the partition scheme definition, if you insert a record for 2011, it goes into partition 4, which is mapped to the PRIMARY filegroup. Suppose you need to store the 2011 orders in a new partition on a new filegroup, let's say FG4. First, you have to add the new filegroup (FG4) and a new data file (an .NDF file) inside FG4:
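A minimal sketch of that step. The logical file name, the physical path and the size settings are hypothetical and should be adapted to your server.

```sql
-- Sketch only: file name, path and sizes are hypothetical.
ALTER DATABASE PartitionDBLeft
ADD FILEGROUP FG4;

ALTER DATABASE PartitionDBLeft
ADD FILE
(
    NAME       = N'PartitionDBLeft_FG4',                 -- hypothetical logical name
    FILENAME   = N'C:\SQLData\PartitionDBLeft_FG4.ndf',  -- hypothetical path
    SIZE       = 10MB,
    FILEGROWTH = 10MB
)
TO FILEGROUP FG4;
```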
Now you can modify your partition function to SPLIT the last range (which currently holds all records later than 2010/12/31) and create a new range for orders from 2011/01/01 to 2011/12/31. Before doing so, you need to alter the partition scheme to include the new filegroup (FG4), so that the new range can be mapped to it; otherwise the split fails with an error reporting that the partition scheme does not have a next used filegroup.
The following script alters the partition scheme, making FG4 the next used filegroup:
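In T-SQL this is a single statement. The sketch below uses the scheme and filegroup names from this post.

```sql
-- Mark FG4 as the filegroup that will receive the next new partition.
ALTER PARTITION SCHEME psOrderDateRange
NEXT USED FG4;
```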
Now let's alter the partition function to include the new range:
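A sketch of the split, assuming the partition function is named pfOrderDateRange (the name is not given in the text) and that 2011/12/31 is the new boundary:

```sql
-- Sketch only: pfOrderDateRange is an assumed function name.
-- Adds the boundary 2011-12-31. With RANGE LEFT, the new partition
-- (2010-12-31, 2011-12-31] is created on the NEXT USED filegroup (FG4),
-- and everything after 2011-12-31 stays in the last partition.
ALTER PARTITION FUNCTION pfOrderDateRange()
SPLIT RANGE ('20111231');
```

Splitting a range that contains no rows is a metadata change only, which is one reason to add the new boundary before any data for that year arrives.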
Now check how the partition scheme and partition function look after the change. You can use Management Studio to generate the scripts:
This is the result:
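Based on the boundary values and filegroups mentioned so far, the scripted definitions should be equivalent to the following. This is a reconstruction under those assumptions, not the actual Management Studio output; pfOrderDateRange and DATETIME are assumed.

```sql
-- Reconstruction only: equivalent definitions after the split.
CREATE PARTITION FUNCTION pfOrderDateRange (DATETIME)
AS RANGE LEFT FOR VALUES ('20081231', '20091231', '20101231', '20111231');

-- Four boundaries, five partitions: FG4 now sits before PRIMARY.
CREATE PARTITION SCHEME psOrderDateRange
AS PARTITION pfOrderDateRange
TO (FG1, FG2, FG3, FG4, [PRIMARY]);
```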
This is the expected behavior in most cases where you add a new partition on a rolling time basis to hold new data. In this scenario I would do it in December to prepare my Orders table for next year's data. I can always use PRIMARY as the last partition for future data, since I'm sure there will be no data there; the partition function always has one more range than boundary values, to catch all data beyond the last boundary.
If I insert data dated between 2011/01/01 and 2011/12/31, it is stored in FG4:
For the next year, all you have to do is repeat the process: add FG5, alter the partition scheme to mark FG5 as the next used filegroup, and alter the partition function to split the last range at 2012/12/31. Voilà, you have a new partition ready to hold the 2012 records!
Merge Partitions
Now let's talk about merging partitions. After some years, you decide that all those partitions are difficult to maintain and that it would be better to keep all data for the first n years in a single partition. Let's suppose you want to combine all orders from 2008/01/01 to 2010/12/31 in a single partition. You can merge those 3 partitions, but only 2 at a time.
Merging is as simple as altering the partition function and specifying the boundary value that separates the two ranges you want to join. That boundary is removed, and the joined range keeps the upper limit of the second range:
Let's first join FG1 and FG2 and see the result:
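A sketch of the first merge, again assuming the function is named pfOrderDateRange. The boundary between the FG1 and FG2 ranges is 2008/12/31, so that is the value to remove:

```sql
-- Sketch only: pfOrderDateRange is an assumed function name.
-- Removes the boundary 2008-12-31. The partition that held this boundary
-- (FG1, for RANGE LEFT) is dropped and its rows move into the next
-- partition (FG2), which now covers everything up to 2009-12-31.
ALTER PARTITION FUNCTION pfOrderDateRange()
MERGE RANGE ('20081231');
```

Unlike a split of an empty range, merging two populated partitions physically moves rows, so on large tables it is worth scheduling it for a quiet period.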
If you query the partitions, this is what you will see:
Note that all records from FG1 moved to FG2. Now you have only 5 partitions instead of the original 6, and FG1 does not contain any data.
Now let's join FG2 and FG3:
Let's look at what's inside each filegroup:
Again, data from FG2 moved to FG3.
You can also check the reports included in Management Studio:
What about FG1 and FG2? These filegroups are now empty; you could use them for future data or for other kinds of data, or you could remove them.
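If you decide to remove an empty filegroup, a sketch like the following should work once no partition scheme references it any more. The logical file name PartitionDBLeft_FG1 is hypothetical; the first query shows how to look up the real one.

```sql
-- Find the logical names of the files that belong to FG1.
SELECT name, physical_name
FROM sys.database_files
WHERE data_space_id = FILEGROUP_ID(N'FG1');

-- Sketch only: PartitionDBLeft_FG1 is a hypothetical logical file name.
-- The file must be empty before it can be removed.
ALTER DATABASE PartitionDBLeft
REMOVE FILE PartitionDBLeft_FG1;

-- The filegroup can be removed once it contains no files.
ALTER DATABASE PartitionDBLeft
REMOVE FILEGROUP FG1;
```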
Switch Partitions
Now suppose you need to archive all data from 2008/01/01 to 2010/12/31, since this is historical data. At this point, all of this data is stored in FG3, which is now partition number 1. By the way, if this data will never change again, you could mark this filegroup as READONLY:
This is not required in order to archive data, but it is recommended to protect historical information against writes and to simplify the backup strategy, which I'll discuss in another post.
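A sketch of how the filegroup could be marked read-only. Current versions of SQL Server prefer the READ_ONLY keyword over the older READONLY spelling, and the change may need exclusive access to the database.

```sql
-- Mark FG3 as read-only (may require exclusive access to the database).
ALTER DATABASE PartitionDBLeft
MODIFY FILEGROUP FG3 READ_ONLY;

-- Data in a read-only filegroup cannot be modified, so switch it back
-- before changing anything stored there:
-- ALTER DATABASE PartitionDBLeft MODIFY FILEGROUP FG3 READ_WRITE;
```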
The first thing you must be aware of is index alignment. Remember: if the partition key (the OrderDate column) is part of the primary key (OrderID + OrderDate), the primary key index can be aligned with the data. This means that each portion of the index is stored in the same partition as the rows it covers, because the index uses the same partition scheme as the table, and the primary key can be the clustered index. If this is not the case (suppose your primary key is just OrderID), the primary key cannot be the clustered index of the partitioned table, because a unique index on the partition scheme must contain the partition key (OrderDate). You would have to create the primary key as a non-clustered index, and the whole index would be stored outside the partition scheme (usually in the PRIMARY filegroup).
The main problem with that design is that you would need to drop the index before switching partitions and recreate it after the switch, which makes the whole process of moving data from one partition to another much slower. Fortunately, our Orders table has an aligned index, which you can see in the index properties window:
You can see that the index is stored on the partition scheme (psOrderDateRange), so it is aligned with the data.
In this scenario, I will move the data in FG3 to a different table (Orders_History), which can be in the same or in a different database. I just want to archive those records and free space in my current database. Instead of writing a DELETE query, which can be slow and use a lot of transaction log space, I will switch the partition stored in FG3 to a temporary table in the same filegroup. This is very fast, since only metadata changes. Then I will copy those records to the historical table using an INSERT ... SELECT statement. Of course the copy will take some time, but it runs against the temporary table, so it does not hold locks on my production table or affect my users.
The process is:
- Create a temporary table with the same structure as the original table (Orders) in the SAME FILEGROUP as the partition. This is necessary because you can only switch a partition to a table located in the same filegroup.
- Create the destination table (Orders_History) with the same structure as the original and the temporary tables.
- Switch the desired partition using the ALTER TABLE ... SWITCH statement.
- Insert the data from the temporary table into the destination table using an INSERT ... SELECT statement.
- Drop the temporary table.
Let's see:
This is my temporary table created on FG3:
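A sketch of what such a staging table could look like. It mirrors an assumed Orders definition (the Amount column and the constraint name are hypothetical) and is created on FG3, the filegroup that holds the partition to be switched out.

```sql
-- Sketch only: columns must match dbo.Orders exactly; Amount is hypothetical.
CREATE TABLE dbo.Orders_Temp
(
    OrderID   INT IDENTITY(1,1) NOT NULL,
    OrderDate DATETIME          NOT NULL,
    Amount    MONEY             NOT NULL,
    -- Same clustered primary key as the source table, as the switch requires.
    CONSTRAINT PK_Orders_Temp PRIMARY KEY CLUSTERED (OrderID, OrderDate)
) ON FG3;   -- same filegroup as the partition being switched
```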
This is my historical table, created in a different database. Note that I excluded IDENTITY, since the OrderID values already exist in the source table.
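A sketch of the history table, assuming a separate database called ArchiveDB (a hypothetical name) and the same assumed columns as above:

```sql
-- Sketch only: ArchiveDB and the Amount column are hypothetical.
USE ArchiveDB;

CREATE TABLE dbo.Orders_History
(
    OrderID   INT      NOT NULL,   -- no IDENTITY: values are copied from the source
    OrderDate DATETIME NOT NULL,
    Amount    MONEY    NOT NULL,
    CONSTRAINT PK_Orders_History PRIMARY KEY CLUSTERED (OrderID, OrderDate)
);
```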
The data I want to move lives in FG3, which is now partition 1. I simply alter the Orders table this way:
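A sketch of the switch statement, using the table names from this post:

```sql
-- Move partition 1 of dbo.Orders into the empty staging table.
-- Only metadata changes, so the operation is near-instant.
-- dbo.Orders_Temp must be empty and structurally identical to dbo.Orders.
ALTER TABLE dbo.Orders
SWITCH PARTITION 1 TO dbo.Orders_Temp;
```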
What is surprising is that it doesn't matter how big partition 1 is: the data is switched very fast!
Before moving the data to the history table, let's query the partitions again:
FG3 no longer holds any records.
All the data is now in the Orders_Temp table:
Now let's move the data to Orders_History:
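A sketch of the copy, assuming the history table lives in a hypothetical ArchiveDB database and has the same assumed columns as the staging table:

```sql
-- Sketch only: ArchiveDB and the Amount column are hypothetical.
-- The copy reads from the staging table, not from dbo.Orders.
INSERT INTO ArchiveDB.dbo.Orders_History (OrderID, OrderDate, Amount)
SELECT OrderID, OrderDate, Amount
FROM PartitionDBLeft.dbo.Orders_Temp;
```

For very large partitions, copying in batches would keep each transaction small; that refinement is left out of this sketch.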
Finally, drop the Orders_Temp table:
However, you cannot drop FG3 yet, because one range of the Orders table is still mapped to this filegroup, even though you know it holds no data. You can merge FG3 and FG4, which releases FG3 so that you can drop it. In the last query you can see that the FG3 boundary is 2010-12-31, so you can merge using this value:
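A sketch of that final merge, with a quick check that the partition really is empty first. The function name pfOrderDateRange is an assumption.

```sql
-- Sketch only: pfOrderDateRange is an assumed function name.
-- Confirm that partition 1 (stored in FG3) is empty after the switch.
SELECT COUNT(*) AS RowsInPartition1
FROM dbo.Orders
WHERE $PARTITION.pfOrderDateRange(OrderDate) = 1;

-- Remove the 2010-12-31 boundary; FG3 is released from the scheme
-- and its (empty) range is absorbed by the partition stored in FG4.
ALTER PARTITION FUNCTION pfOrderDateRange()
MERGE RANGE ('20101231');
```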
If you query the partitions, FG3 is not used anymore:
Now you can remove FG3 if you want.
Conclusion
You can apply a partitioning strategy to very large tables in order to simplify administration, improve performance, archive and purge old data from the database, and provision space for future data by merging or splitting partitions. You must take care with the table design and choose the correct partition key and indexes. If you make the wrong decision, managing future data can become very difficult. Try to reproduce the examples in this post using RIGHT instead of LEFT and you'll see what I mean. Also, try creating a table without the OrderDate column in the primary key to see how the index is built.
I hope you find this information helpful. If so, leave me a message on the blog.
Thanks.