Introduction To Database Refactoring

原文链接:by Scott W. Ambler;http://www.tdan.com/view-articles/5010/

Published: July 1, 2006
Published in TDAN.com July 2006

Material for this article was modified from Refactoring Databases: Evolutionary Database Design by Scott Ambler and Pramod Sadalage (Addison Wesley 2006).www.ambysoft.com/books/refactoringDatabases.html

Abstract:

A database refactoring is a small change to a database schema which improves its design without changing, at a practical level, the semantics of the database. In other words, it is a simple database transformation which neither adds nor breaks anything. The process of database refactoring defines how to safely evolve a database schema in small steps. Database refactoring enables data professionals to work in an evolutionary manner, just as modern application developers do. It also provides a coherent strategy for organizations to dig their way out of the legacy database hole.

1. What is Database Refactoring?

In the seminal text Refactoring, Martin Fowler [1] describes the programming technique called refactoring, which is a disciplined way to restructure code in small steps. Refactoring enables you to evolve your code slowly over time, to take an evolutionary (iterative and incremental) approach to programming. A critical aspect of a refactoring is that it retains the behavioral semantics of your code. You do not add functionality when you are refactoring, nor do you take it away. A refactoring merely improves the design of your code - nothing more and nothing less.

A database refactoring [2, 3] is a simple change to a database schema that improves its design while retaining both its behavioral and informational semantics - in other words, you cannot add new functionality or break existing functionality, you cannot add new data, and you cannot change the meaning of existing data. A database schema includes both structural aspects, such as table and view definitions, and functional aspects, such as stored procedures and triggers. I use the terms code refactoring to refer to traditional refactoring as described by Martin Fowler and database refactoring to refer to the refactoring of database schemas. The process of database refactoring is the act of making these simple changes to your database schema.

2. Why Database Refactoring?

There are two fundamental reasons why you want to adopt database refactoring:

  1. To repair existing legacy databases. Database refactoring enables you to safely evolve your database design in small steps, making it an important technique for improving the legacy assets within your organization. This is clearly much less risky than a "big bang" approach where you rewrite all of your applications and rework your database schema and release them all into production at once. Furthermore, it is much better than the "let's try not to allow things to get any worse" strategy currently employed by the vast majority of data management groups which I've run into, a strategy which has no hope of success because all it takes is one development team to go around the data management group and do an imperfect database design.
  2. To support evolutionary software development. Modern software development processes, including the Rational Unified Process (RUP), Extreme Programming (XP), Agile Unified Process (AUP), Scrum, and Dynamic System Development Method (DSDM), are all evolutionary in nature. Craig Larman [4] summarizes the research evidence, as well as the overwhelming support among the thought leaders within the IT community, in support of evolutionary approaches. Unfortunately, most data-oriented techniques are serial in nature, relying on specialists performing relatively narrow tasks, such as logical data modeling or physical data modeling. Therein lies the rub - the two groups need to work together, but both want to do so in different manners. I believe that data professionals need to adopt evolutionary techniques, such as database refactoring, which enable them to be relevant to modern development teams. Luckily these techniques exist [3], and they work quite well, it is now up to data professionals to choose to adopt them.

3. Implementing a Database Refactoring

Sometimes a project team finds itself in a relatively simple, "single-application database" situation, and if so they should consider themselves lucky. With this simple architecture database refactoring is fairly simple - you merely change your database schema and update your application to use the new version of the schema. What is more typical is to have many external programs interacting with your database, some of which are beyond the scope of your control. In this situation you cannot assume that all the external programs will be deployed at once, and must therefore support a transition period (also referred to as a deprecation period) during which both the old schema and the new schema are supported in parallel. For the rest of this article I will assume that you're in this situation.

To put database refactoring into context, let's step through a quick example. You have been working on a banking application for a few weeks and have noticed something strange about the Customer table depicted in Figure 1[1] - one of the column names isn't easy to understand. You decide to apply the Rename Column refactoring to theFName column to rename it to FirstName.

Figure 1. The initial database schema for Customer.

Agilists typically work together in pairs; one person should have application programming skills and the other data skills, and ideally both people have both sets of skills. The pair begins by determining whether the database schema needs to be refactored. Perhaps the programmer is mistaken about the need to evolve the schema, and how best to go about the refactoring. The refactoring is first developed and tested within the developer's sandbox. When it is finished, the changes are promoted into the project-integration environment, and the system is rebuilt, tested, and fixed as needed.

To apply the Rename Column refactoring in the development sandbox, the pair first runs all the tests to see that they pass. Next, they write a test because they are taking a Test-Driven Design (TDD) approach [5, 6, 7]. A likely test is to access a value in the FirstName column. After running the tests and seeing them fail, they implement the actual refactoring. To do this they introduce the FirstName column and the SynchronizeFirstName trigger as you see in Figure 2.

Figure 2. The database schema during the transition period.

The trigger is required to keep the values in the columns synchronized - each external program accessing theCustomer table will at most work with one but not both columns. At first, all production applications will work withFName, but over time they will be reworked to access FirstName instead. There are other options to do this, such as views or synchronization after the fact, but I find that triggers work best.

The FirstName column must be populated with values from the FName column. You then need to run both columns in parallel during a "transition period" of sufficient length to give the development teams time to update and redeploy all of their applications. This transition period could be several years in length, depending on the ability of your project teams to get new releases into production. In this case we've decided that the transition period will run to November 14, 2007.

The pair reruns the tests and sees that they now pass. They then refactor the existing tests to work with theFirstName column rather than the FName column. Once the database refactoring is completed in their development work environment, the pair promotes their work into the team's integration sandbox where they rebuild and rerun the tests, fixing any problems which they find. To update the database schema, the pair runs the appropriate change and migration scripts in the appropriate order.

This promotion strategy continues into your pre-production integration testing environment and then eventually into production. Depending on your need, you could implement and then deploy the refactoring within a single day, although more realistically it would be several months until the next major release of your application that you would deploy the refactoring along with any other updates that you've made.

After the transition period, you remove the original column plus the trigger(s), resulting in the final database schema of Figure 3. You remove these things only after sufficient testing to ensure that it is safe to do so. At this point, your refactoring is complete.

Figure 3. The final database schema for Customer.

There is a little more to successfully implementing a database refactoring than what I've described. You need a way to coordinate the refactoring efforts of all the development teams within your organization, clearly something that may prove quite difficult. You also need to get good at deploying refactorings in production, once again coordinating the efforts of several teams. In Refactoring Databases [3], my co-author Pramod Sadalage and I discuss several strategies for doing each of these things.

4. Why Not Just Get it Right to Begin With?

I am often told by existing data professionals that the real solution is to model everything up front, and then you would not need to refactor your database schema. Although that is an interesting vision, and I have seen it work in a few situations, experience from the past three decades has shown that this approach does not seem to be working well in practice for the overall IT community. The traditional approach to data modeling does not reflect the evolutionary approach of modern methods such as the RUP and XP, nor does it reflect the fact that business customers are demanding new features and changes to existing functionality at an accelerating rate. The old ways simply aren't sufficient any more, if they ever were [8].

I suggest that you take an Agile Model-Driven Development (AMDD) approach [9, 10], in which you do some high-level modeling to identify the overall "landscape" of your system, and then model storm the details on a just-in-time (JIT) basis. Take advantage of the benefits of modeling without suffering from the costs of over-modeling, over-documentation, and the resulting bureaucracy of trying to keep too many artifacts up-to-date and synchronized with one another. Your application code and your database schema evolve as your understanding of the problem domain evolves, and you maintain quality through refactoring both.

5. In Conclusion

Database refactoring is a database implementation technique, just like code refactoring is an application implementation technique. You refactor your database schema to ease additions to it. You often find that you have to add a new feature to a database, such as a new column or stored procedure, but the existing design is not the best one possible to easily support that new feature. You start by refactoring your database schema to make it easier to add the feature, and after the refactoring has been successfully applied, you then add the feature. The advantage of this approach is that you are slowly, but constantly, improving the quality of your database design. This process not only makes your database easier to understand and use, it also makes it easier to evolve over time; in other words, you improve your overall development productivity.

My experience is that data professionals can benefit from adopting modern evolutionary techniques similar to those of developers, and that database refactoring is one of several important skills that data professionals require. Evolutionary development has arguably become the norm within the IT community, and agile software development approaches extend evolutionary methods to become more effective. My advice to data professionals is to take evolutionary and agile concepts and techniques seriously: they're real, they work, and they're here to stay.

6. References and Recommended Reading

    1. Fowler, M. (1999). Refactoring: Improving the Design of Existing Code. Menlo Park, California: Addison Wesley Longman, Inc.
    2. Ambler, S.W. (2003). Agile Database Techniques: Effective Strategies for the Agile Software Developer. New York: John Wiley & Sons. www.ambysoft.com/books/agileDatabaseTechniques.html
    3. Ambler, S.W. and Sadalage, P.J. (2006). Refactoring Databases: Evolutionary Database Design. Boston: Addison Wesley. www.ambysoft.com/books/refactoringDatabases.html
    4. Larman, C. (2004). Agile and Iterative Development: A Manager's Guide. Boston: Addison-Wesley.
    5. Astels D. (2003). Test Driven Development: A Practical Guide. Upper Saddle River, NJ: Prentice Hall.
    6. Beck, K. (2003). Test Driven Development: By Example. Boston, MA: Addison Wesley.
    7. Ambler, S.W. (2004). Introduction to Test Driven Development (TDD). www.agiledata.org/essays/tdd.html
    8. Ambler, S.W. (2004). The Agile Data Home Page. www.agiledata.org.
    9. Ambler, S.W. (2002). Agile Modeling: Best Practices for the Unified Process and Extreme Programming. New York: John Wiley & Sons. www.ambysoft.com/books/agileModeling.html
    10. Ambler, S.W. Agile Model Driven Development (AMDD). www.agilemodeling.com/essays/amdd.htm

taiyi_interview(Introduction To Database Refactoring)的更多相关文章

  1. 数据库设计(二)Introduction to Database Design

    原文链接:http://www.datanamic.com/support/lt-dez005-introduction-db-modeling.html Introduction to Databa ...

  2. Database ORM

    Database ORM Introduction Basic Usage Mass Assignment Insert, Update, Delete Soft Deleting Timestamp ...

  3. Harvard数据库课程CS 265: Research Topics in Database Systems

    CS 265: Research Topics in Database Systems Announcements Quiz 3 will be posted. Good luck! Quiz 2 h ...

  4. Linux新手必看:浅谈如何学习linux

    本文在Creative Commons许可证下发布 一.起步 首先,应该为自己创造一个学习linux的环境--在电脑上装一个linux或unix问题1:版本的选择 北美用redhat,欧洲用SuSE, ...

  5. 新手学习Linux之快速上手分析

    一.起步 首先,应该为自己创造一个学习linux的环境--在电脑上装一个linux或unix 问题1:版本的选择 北美用redhat,欧洲用SuSE,桌面mandrake较多,而debian是技术最先 ...

  6. [转载] Linux新手必看:浅谈如何学习linux

    本文转自 https://www.cnblogs.com/evilqliang/p/6247496.html 本文在Creative Commons许可证下发布 一.起步 首先,应该为自己创造一个学习 ...

  7. (009)每日SQL学习:Oracle各个键说明(转)

    原文地址:http://www.agiledata.org/essays/keys.html 本文概述关系数据库中为表指定主键的策略.主要关注于何时使用自然键或者代理键的问题.有些人会告诉你应该总是使 ...

  8. Distributed Databases and Data Mining: Class timetable

    Course textbooks Text 1: M. T. Oszu and P. Valduriez, Principles of Distributed Database Systems, 2n ...

  9. MongoDB之bson的介绍

    MongoDB之bson的介绍 1. 什么是bson BSON是一种类json的一种二进制形式的存储格式,简称Binary JSON,它和JSON一样,支持内嵌的文档对象和数组对象,但是BSON有JS ...

随机推荐

  1. [LeetCode OJ] Max Points on a Line

    Max Points on a Line Submission Details 27 / 27 test cases passed. Status: Accepted Runtime: 472 ms ...

  2. linux基础2——gdb调试器

    gdb调试器使用的一般步骤(不断更新完善): 1.编译过程中要用-g参数来添加调试符号——gcc test.c -g: 2.gdb启动可执行文件——gdb a.out: 3.出现gdb符号表示启动成功 ...

  3. cmd执行mysql操作

    (以下已安装到本机的mysql为例) 登录mysql数据库,如果没有在环境变量配置path到mysql中的bin目录,需要手动进入该目录中 执行:mysql -u用户名 -p密码 (注意:只要进入了m ...

  4. idea添加内存

    在idea安装目录下的以下两个文件: 添加内存:

  5. iptables日志探秘

    iptables日志探秘 防火墙的主要功能除了其本身能进行有效控制网络访问之外,还有一个很重要的功能就是能清晰地记录网络上的访问,并自动生成日志进行保存.虽然日志格式会因防火墙厂商的不同而形态各异,但 ...

  6. Electronic Payment App analysis

    Electronic Payment App is getting more and more popular now. People don't have to bring credit cards ...

  7. 拒绝访问 temp 目录。用来运行 XmlSerializer 的标识“IIS APPPOOL\UGAS”没有访问 temp 目录的足够权限

    在部署IIS时候会出现下图错误,拒绝访问 temp 目录.用来运行 XmlSerializer 的标识“IIS APPPOOL\UGAS”没有访问 temp 目录的足够权限 解决方法: 在IIS信息管 ...

  8. centos环境自动化批量安装软件脚本

    自动化安装jdk软件部署脚本 准备工作: 1.在执行脚本的服务器上生成免密码公钥: 安装expect命令 yum install -y expect ssh-keygen 三次回车 2.将jdk-7u ...

  9. ASP.NET 常用状态(信息)保存方式

    服务器端:Session(会话)/ Application/ Caching(缓存)/DataBase(数据库) Session: Session: 常用于保存登录用户ID.验证码 不同的浏览器不能共 ...

  10. 01HTTP服务&AJAX编程

    HTTP服务&AJAX编程 一.服务器         1. 什么是服务器? 能够提供某种服务的机器(计算机)称为服务器. 2.服务器的分类:              1.按系统分类:Lin ...