w分布式查询、数据聚合、跨碎片join是可且应避免的、自增主键管理、基于-会话/事务/语句-选择碎片、通过-主键/模块/碎片索引-碎片化数据

http://www.agildata.com/database-sharding/

Database Sharding Challenges

Due to the distributed nature of individual databases, a number of key elements must be taken into account:

  • Reliability. First and foremost, any production business application must be reliable and fault-tolerant, and cannot be subject to frequent outages. The database tier is often the single most critical element in any reliability design, and therefore an implementation of Database Sharding is no exception. In fact, due to the distributed nature of multiple shard databases, the criticality of a well-designed approach is even greater. To ensure a fault-tolerant and reliable approach, the following items are required:

    • Automated backups of individual Database Shards.
    • Database Shard redundancy, ensuring at least 2 “live” copies of each shard are available in the event of an outage or server failure. This requires a high-performance, efficient, and reliable replication mechanism.
    • Cost-effective hardware redundancy, both within and across servers.
    • Automated failover when an outage or server failure occurs.
    • Disaster Recovery site management.
  • Distributed queries. Many types of queries can be processed far faster using distributed queries, performing parallel processing of interim results on each shard server. This technique can achieve order-of-magnitude improvements in performance, in many cases 10X or more. To enable distributed queries in a seamless manner for the application, it is important to have a facility that can process a segment of the query on each individual shard, and then consolidate the results into a single result set for the application tier. Common queries that can benefit from distributed processing are:
    • Aggregation of statistics, requiring a broad sweep of data across the entire system. Such an example is the computation of sales by product, which ordinarily requires evaluation of the entire database.
    • Queries that support comprehensive reports, such as listings of all individual customers that purchased a given product in the last day, week or month.
  • Avoidance of cross-shard joins. In a sharded system, queries or other statements that use inner-joins that span shards are highly inefficient and difficult to perform. In the majority of cases, it has been found that such inner-joins are not actually required by an application, so long as the correct techniques are applied. The primary technique is the replication of Global Tables, the relatively static lookup tables that are common utilized when joining to much larger primary tables. Tables containing values as Status Codes, Countries, Types, and even Products fall into this category. What is required is an automated replication mechanism that ensures values for Global Tables are in synch across all shards, minimizing or eliminating the need for cross-shard joins.
  • Auto-increment key management. Typical auto-increment functionality provided by database management systems generate a sequential key for each new row inserted into the database. This is fine for a single database application, but when using Database Sharding, keys must be managed across all shards in a coordinated fashion. The requirement here is to provide a seamless, automated method of key generation to the application, one that operates across all shards, ensuring that keys are unique across the entire system.
  • Support for multiple Shard Schemes. It is important to note that Database Sharding is effective because it offers an application specific technique for massive scalability and performance improvements. In fact it can be said that the degree of effectiveness is directly related to how well the sharding algorithms themselves are tailored to the application problem at hand. What is required is a set of multiple, flexible shard schemes, each designed to address a specific type of application problem. Each scheme has inherent performance and/or application characteristics and advantages when applied to a specific problem domain. In fact, using the wrong shard scheme can actually inhibit performance and the very results you are trying to obtain. It is also not uncommon for a single application to use more than one shard scheme, each applied to a specific portion of the application to achieve optimum results. Here is a list of some common shard schemes:
    • Session-based sharding, where each individual user or process interacts with a specific shard for the duration of the user or process session. This is the simplest technique to implement, and adds virtually zero overhead to overall performance, since the sharding decision is made only once per session. Applications which can benefit from this approach are often customer-centric, where all data for a given customer is contained in a single shard, and that is all the data that the customer requires.
    • Transaction-based sharding determines the shard by examining the first SQL Statement in a given database transaction. This is normally done by evaluating the “shard key” value used in the statement (such as an Order Number), and then directing all other statements in the transaction to the same shard.
    • Statement-based sharding is the most process intensive of all types, evaluating each individual SQL Statement to determine the appropriate shard to direct it to. Again, evaluation of the shard key value is required. This option is often desirable on high-volume, granular transactions, such as recording phone call records.
  • Determine the optimum method for sharding the data. This is another area that is highly variable, change from application to application. It is closely tied with the selection of the Database Shard Scheme described above. There are numerous methods for deciding how to shard your data, and its important to understand your transaction rates, table volumes, key distribution, and other characteristics of your application. This data is required to determine the optimum sharding strategy:
    • Shard by a primary key on a table. This is the most straightforward option, and easiest to map to a given application. However, this is only effective if your data is reasonably well distributed. For example, if you elected to shard by Customer ID (and this is a sequential numeric value), and most of your transactions are for new customers, very little if anything will be gained by sharding your database. On the other hand, if you can select a key that does adequately and naturally distribute your transactions, great benefits can be realized.
    • Shard by the modulus of a key value. This option works in a vast number of cases, by applying the modulus function to the key value, and distributing transactions based on the calculated value. In essence you can predetermine any number of shards, and the modulus function effectively distributes across your shards on a “round-robin” basis, creating a very even distribution of new key values.
    • Maintain a master shard index table. This technique involves using a single master table that maps various values to specific shards. It is very flexible, and meets a wide variety of application situations. However, this option often delivers lower performance as it requires an extra lookup for each sharded SQL Statement.

As you can see, there are many things to consider and many capabilities required in order to ensure that a Database Sharding implementation is successful and effective, delivering on its objectives of providing new levels of scalability and performance in a cost-effective manner.

Database Sharding Challenges DATABASE SHARDING的更多相关文章

  1. Azure SQL Database (19) Stretch Database 概览

    <Windows Azure Platform 系列文章目录>  Azure SQL Database (19) Stretch Database 概览      Azure SQL Da ...

  2. 使用duplicate target database ... from active database复制数据库

    使用duplicate target database ... from active database复制数据库 source db:ora11auxiliary db:dupdb 1.修改监听文件 ...

  3. Oracle® Database Patch 19121551 - Database Patch Set Update 11.2.0.4.4 (Includes CPUOct2014) - 傲游云浏览

    Skip Headers Oracle® Database Patch 19121551 - Database Patch Set Update 11.2.0.4.4 (Includes CPUOct ...

  4. Oracle Database 12c Using duplicate standby database from active database Created Active DataGuard

    primary database db_name=zwc, db_unique_name=zwc standby database db_name=zwc, db_unique_name=standb ...

  5. Teradata Delete Database and Drop Database

    DELETE DATABASE and DELETE USER statements delete all data tables, views, and macros from a database ...

  6. Cannot connect to database because the database client

    问题描述: arcgis server10.1  arcgis sde10出现下面问题 Cannot connect to  database because the database client ...

  7. Database Partitioning Options DATABASE SHARDING

    w主写从读.集群节点间时时内存复制.单表横切纵切.分析报表系统通过服务器联表 http://www.agildata.com/database-sharding/ Database Partition ...

  8. Database Corruption ->> Fix Database In Suspect State

    昨天在工作中遇到一个情况,就是Development环境中的某台服务器上的某个数据库进入了Suspect状态.以前看书倒是知道说这个状态,不过实际工作当中从来没有遇到过.那么一些背景情况是这样的. 环 ...

  9. What is the difference between database table and database view?

    The database table has a physical existence in the database. A view is a virtual table, that is one ...

随机推荐

  1. 在系统中使用read函数读取文件内容

    read函数(读取文件) read函数可以读取文件.读取文件指从某一个已打开地文件中,读取一定数量地字符,然后将这些读取的字符放入某一个预存的缓冲区内,供以后使用. 使用格式如下: number = ...

  2. java反射详解 (转至 http://www.cnblogs.com/rollenholt/archive/2011/09/02/2163758.html)

    本篇文章依旧采用小例子来说明,因为我始终觉的,案例驱动是最好的,要不然只看理论的话,看了也不懂,不过建议大家在看完文章之后,在回过头去看看理论,会有更好的理解. 下面开始正文. [案例1]通过一个对象 ...

  3. c# 异步编程demo (async await)

    using System; using System.Collections.Generic; using System.Linq; using System.Text; using System.W ...

  4. postman从入门到精通

    今天总监让我给测试同事们培训postman,使用过postman的朋友应该知道,这个简直就是前后端接口调试神器.根据平时的经验以及自己到网上看了相关的帖子,对于postman又有了新的认识. post ...

  5. Oracle Net Manager 服务命名配置以及用PL/SQL 登陆数据库

    我们知道,要连接一个数据库需要知道四个参数: 1. 登陆用户名:user: 2. 登录密码:password: 3. 存放数据库的服务器地址(server_ip)和端口(server_port): 4 ...

  6. HBase学习之深入理解Memstore-6

      MemStore是HBase非常重要的组成部分,深入理解MemStore的运行机制.工作原理.相关配置,对HBase集群管理以及性能调优有非常重要的帮助. HBase Memstore 首先通过简 ...

  7. jQuery为动态生成的select元素添加事件的方法

    项目中需要在点击按钮时动态生成select元素,为防止每次点击按钮时从服务器端获取数据(因为数据都是相同的),可以这样写代码 1.首先定义全局js变量 var strVoucherGroupSelec ...

  8. html5实现刮刮卡效果

    通过Canvas实现的可刮涂层效果. 修改img.src时涂层也会自动适应新图片的尺寸. 修改layer函数可更改涂层样式. 涂层: 可刮效果: 以下是HTML源代码(已增加移动设备支持): 1 2 ...

  9. DoBox 下载

    DoBox下载 一款简单十分好用的办公助手,用于记录您接下来需要做的事情.待办事项小工具 - DoBox DoBox下载 下载地址:http://www.wxzzz.com/?id=141 最新版本: ...

  10. IOS -执行时 (消息传递 )

    一 函数调用概述      Objective-C不支持多重继承(同Java和Smalltalk),而C++语言支持多重继承. Objective-C是动态绑定,它的类库比C++要easy操作. Ob ...