https://gigaom.com/2011/12/06/facebook-shares-some-secrets-on-making-mysql-scale/

Facebook shares some secrets on making MySQL scale

Derrick Harris Dec 6, 2011 - 1:00 PM CST

When you’re storing every transaction for 800 million users and handling more than 60 million queries per second, your database environment had better be something special. Many readers might see these numbers and think NoSQL, but Facebook held a Tech Talk on Monday night explaining how it built a MySQL environment capable of handling everything the company needs in terms of scale, performance and availability.

Over the summer, I reported on Michael Stonebraker’s stance that Facebook is trapped in a MySQL “fate worse than death” because of its reliance on an outdated database paired with a complex sharding and caching strategy (read the comments and this follow-up post for a bevy of opinions on the validity of Stonebraker’s stance on SQL). Facebook declined an official comment at the time, but last night’s talk proved to me that Stonebraker (and I) might have been wrong.

 

Keeping up with performance

Kicking off the event, Facebook’s Domas Mituzas shared some stats that illustrate the importance of its MySQL user database:

  • MySQL handles pretty much every user interaction: likes, shares, status updates, alerts, requests, etc.
  • Facebook has 800 million users; 500 million of them visit the site daily.
  • 350 million mobile users are constantly pushing and pulling status updates
  • 7 million applications and web sites are integrated into the Facebook platform
  • User data sets are made even larger by taking into account both scope and time

And, as Mituzas pointed out, everything on Facebook is social, so every action has a ripple effect that spreads beyond that specific user. “It’s not just about me accessing some object,” he said. “It’s also about analyzing and ranking through that include all my friends’ activities.” The result (although Mituzas noted these numbers are somewhat outdated) is 60 million queries per second, and nearly 4 million row changes per second.

Facebook shards its database, splitting it into numerous distinct sections, because of the sheer volume of the data it stores (a number it doesn’t share), and it caches extensively in order to handle all these transactions in a hurry. In fact, most queries (more than 90 percent) never hit the database at all but only touch the cache layer. Facebook relies heavily on the open-source memcached caching system, as well as its custom-built Flashcache module for caching data on solid-state drives.
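To make the sharding-plus-caching idea concrete, here is a minimal sketch of a look-aside cache sitting in front of a sharded store. It is not Facebook's implementation: plain Python dicts stand in for both memcached and the MySQL shards, the shard count and the modulo routing rule are assumptions chosen for illustration, and every class and function name is hypothetical.

```python
from typing import Dict, List, Optional, Tuple

NUM_SHARDS = 4  # hypothetical; the article gives no shard count


class ShardedStore:
    """Toy stand-in for a set of database shards (each shard is a dict)."""

    def __init__(self, num_shards: int = NUM_SHARDS) -> None:
        self.shards: List[Dict[int, str]] = [{} for _ in range(num_shards)]
        self.reads = 0  # counts reads that actually reached a shard

    def shard_for(self, user_id: int) -> int:
        # Modulo routing is an assumption made for illustration only.
        return user_id % len(self.shards)

    def get(self, user_id: int) -> Optional[str]:
        self.reads += 1
        return self.shards[self.shard_for(user_id)].get(user_id)

    def put(self, user_id: int, value: str) -> None:
        self.shards[self.shard_for(user_id)][user_id] = value


class LookAsideCache:
    """Checks an in-memory cache first and falls back to the store on a miss."""

    def __init__(self, store: ShardedStore) -> None:
        self.store = store
        self.cache: Dict[int, str] = {}  # stands in for the cache layer
        self.hits = 0
        self.misses = 0

    def read(self, user_id: int) -> Optional[str]:
        if user_id in self.cache:
            self.hits += 1
            return self.cache[user_id]
        self.misses += 1
        value = self.store.get(user_id)
        if value is not None:
            self.cache[user_id] = value  # populate the cache for later reads
        return value

    def write(self, user_id: int, value: str) -> None:
        self.store.put(user_id, value)
        self.cache.pop(user_id, None)  # invalidate any stale cached copy


def demo() -> Tuple[int, int, int]:
    """Write 10 rows, read each one 10 times, return (hits, misses, db reads)."""
    store = ShardedStore()
    cache = LookAsideCache(store)
    for uid in range(10):
        cache.write(uid, f"status-{uid}")
    for _ in range(10):
        for uid in range(10):
            cache.read(uid)
    return cache.hits, cache.misses, store.reads


if __name__ == "__main__":
    hits, misses, db_reads = demo()
    print(f"hits={hits} misses={misses} db_reads={db_reads}")
```

In this toy run, only the first read of each row reaches a shard, so 90 of the 100 reads are answered by the cache. That is a contrived workload, but it shows how repeated reads of the same data can keep most queries away from the database.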

Keeping up with scale

Speaking of drives, and hardware generally, Facebook’s Mark Konetchy took the stage after Mituzas to share some data points on the growth of Facebook’s MySQL infrastructure. Although he made sure to point out that the “buzzkills at legal” won’t let him share actual numbers, he was able to point to 3x server growth across all data centers over the past two years, 7x growth in raw user data, and 20x growth in all user data (which includes replicated data). The median data-set size per physical host has increased almost 5x since Jan. 2010, and maximum data-set size per host has increased 10x.

Konetchy attributes the ability to store so much more data per host to software-performance improvements made by Facebook’s MySQL team, as well as to better server technology. Facebook’s MySQL user database is composed of approximately 60 percent hard disk drives, 20 percent SSDs and 10 percent hybrid HDD-plus-SSD servers running Flashcache.

However, Facebook wants to buy fewer servers while still improving MySQL performance. Looking forward, Konetchy said some primary objectives are to automate the splitting of large data sets onto underutilized hardware, to improve MySQL compression and to move more data to the Hadoop-based HBase data store when appropriate. NoSQL databases such as HBase (which powers Facebook Messages) weren’t really around when Facebook built its MySQL environment, so there likely are unstructured or semistructured data currently in MySQL that are better suited for HBase.

With all this growth, why MySQL?

The logical question when one sees rampant growth and performance requirements like this is “Why stick with MySQL?” As Stonebraker pointed out over the summer, both NoSQL and NewSQL are arguably better suited to large-scale web applications than is MySQL. Perhaps, but Facebook begs to differ.

Facebook’s Mark Callaghan, who spent eight years as a “principal member of the technical staff” at Oracle, explained that using open-source software lets Facebook operate with “orders of magnitude” more machines than people, which means lots of money saved on software licenses and lots of time put into working on new features (many of which, including the rather-cool Online Schema Change, are discussed in the talk).

Additionally, he said, the patch and update cycles at companies like Oracle are far slower than what Facebook can get by working on issues internally and with an open-source community. The same holds true for general support issues, which Facebook can resolve itself in hours instead of waiting days for commercial support.

On the performance front, Callaghan noted, Facebook might find some interesting things if large vendors allowed it to benchmark their products. But they won’t, and they won’t let Facebook publish the results, so MySQL it is. Plus, he said, you actually can tune MySQL to perform very fast per node if you know what you’re doing — and Facebook has the best MySQL team around. That also helps keep costs down because it requires fewer servers.

Callaghan was more open to using NoSQL databases, but said they’re still not quite ready for primetime, especially for mission-critical workloads such as Facebook’s user database. The implementations just aren’t as mature, he said, and there are no published cases of NoSQL databases operating at the scale of Facebook’s MySQL database. And, Callaghan noted, the HBase engineering team at Facebook is quite a bit larger than the MySQL engineering team, suggesting that tuning HBase to meet Facebook’s needs is a more resource-intensive process than tuning MySQL at this point.

The whole debate about Facebook and MySQL was never really about whether it should be using it, but rather about how much work it has put into MySQL to make it work at Facebook scale. The answer, clearly, is a lot, but Facebook seems to have it down to an art at this point, and everyone appears pretty content with what they have in place and how they plan to improve it. It doesn’t seem like a fate worse than death, and if it had to start from scratch, I don’t get the impression Facebook would do too much differently, even with the new database offerings available today.
