Facebook Architecture
Quora article
a relatively old presentation on Facebook architecture
another InfoQ presentation on Facebook architecture / scale
Web frontend
- PHP
- HipHop
- HipHop Virtual Machine (HHVM)
- BigPipe: pipelines page rendering by dividing the page into pagelets, which are generated and sent to the browser as each becomes ready
- Varnish Cache for web caching
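To make the BigPipe idea concrete, here is a minimal Python sketch (not Facebook's implementation): it flushes a page skeleton with one placeholder per pagelet, renders the pagelets concurrently, and streams each one as soon as it finishes. The pagelet names, latencies and the `bigpipe.onPageletArrive` JavaScript callback are made-up placeholders.

```python
import asyncio
from typing import AsyncIterator, Dict, List

# Hypothetical pagelets: name -> simulated backend latency in seconds.
PAGELETS: Dict[str, float] = {"navbar": 0.01, "chat": 0.05, "feed": 0.1}

async def render_pagelet(name: str, delay: float) -> str:
    """Stand-in for the backend work needed to render one pagelet."""
    await asyncio.sleep(delay)
    # Hypothetical client-side callback that moves the HTML into its placeholder.
    return f'<script>bigpipe.onPageletArrive("{name}", "<div>{name} content</div>")</script>'

async def bigpipe(pagelets: Dict[str, float]) -> AsyncIterator[str]:
    # 1. Flush the page skeleton immediately, with one empty placeholder per pagelet.
    placeholders = "".join(f'<div id="{n}"></div>' for n in pagelets)
    yield f"<html><body>{placeholders}"
    # 2. Render all pagelets concurrently and flush each one as soon as it is ready,
    #    so a slow pagelet (here the feed) does not hold back the fast ones.
    tasks = [asyncio.create_task(render_pagelet(n, d)) for n, d in pagelets.items()]
    for finished in asyncio.as_completed(tasks):
        yield await finished
    # 3. Close the document once every pagelet has been sent.
    yield "</body></html>"

async def main() -> List[str]:
    return [chunk async for chunk in bigpipe(PAGELETS)]

if __name__ == "__main__":
    for chunk in asyncio.run(main()):
        print(chunk)
```

The browser can start downloading assets and painting the skeleton while the slower pagelets are still being computed on the server.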
Business Logic
- service-oriented: business logic is exposed as services
- Thrift API
- multiple language bindings
- no need to worry about serialization / connection handling / threading
- supports different server types: non-blocking, async, single-threaded, multi-threaded
- Java services use a custom application server (not Tomcat, Jetty etc.)
Persistence
- MySQL, Memcached, Hadoop's HBase
- MySQL/InnoDB used as a key-value store, distributed / load-balanced across many instances
- global ID is assigned to user data (user info, wall posts, comments etc.)
- blob data, e.g. photos and videos, is handled separately
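A minimal sketch of "MySQL as a sharded key-value store with global IDs", assuming a hypothetical ID layout in which the high bits name the shard and the low 48 bits are a per-shard counter, so any ID can be routed to its instance without a lookup. In-memory SQLite databases stand in for the MySQL/InnoDB instances; the real ID scheme is not described in the sources above.

```python
import json
import sqlite3
from typing import Optional

SHARD_BITS = 48   # hypothetical layout: high bits = shard, low 48 bits = per-shard counter
NUM_SHARDS = 4    # hypothetical number of database instances

# In-memory SQLite databases stand in for separate MySQL/InnoDB instances.
shards = [sqlite3.connect(":memory:") for _ in range(NUM_SHARDS)]
for db in shards:
    # Key-value usage: one opaque serialized value per global ID, no relational schema.
    db.execute("CREATE TABLE kv (id INTEGER PRIMARY KEY, data TEXT NOT NULL)")
next_local = [0] * NUM_SHARDS

def make_global_id(shard: int, local: int) -> int:
    return (shard << SHARD_BITS) | local

def shard_of(global_id: int) -> int:
    # The owning instance is encoded in the ID itself.
    return global_id >> SHARD_BITS

def put(shard: int, obj: dict) -> int:
    """Store obj on the given shard and return its new global ID."""
    next_local[shard] += 1
    gid = make_global_id(shard, next_local[shard])
    shards[shard].execute("INSERT INTO kv (id, data) VALUES (?, ?)", (gid, json.dumps(obj)))
    return gid

def get(global_id: int) -> Optional[dict]:
    row = shards[shard_of(global_id)].execute(
        "SELECT data FROM kv WHERE id = ?", (global_id,)).fetchone()
    return json.loads(row[0]) if row else None
```

For example, `put(2, {"type": "wall_post", "text": "hello"})` returns an ID whose `shard_of(...)` is 2. Choosing the shard from the owning user would keep one user's objects together, but that policy is an assumption here too.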
Logging
- Scribe, one instance on each host
- Scribe-HDFS for analytics
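Scribe runs one agent per host that forwards log messages to a central tier. When that tier is unreachable, it buffers messages locally and resends them later. The sketch below shows only that store-and-forward idea. The `send` callback and class name are hypothetical, and the buffer lives in memory, whereas Scribe's buffer store spools to local disk.

```python
from collections import deque
from typing import Callable, Deque, Tuple

class LocalScribeAgent:
    """Hypothetical per-host log agent: forward each message to the central
    log tier and keep messages buffered while that tier is unreachable."""

    def __init__(self, send: Callable[[str, str], bool]) -> None:
        # send(category, message) returns True if the central tier accepted it.
        self.send = send
        self.buffer: Deque[Tuple[str, str]] = deque()

    def log(self, category: str, message: str) -> None:
        self.buffer.append((category, message))
        self.flush()

    def flush(self) -> None:
        # Deliver in arrival order; stop at the first failure and retry later.
        while self.buffer:
            category, message = self.buffer[0]
            if not self.send(category, message):
                return
            self.buffer.popleft()
```

A caller would invoke `flush()` periodically, for example from a timer, so that buffered messages drain once the central tier comes back.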
Photo
- first version is NFS-backed storage, served via HTTP
- Haystack, Facebook's object store for photos
- Haystack slides
- massive CDN to cache/deliver data
- previously NFS-backed, but a traditional POSIX file system adds per-photo overhead that is not needed here: directory resolution, file metadata, inode lookups etc.
- Haystack Store: one server's 10 TB of storage is split into 100 "physical volumes"; physical volumes on different hosts are grouped into "logical volumes", and data is replicated across the physical volumes of a logical volume
- a physical volume is simply one very large file (100 GB) stored at /hay/haystack_/
- Haystack Cache: internal cache
- example of an image's URL:
http://<CDN>/<Cache>/<Machine id>/<Logical volume, Photo>
- Haystack Directory: metadata / mapping
- mapping and URL construction
- load balance among logical volumes for write, and load balance among physical volumes (within a specific logical volume) for read.
- XFS works best with Haystack
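The core of a Haystack physical volume is one large append-only file plus an in-memory index from photo key to file offset. With that index, a read costs one seek and one read instead of directory and inode lookups. A simplified Python sketch follows. The three-field needle header is a hypothetical reduction: the real needle also carries flags, an alternate key and a checksum. It keeps the per-photo cookie that rejects guessed URLs.

```python
import os
import struct
import tempfile
from typing import Dict, Optional, Tuple

# Hypothetical, simplified needle header: photo key, cookie, data size.
HEADER = struct.Struct("<QQI")

class PhysicalVolume:
    def __init__(self, path: str) -> None:
        # "a+b": every write is appended at the end; reads can seek anywhere.
        self.f = open(path, "a+b")
        # In-memory index: photo key -> (offset of needle, data size).
        self.index: Dict[int, Tuple[int, int]] = {}

    def write(self, key: int, cookie: int, data: bytes) -> None:
        self.f.seek(0, os.SEEK_END)
        offset = self.f.tell()
        self.f.write(HEADER.pack(key, cookie, len(data)) + data)
        self.f.flush()
        self.index[key] = (offset, len(data))

    def read(self, key: int, cookie: int) -> Optional[bytes]:
        entry = self.index.get(key)
        if entry is None:
            return None
        offset, _ = entry
        self.f.seek(offset)
        _, stored_cookie, size = HEADER.unpack(self.f.read(HEADER.size))
        # A wrong cookie is treated as "not found", so guessed URLs fail.
        if stored_cookie != cookie:
            return None
        return self.f.read(size)

    def close(self) -> None:
        self.f.close()

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        vol = PhysicalVolume(os.path.join(d, "haystack_demo"))
        vol.write(42, 7, b"...jpeg bytes...")
        print(vol.read(42, 7))  # b'...jpeg bytes...'
        print(vol.read(42, 8))  # None (wrong cookie)
        vol.close()
```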
News Feed
- the system is called Multifeed at Facebook
- Facebook News Feed: Social Data at Scale, and slides
- recent (2015) redesign of News Feed
- What is News Feed
- fetch recent activity from all your friends
- gather it in a central place
- group into stories
- rank stories by relevance etc.
- send back results
- Scale
- 10 billion / day
- 60ms average latency
- Fan-out-on-write vs. Fan-out-on-read
- fan-out-on-write, i.e. push each write to your friends' feeds
- can cause so-called write amplification
- what Twitter originally did (later optimized for users with very many followers, the "Justin Bieber problem")
- fan-out-on-read, i.e. fetch and aggregate at read time: what Facebook does
- flexibility in read-time aggregation (e.g. choosing what content to generate, bounding the data volume)
- How it works
- incoming requests are sent from the PHP layer to an "aggregator", which figures out which users to query (e.g. a request from me queries all my friends)
- a server called a leaf node holds all activities of a subset of users
- there are many leaf nodes, with the data partitioned and possibly replicated across them
- the aggregator loads data from the corresponding leaf nodes, ranks/aggregates it, and finally sends the stories back
- the PHP layer gets back a list of "action ids" and queries memcached/MySQL to load the content of each action (e.g. a video, a post)
- a "tailer": the input data pipeline that feeds user actions and feedback to the leaf nodes in real time (e.g. when a user posts a new video)
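The aggregator / leaf / tailer flow above is fan-out-on-read, and can be sketched in a few lines of Python. Everything here is a toy stand-in: partitioning by `user % num_leaves` and a single precomputed `score` per action are assumptions, and real ranking is far richer.

```python
import heapq
from dataclasses import dataclass
from typing import Dict, List, Set

@dataclass(frozen=True)
class Action:
    action_id: int
    actor: int
    score: float  # hypothetical precomputed relevance score

class LeafNode:
    """Holds the recent actions of the users in one partition."""
    def __init__(self) -> None:
        self.actions: Dict[int, List[Action]] = {}

    def add(self, action: Action) -> None:
        self.actions.setdefault(action.actor, []).append(action)

    def query(self, users: Set[int]) -> List[Action]:
        return [a for u in users for a in self.actions.get(u, [])]

def partition(user: int, num_leaves: int) -> int:
    return user % num_leaves  # hypothetical partitioning scheme

def tailer(leaves: List[LeafNode], action: Action) -> None:
    """Feeds a new user action to the leaf that owns its actor."""
    leaves[partition(action.actor, len(leaves))].add(action)

class Aggregator:
    def __init__(self, leaves: List[LeafNode], friends: Dict[int, Set[int]]) -> None:
        self.leaves = leaves
        self.friends = friends

    def feed(self, viewer: int, k: int) -> List[int]:
        # Fan-out-on-read: group the viewer's friends by owning leaf...
        by_leaf: Dict[int, Set[int]] = {}
        for friend in self.friends.get(viewer, set()):
            by_leaf.setdefault(partition(friend, len(self.leaves)), set()).add(friend)
        # ...query each leaf once, then rank the merged candidates.
        candidates = [a for i, users in by_leaf.items() for a in self.leaves[i].query(users)]
        top = heapq.nlargest(k, candidates, key=lambda a: a.score)
        # Only action ids go back; the PHP tier loads the content separately.
        return [a.action_id for a in top]
```

Writes touch exactly one leaf, and all the per-viewer work (which friends, how many stories, how to rank) happens at read time. That is the flexibility the fan-out-on-read notes point to.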
Facebook Chat
- Chat Stability and Scalability
- channel server: receives a user's messages and delivers them to the recipient's browser; written in Erlang
- presence server: tracks whether a user is online; channel servers push the list of active users to the presence server; written in C++
- lexical_cast causes memory allocation; when the heap is fragmented, each new malloc() spends significant CPU time finding free memory
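The presence server's job can be sketched as a last-seen table. Channel servers periodically report the users with live connections, and a user counts as online until a timeout passes without another report. This Python sketch shows only the logic, not the C++ service. The 60-second `timeout_s` default and the injectable `clock` are assumptions.

```python
import time
from typing import Callable, Dict, Iterable, List

class PresenceServer:
    """Toy presence table fed by periodic reports from channel servers."""

    def __init__(self, timeout_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s   # hypothetical expiry window
        self.clock = clock           # injectable so the logic is testable
        self.last_seen: Dict[int, float] = {}

    def report_active(self, user_ids: Iterable[int]) -> None:
        # Called with the batch of users a channel server currently sees as connected.
        now = self.clock()
        for uid in user_ids:
            self.last_seen[uid] = now

    def is_online(self, user_id: int) -> bool:
        seen = self.last_seen.get(user_id)
        return seen is not None and self.clock() - seen < self.timeout_s

    def online_friends(self, friends: Iterable[int]) -> List[int]:
        return sorted(f for f in friends if self.is_online(f))
```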
Facebook Search
- Intro to facebook search
- Role: find a specific name/page on Facebook, e.g. a person named "Bob" or a band named "Johny"
- Ranking (relevance indicators)
- personal context;
- social context;
- query itself;
- global popularity
- challenges
- no query cache can be used;
- no locality in index (i.e. no hot index)
- Life of a Typeahead Query
- first, preload the user's friends, pages, groups, applications and upcoming events into the browser cache and try to serve the query from there
- request sent to aggregator (similar to News Feed's aggregator), which delegates to several leaf services
- Graph Search on people
- Graph Search on objects
- global objects - an index on all pages and applications on Facebook, no personalization - could be cached
- each leaf service returns some data; the aggregator merges and ranks the results and sends them to the web tier
- the aggregator returns resource IDs; the web tier loads the data and sends it back to the user's browser
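The life of a typeahead query can be sketched in Python as follows. The query is first served from preloaded browser data. If that gives too few hits, it fans out to leaf services, merges the results by score, and returns IDs only. The leaf contents, entity IDs and the single per-entity score are made up. That score stands in for the personal, social, query and popularity signals listed above.

```python
import heapq
from typing import Dict, List, Tuple

Index = Dict[str, Tuple[int, float]]  # display name -> (entity id, hypothetical score)

# Hypothetical leaf services with made-up entities and scores.
PEOPLE_LEAF: Index = {"Bob Smith": (101, 0.8), "Bobby Tables": (102, 0.3)}
OBJECTS_LEAF: Index = {"Bob Marley (page)": (201, 0.9), "Boston": (202, 0.6)}

def leaf_search(index: Index, prefix: str) -> List[Tuple[float, int]]:
    """Prefix-match the query against every word of every name."""
    p = prefix.lower()
    return [(score, eid) for name, (eid, score) in index.items()
            if any(word.startswith(p) for word in name.lower().split())]

def typeahead(prefix: str, browser_cache: Index, k: int = 3) -> List[int]:
    # 1. Serve from data preloaded into the browser if it yields enough hits.
    local = leaf_search(browser_cache, prefix)
    if len(local) >= k:
        return [eid for _, eid in heapq.nlargest(k, local)]
    # 2. Otherwise the aggregator queries the leaf services and merges the results,
    #    keeping the best score when the same entity comes back more than once.
    best: Dict[int, float] = {}
    for score, eid in local + leaf_search(PEOPLE_LEAF, prefix) + leaf_search(OBJECTS_LEAF, prefix):
        best[eid] = max(score, best.get(eid, 0.0))
    ranked = heapq.nlargest(k, best.items(), key=lambda item: item[1])
    # 3. Only entity ids are returned; the web tier loads names and photos for them.
    return [eid for eid, _ in ranked]
```

For example, `typeahead("bob", {"Bob Friend": (301, 0.95)})` returns `[301, 201, 101]`: the cached friend ranks first, then the page, then the other person.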
Graph Search
- Unicorn: A System for Searching the Social Graph
- Under the Hood: Building out the infrastructure for Graph Search
- Under the Hood: Indexing and ranking in Graph Search
- Under the Hood: The natural language interface of Graph Search
- Under the Hood: Building posts search
- history of Facebook search
- keyword based search
- typeahead search, prefix-matching
- Unicorn is an inverted index system for many-to-many mappings. Unlike a typical inverted index, it not only indexes "documents" (entities such as users/pages/groups/applications) but also supports search based on the edges (edge types) between nodes
- graph search natural language interface example: employers of my friends who live in New York
- input node: ME
- ME --[friend-edge]--> my friends (who live in NY): load the list of nodes connected to the input nodes by a specific edge type, here "friend-edge"
- [MY FRIENDS FROM NY] --[works-at-edge]--> employers: the "apply" operator, i.e. follow the "works-at" edge
- Indexing: performed as a combination of map-reduce jobs that collect data from Hive tables, process them and convert into inverted index data structures
- live updates are streamed into the index via a separate live update pipeline.
- Graph Search components (Unicorn) - essentially an in-memory database with a query language interface
- Vertical - a Unicorn instance; different entity types are kept in separate Unicorn verticals, e.g. USER vertical, PAGES vertical
- index server - part of a vertical; each holds a shard of the index, since the index is too large to fit on a single host
- Vertical Aggregator - broadcasts the query to all index servers in its vertical and ranks the results
- because there are multiple Unicorn instances (verticals), a TOP AGGREGATOR sits on top of all vertical aggregators and runs a blending algorithm to blend the results from each vertical
- Query Rewriting: parse the query into a structured Unicorn retrieval query, with spelling correction, synonyms, segmentation etc.
- example: "restaurants liked by Facebook employees" gets converted to
273819889375819/places/20531316728/employees/places-liked/intersect
- Scoring: rank the results (static ranking); then "result set scoring" scores the result set as a whole and returns only a subset (e.g. "photos of Facebook employees" might otherwise contain too many photos from Mark Zuckerberg)
- Nested Queries: the structured query may be nested and need to be JOINed, e.g. "restaurants liked by Facebook employees"
- Query Suggestion: relies on an NLP module to identify what kind of entity a term may refer to (e.g. "sri" as in a name vs. "sri" as in "people who live in Sri..")
- Machine Learning is used to adjust the "scoring function"
- How to evaluate Search algorithm changes
- CTR - click through rate
- DCG (discounted cumulative gain) - measures the usefulness (gain) of a result set, by considering the gain of each result in the set and the position of the result
- Natural Language Interface to Graph Search
- keywords are a poor interface: they are nouns only, while connections in the Facebook graph are verbs
- dense content; see the article
- Building Posts Search
- more than 1 billion posts added every day
- Wormhole listens for new posts in the MySQL posts store
- the posts index is much larger than other index types, so it is stored on SSD instead of in RAM
- with trillions of posts nobody can read all results, so optional clauses are dynamically added to bias results toward what is likely more valuable to the user
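The "employers of my friends who live in New York" example can be sketched as a toy Unicorn-style index. Terms are written "edge-type:node", each posting list holds the nodes at the other end of that edge, and the query combines an intersection with an `apply` step. All nodes, the `residents` edge type and the helper names are hypothetical. This illustrates the idea only; it is not Unicorn's query language.

```python
from collections import defaultdict
from typing import DefaultDict, Set

# Toy index: term "edge-type:node" -> posting list of connected nodes (made-up data).
index: DefaultDict[str, Set[str]] = defaultdict(set)

def add_edge(edge_type: str, src: str, dst: str) -> None:
    index[f"{edge_type}:{src}"].add(dst)

def term(t: str) -> Set[str]:
    # .get avoids inserting empty posting lists for unknown terms.
    return set(index.get(t, set()))

def apply(edge_type: str, inner: Set[str]) -> Set[str]:
    """Turn each result of an inner query into a new term and union the hits."""
    out: Set[str] = set()
    for node in inner:
        out |= term(f"{edge_type}:{node}")
    return out

EDGES = [
    ("friend", "me", "alice"), ("friend", "me", "bob"), ("friend", "me", "carol"),
    ("residents", "nyc", "alice"), ("residents", "nyc", "carol"), ("residents", "nyc", "dave"),
    ("works-at", "alice", "acme"), ("works-at", "bob", "initech"), ("works-at", "carol", "globex"),
]
for edge in EDGES:
    add_edge(*edge)

def employers_of_friends_in(user: str, city: str) -> Set[str]:
    # (friend:user AND residents:city) = "my friends who live in <city>";
    # applying works-at to those nodes reaches their employers.
    friends_in_city = term(f"friend:{user}") & term(f"residents:{city}")
    return apply("works-at", friends_in_city)

if __name__ == "__main__":
    print(sorted(employers_of_friends_in("me", "nyc")))  # ['acme', 'globex']
```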
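The DCG metric from the evaluation notes has a common closed form, DCG@k = sum over positions i = 1..k of rel_i / log2(i + 1). A useful result placed lower in the list therefore contributes less. The small Python function below implements that form. Another widely used variant replaces rel_i with 2^rel_i - 1. Which variant Facebook used is not stated in the sources above.

```python
import math
from typing import Optional, Sequence

def dcg(relevances: Sequence[float], k: Optional[int] = None) -> float:
    """Discounted cumulative gain: sum of rel_i / log2(i + 1) for positions i = 1..k."""
    # enumerate starts at 0, so 1-based position i + 1 gets the discount log2(i + 2).
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

if __name__ == "__main__":
    # Same three results, best-first vs. worst-first ordering.
    print(round(dcg([3, 2, 0]), 3))  # 4.262
    print(round(dcg([0, 2, 3]), 3))  # 2.762
```

Comparing the DCG of result lists produced before and after a ranking change is one way to quantify whether the change moved useful results toward the top.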
Facebook Messages
- presentation at Hadoop Summit 2011
- Scaling the Messages Application Back End
- Inside Facebook Messages' Application Server
- The Underlying Technology of Messages
- HBase as main storage
- Database Layer: Master / Backup Master / Region Server [1..n]
- Storage Layer: Name node / secondary name node / Data node [1..n]
- Coordination Service: Zookeeper peers
- A user is sticky to an application server
- Cell: application server + HBase node
- 5 or more racks per cell, 20 servers per rack => 100 or more machines per cell
- controllers (master nodes, zookeeper, name nodes) spread across racks
- User Directory Service: find cell for a given user
- A separate backup system (quick and dirty, in my view)
- Use Scribe
- double logging to reduce loss - merge and dedup
- ability to restore
- quite some effort to make HBase more reliable, fail safe, and support real-time workload.
- action log: every update to a user's mailbox is recorded in the action log, which can be replayed for various purposes
- full text search - use Lucene to extract data and add to HBase, each keyword has its own column
- Testing via Dark Launch - mirror live traffic from Chat and Inbox into a test Messages cluster for about 10% of the users.
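The backup and action-log items above fit together as a restore path. Redundant log copies (double logging) are merged and deduplicated, then replayed in order to rebuild a mailbox. Below is a minimal Python sketch of that path under stated assumptions. The `MailboxAction` record, its per-mailbox increasing `action_id`, and the two operations `add` and `delete` are hypothetical, not Facebook's log format.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass(frozen=True)
class MailboxAction:
    action_id: int   # hypothetical: unique and increasing within one mailbox
    op: str          # hypothetical operations: "add" or "delete"
    message_id: str
    body: str = ""

def merge_logs(*copies: List[MailboxAction]) -> List[MailboxAction]:
    """Merge redundant log copies, drop duplicates, and restore action order."""
    unique = {a.action_id: a for copy in copies for a in copy}
    return [unique[i] for i in sorted(unique)]

def replay(actions: List[MailboxAction]) -> Dict[str, str]:
    """Rebuild mailbox state (message id -> body) by applying actions in order."""
    mailbox: Dict[str, str] = {}
    for a in actions:
        if a.op == "add":
            mailbox[a.message_id] = a.body
        elif a.op == "delete":
            mailbox.pop(a.message_id, None)
    return mailbox
```

Because each copy may miss different entries, merging before replay recovers actions that only one copy recorded, and keying by `action_id` keeps an action logged twice from being applied twice.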
Configuration Management
- a 2015 paper on this topic