Facebook Architecture
Quora article
a relatively old presentation on facebook architecture
another InfoQ presentation on Facebook architecture / scale
Web frontend
- PHP
- HipHop
- HipHop Virtual Machine (HHVM)
- BigPipe pipelines page rendering by dividing the page into pagelets that are generated and delivered in stages
- Varnish Cache for web caching
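As a rough illustration of the pagelet idea, the sketch below streams a page skeleton first and then one chunk per pagelet. It is a simplification in Python, not Facebook's PHP implementation; `stream_page`, the pagelet ids, and the client-side `fill` function are hypothetical names.

```python
from typing import Callable, Dict, Iterator


def stream_page(pagelets: Dict[str, Callable[[], str]]) -> Iterator[str]:
    """Yield the page in chunks: skeleton first, then one chunk per pagelet."""
    # The skeleton holds an empty placeholder for every pagelet, so the
    # browser can start laying out the page before any content is ready.
    placeholders = "".join(f'<div id="{name}"></div>' for name in pagelets)
    yield f"<html><body>{placeholders}"
    # Each pagelet is rendered and flushed on its own (sequentially here;
    # a real system would generate them concurrently).
    for name, render in pagelets.items():
        yield f'<script>fill("{name}", {render()!r})</script>'
    yield "</body></html>"


chunks = list(stream_page({
    "feed": lambda: "<ul><li>story</li></ul>",
    "chat": lambda: "<p>2 friends online</p>",
}))
```

The point of the design is that the browser receives something renderable immediately instead of waiting for the slowest part of the page.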
Business Logic
- service-oriented, exposed as service
- Thrift API
- multiple language bindings
- no need to worry about serialization / connection handling / threading
- supports different server types: non-blocking, async, single-thread, multi-thread
- Java service uses a custom application server (not Tomcat or Jetty etc.)
Persistence
- MySQL, Memcached, Hadoop's HBase
- MySQL/InnoDB used as a key-value store, distributed / load-balanced across many instances
- global ID is assigned to user data (user info, wall posts, comments etc.)
- Blob data e.g. photos and videos, are handled separately
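One way to picture MySQL used as a sharded key-value store is sketched below. This is hypothetical: the modulo routing rule, the `ShardedKV` class, and the in-memory dicts standing in for MySQL instances are assumptions for illustration; the notes do not say how IDs are actually mapped to instances.

```python
import itertools
from typing import Any, Dict, List


class ShardedKV:
    """Toy key-value store: each dict stands in for one MySQL/InnoDB instance."""

    def __init__(self, num_shards: int) -> None:
        self.shards: List[Dict[int, Any]] = [{} for _ in range(num_shards)]
        self._next_id = itertools.count(1)  # source of global IDs

    def _shard_for(self, global_id: int) -> Dict[int, Any]:
        # Assumed routing rule: global ID modulo the number of instances.
        return self.shards[global_id % len(self.shards)]

    def put(self, value: Any) -> int:
        """Store any object (user info, post, comment) under a new global ID."""
        global_id = next(self._next_id)
        self._shard_for(global_id)[global_id] = value
        return global_id

    def get(self, global_id: int) -> Any:
        return self._shard_for(global_id)[global_id]


store = ShardedKV(num_shards=4)
post_id = store.put({"type": "wall_post", "text": "hello"})
comment_id = store.put({"type": "comment", "on": post_id, "text": "hi"})
```

Because every object shares one ID space, a reference such as the comment's `on` field can point at any kind of object without knowing which instance holds it.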
Logging
- Scribe, one instance on each host
- Scribe-HDFS for analytics
Photo
- first version was NFS-backed storage, served via HTTP
- Haystack, Facebook's object store for photos
- Haystack slides
- Massive CDN to cache/deliver data
- previously NFS-backed, but a traditional POSIX file system incurs unnecessary overhead for this workload: directory resolution, file metadata, inodes etc.
- Haystack Store: one server's 10 TB of storage is split into 100 "physical volumes"; physical volumes on different hosts are organized into "logical volumes", and data is replicated within a logical volume
- physical volume is simply a very large file (100 GB) mounted at /hay/haystack_/
- Haystack Cache: internal cache
- example of an image's URL:
http://<CDN>/<Cache>/<Machine id>/<Logical volume, Photo>
- Haystack Directory: metadata / mapping
- mapping and URL construction
- load-balances writes across logical volumes, and reads across the physical volumes within a specific logical volume
- XFS works best with Haystack
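The idea of a physical volume as one very large file can be sketched as follows: photos are appended to a single file and an in-memory index records where each one lives, so a read needs no per-photo directory or inode lookup. This is a minimal sketch, not Haystack's on-disk format; `PhysicalVolume` is a hypothetical name and `io.BytesIO` stands in for the volume file.

```python
import io
from typing import Dict, Tuple


class PhysicalVolume:
    """One large append-only file plus an in-memory index of photo locations."""

    def __init__(self) -> None:
        self._file = io.BytesIO()  # stands in for the ~100 GB volume file
        self._index: Dict[int, Tuple[int, int]] = {}  # photo id -> (offset, size)

    def write(self, photo_id: int, data: bytes) -> None:
        offset = self._file.seek(0, io.SEEK_END)  # always append at the end
        self._file.write(data)
        self._index[photo_id] = (offset, len(data))

    def read(self, photo_id: int) -> bytes:
        # One index lookup and one seek; no directory or inode traversal.
        offset, size = self._index[photo_id]
        self._file.seek(offset)
        return self._file.read(size)


volume = PhysicalVolume()
volume.write(1, b"first-photo-bytes")
volume.write(2, b"second")
```

Headers, checksums, deletion flags, and replication across the physical volumes of a logical volume are deliberately left out.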
News Feed
- the system is called multifeed in FB
- Facebook News Feed: Social Data at Scale, and slides
- recent (2015) redesign to News Feed
- What is News Feed
- fetch recent activity from all your friends
- gather it in a central place
- group into stories
- rank stories by relevance etc.
- send back results
- Scale
- 10 billion / day
- 60ms average latency
- Fan-out-on-write vs. Fan-out-on-read
- fan-out-on-write, i.e. push each write to your friends
- can cause so-called write amplification
- what Twitter originally did (later optimized for users with many followers, the "Justin Bieber problem")
- fan-out-on-read i.e. fetch and aggregate at read time - what Facebook does
- flexibility on read-time aggregation (like what content to generate, bound the data volume)
- How it works
- an incoming request is sent from the PHP layer to an "aggregator", which figures out which users to query (e.g. a request from me will query for all my friends)
- a server called a leaf node holds all activities of a number of users
- there are many leaf nodes for this purpose, with partitioning and possibly replication
- the aggregator loads data from the corresponding leaf nodes, ranks/aggregates it, and finally sends the stories back
- the PHP layer gets back a list of "action ids", and queries memcached/MySQL to load the content of each action (like a video or a post)
- a "tailer": an input data pipeline that feeds user actions and feedback to a leaf node in real time (e.g. when a user posts a new video)
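The fan-out-on-read flow described above can be sketched as an aggregator that gathers friends' actions from leaf nodes at request time and returns ranked action ids. This is a toy model under stated assumptions: the class names, the partitioning rule in `leaf_for`, and "newest first" as the ranking are all hypothetical stand-ins.

```python
from typing import Dict, List, Tuple

Action = Tuple[int, int]  # (timestamp, action id)


class LeafNode:
    """Holds the recent actions of the users assigned to it."""

    def __init__(self) -> None:
        self.actions: Dict[str, List[Action]] = {}

    def append(self, user: str, timestamp: int, action_id: int) -> None:
        # This is the path a "tailer" would feed in real time.
        self.actions.setdefault(user, []).append((timestamp, action_id))

    def recent(self, user: str) -> List[Action]:
        return self.actions.get(user, [])


class Aggregator:
    def __init__(self, leaves: List[LeafNode], friends: Dict[str, List[str]]) -> None:
        self.leaves = leaves
        self.friends = friends

    def leaf_for(self, user: str) -> LeafNode:
        # Assumed partitioning: sum of character codes modulo leaf count.
        return self.leaves[sum(map(ord, user)) % len(self.leaves)]

    def feed(self, viewer: str, limit: int = 10) -> List[int]:
        """Fan-out-on-read: gather friends' actions at request time."""
        gathered: List[Action] = []
        for friend in self.friends.get(viewer, []):
            gathered.extend(self.leaf_for(friend).recent(friend))
        gathered.sort(reverse=True)  # ranking reduced to "newest first"
        return [action_id for _, action_id in gathered[:limit]]


leaves = [LeafNode(), LeafNode()]
agg = Aggregator(leaves, {"me": ["ann", "bob"], "ann": ["me"]})
agg.leaf_for("ann").append("ann", timestamp=100, action_id=1)
agg.leaf_for("bob").append("bob", timestamp=200, action_id=2)
agg.leaf_for("ann").append("ann", timestamp=300, action_id=3)
```

Note that a write touches exactly one leaf node regardless of how many friends the author has, which is what avoids write amplification; the cost moves to the read path, where `limit` bounds the data volume.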
Facebook Chat
- Chat Stability and Scalability
- channel server: receives a user's message and sends it to the recipient's browser; written in Erlang
- presence server: tracks whether a user is online or not; the channel server pushes active users to the presence server; written in C++
- lexical_cast allocates memory; when the heap is fragmented, each malloc() spends considerable CPU time finding free memory
Facebook Search
- Intro to facebook search
- Role: find a specific name/page in Facebook, e.g. a guy named "Bob", a band named "Johny"
- Ranking (relevance indicators)
- personal context;
- social context;
- query itself;
- global popularity
- challenges
- no query cache can be used;
- no locality in index (i.e. no hot index)
- Life of a Typeahead Query
- initial try: preload user's friends, pages, groups, applications, upcoming events into browser cache - and try to serve the search here
- request sent to aggregator (similar to News Feed's aggregator), which delegates to several leaf services
- Graph Search on people
- Graph Search on objects
- global objects - an index on all pages and applications on Facebook, no personalization - could be cached
- each leaf service returns some data; the aggregator merges and ranks the results, and sends them to the web tier
- results from the aggregator are ids of resources; the web tier loads the data and sends it back to the user's browser
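The typeahead path can be sketched as an aggregator that sends the prefix to several leaf services, merges and ranks their hits, and returns only ids for the web tier to resolve. The names (`LeafService`, `typeahead`) and all scores below are made up for illustration; real ranking combines the relevance indicators listed earlier.

```python
from typing import Dict, List, Tuple

Hit = Tuple[float, int]  # (score, object id)


class LeafService:
    """Prefix-matches names in one corpus (e.g. people or global objects)."""

    def __init__(self, entries: Dict[int, Tuple[str, float]]) -> None:
        self.entries = entries  # object id -> (name, score)

    def search(self, prefix: str) -> List[Hit]:
        p = prefix.lower()
        return [(score, oid) for oid, (name, score) in self.entries.items()
                if name.lower().startswith(p)]


def typeahead(prefix: str, leaves: List[LeafService], limit: int = 5) -> List[int]:
    """Aggregator: query every leaf, merge, rank by score, return ids only."""
    hits: List[Hit] = []
    for leaf in leaves:
        hits.extend(leaf.search(prefix))
    hits.sort(reverse=True)
    return [oid for _, oid in hits[:limit]]


people = LeafService({1: ("Bob Smith", 0.9), 2: ("Bobby Lee", 0.4)})
pages = LeafService({10: ("Bob's Burgers", 0.7), 11: ("Johny", 0.8)})
names = {1: "Bob Smith", 2: "Bobby Lee", 10: "Bob's Burgers", 11: "Johny"}

ids = typeahead("bo", [people, pages])
results = [names[i] for i in ids]  # the web tier resolves ids to content
```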
Graph Search
- Unicorn: A System for Searching the Social Graph
- Under the Hood: Building out the infrastructure for Graph Search
- Under the Hood: Indexing and ranking in Graph Search
- Under the Hood: The natural language interface of Graph Search
- Under the Hood: Building posts search
- history of Facebook search
- keyword based search
- typeahead search, prefix-matching
- Unicorn is an inverted index system for many-to-many mappings. Unlike a typical inverted index, it not only indexes "documents" or entities like users/pages/groups/applications, but also supports searching on the edges (edge types) between nodes
- graph search natural language interface example: employers of my friends who live in New York
- input node: ME
ME --[friend-edge]--> my friends (who live in NY)
- load list of nodes connected by a specific edge-type to the input nodes, here edge-type is "friend-edge"
[MY FRIENDS FROM NY] --[works-at-edge]--> employers
- "apply operator" i.e. "work-at" edge
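The two-step traversal for "employers of my friends who live in New York" can be sketched with posting lists keyed by edge type. This is a simplified model, not Unicorn's query language: the `follow` helper, the index layout, and the sample data are hypothetical.

```python
from typing import Dict, Set, Tuple

# Posting lists keyed by (edge type, source node); values are target nodes.
Index = Dict[Tuple[str, str], Set[str]]

index: Index = {
    ("friend", "me"): {"ann", "bob", "cat"},
    ("lives-in", "ann"): {"New York"},
    ("lives-in", "bob"): {"New York"},
    ("lives-in", "cat"): {"Boston"},
    ("works-at", "ann"): {"Acme"},
    ("works-at", "bob"): {"Globex"},
    ("works-at", "cat"): {"Initech"},
}


def follow(index: Index, edge: str, nodes: Set[str]) -> Set[str]:
    """Union of the posting lists reached from `nodes` over `edge`."""
    out: Set[str] = set()
    for node in nodes:
        out |= index.get((edge, node), set())
    return out


# Step 1: ME --[friend-edge]--> my friends, keeping those who live in New York.
friends = follow(index, "friend", {"me"})
ny_friends = {f for f in friends if "New York" in follow(index, "lives-in", {f})}

# Step 2: "apply" the works-at edge to the intermediate result set.
employers = follow(index, "works-at", ny_friends)
```

The second step is where the "apply" idea shows up: the output of one lookup becomes the set of input nodes for the next edge type.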
- Indexing: performed as a combination of map-reduce jobs that collect data from Hive tables, process them and convert into inverted index data structures
- live updates are streamed into the index via a separate live update pipeline.
- Graph Search components (Unicorn) - essentially an in-memory database with a query language interface
- Vertical - a Unicorn instance - different entity types are kept in separate Unicorn verticals, e.g. USER Vertical, PAGES Vertical
- index server - part of a vertical; holds a portion of the index, since the index is too large to fit on a single host
- Vertical Aggregator - broadcasts the query to all index servers in its vertical, and ranks the results
- because there are multiple Unicorn instances (Verticals), there is a TOP AGGREGATOR on top of all vertical aggregators, which runs a blending algorithm to blend the results from each vertical
- Query Rewriting: parse the query into a structured Unicorn retrieval query, correct spelling, handle synonyms / segmentation etc.
- example: "restaurants liked by Facebook employees" gets converted to
273819889375819/places/20531316728/employees/places-liked/intersect
- Scoring to rank result (static ranking); then "Result set scoring" to score the result as a whole, and only return a subset (e.g. "photos of facebook employees" may contain too many photos from Mark Zuckerberg)
- Nested Queries: the structured query may be nested and need to be JOINed, e.g. "restaurants liked by Facebook employees"
- Query Suggestion: relies on an NLP module to identify what kind of entity a token may be (e.g. "sri" as in a name vs. "sri" as in "people who live in Sri..")
- Machine Learning is used to adjust the "scoring function"
- How to evaluate Search algorithm changes
- CTR - click through rate
- DCG (discounted cumulative gain) - measures the usefulness (gain) of a result set, by considering the gain of each result in the set and the position of the result
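A short worked example of DCG, using the common formulation in which the gain at position i is divided by log2(i + 1). The notes do not say which DCG variant Facebook uses, so treat the exact discount as an assumption; the gain values are invented.

```python
import math
from typing import Sequence


def dcg(gains: Sequence[float]) -> float:
    """Discounted cumulative gain: each gain is divided by log2(position + 1)."""
    return sum(g / math.log2(pos + 1) for pos, g in enumerate(gains, start=1))


good_order = dcg([3, 2, 0])  # most useful result ranked first
bad_order = dcg([0, 2, 3])   # same results, least useful ranked first
```

Both lists contain the same results, but `good_order` scores higher because the useful results sit in the less-discounted top positions, which is exactly what the metric is meant to reward.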
- Natural Language Interface to Graph Search
- keywords as an interface is not good: nouns only, while connections in Facebook Graph data are verbs
- quite intensive content, see article
- Building Posts Search
- more than 1 billion posts added every day
- Wormhole listens for new posts from the MySQL store of posts
- much larger than other index types - stored on SSD instead of in RAM
- trillions of posts, and nobody can read all the results - optional clauses are added dynamically to bias the results towards what is likely to be more valuable to the user
Facebook Messages
- presentation in Hadoop Summit 2011
- Scaling the Messages Application Back End
- Inside Facebook Messages' Application Server
- The Underlying Technology of Messages
- HBase as main storage
- Database Layer: Master / Backup Master / Region Server [1..n]
- Storage Layer: Name node / secondary name node / Data node [1..n]
- Coordination Service: Zookeeper peers
- A user is sticky to an application server
- Cell: application server + HBase node
- 5 or more racks per cell, 20 servers per rack => 100 or more machines per cell
- controllers (master nodes, zookeeper, name nodes) spread across racks
- User Directory Service: find cell for a given user
- A separate backup system - quick and dirty to me
- Use Scribe
- double logging to reduce loss - merge and dedup
- ability to restore
- considerable effort went into making HBase more reliable, fail-safe, and able to support real-time workloads.
- action log - any update to a user's mailbox is recorded in the action log - can be replayed for various purposes
- full text search - uses Lucene to extract data and add it to HBase; each keyword has its own column
- Testing via Dark Launch - mirror live traffic from Chat and Inbox into a test Messages cluster for about 10% of the users.
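The action log idea (record every mailbox update, replay later) can be sketched as below. The operation names (`add`, `mark_read`, `delete`) and the mailbox layout are hypothetical; the notes only say that updates are logged and can be replayed.

```python
from typing import Dict, List, Tuple

Entry = Tuple[str, str, str]  # (operation, message id, payload)


def replay(log: List[Entry]) -> Dict[str, Dict[str, object]]:
    """Rebuild mailbox state by applying every logged update in order."""
    mailbox: Dict[str, Dict[str, object]] = {}
    for op, msg_id, payload in log:
        if op == "add":
            mailbox[msg_id] = {"body": payload, "read": False}
        elif op == "mark_read" and msg_id in mailbox:
            mailbox[msg_id]["read"] = True
        elif op == "delete":
            mailbox.pop(msg_id, None)
    return mailbox


action_log: List[Entry] = [
    ("add", "m1", "hello"),
    ("add", "m2", "lunch?"),
    ("mark_read", "m1", ""),
    ("delete", "m2", ""),
]
state = replay(action_log)
```

Replaying a prefix of the log reproduces the mailbox as of that point, which is one reason a log like this is useful for recovery or for rebuilding derived data such as a search index.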
Configuration Management
- a 2015 paper on this topic