Facebook Architecture
Quora article
a relatively old presentation on facebook architecture
another InfoQ presentation on Facebook architecture / scale
Web frontend
- PHP
- HipHop
- HipHop Virtual Machine (HHVM)
- BigPipe pipelines page rendering by dividing the page into pagelets and pipelining their generation and delivery.
- Varnish Cache for web caching
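A minimal sketch of the pagelet idea behind BigPipe, not Facebook's implementation: the server flushes a page skeleton first, then streams each pagelet as soon as it has been rendered. The function name `stream_page`, the pagelet ids, and the client-side `fill` call are hypothetical, and pagelets are rendered sequentially here for simplicity.

```python
from typing import Callable, Dict, Iterator


def stream_page(pagelets: Dict[str, Callable[[], str]]) -> Iterator[str]:
    """Yield the page in chunks: skeleton first, then one chunk per pagelet."""
    # 1. Flush a skeleton with an empty placeholder per pagelet right away.
    placeholders = "".join(f'<div id="{pid}"></div>' for pid in pagelets)
    yield f"<html><body>{placeholders}"
    # 2. Render each pagelet and flush it as soon as it is ready.
    #    (Assumption: a real system would render pagelets concurrently.)
    for pid, render in pagelets.items():
        yield f'<script>fill("{pid}", {render()!r})</script>'
    # 3. Close the document once every pagelet has been sent.
    yield "</body></html>"


chunks = list(stream_page({
    "feed": lambda: "<p>stories</p>",
    "chat": lambda: "<p>buddy list</p>",
}))
```

The browser can start painting the skeleton while later pagelets are still being generated, which is the point of pipelining.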
Business Logic
- service-oriented, exposed as service
- Thrift API
- multiple language bindings
- no need to worry about serialization / connection handling / threading
- supports different server types: non-blocking, async, single-threaded, multi-threaded
- Java service uses a custom application server (not Tomcat or Jetty etc.)
Persistence
- MySQL, Memcached, Hadoop's HBase
- MySQL/InnoDB is used as a key-value store, distributed / load-balanced across many instances
- global ID is assigned to user data (user info, wall posts, comments etc.)
- Blob data e.g. photos and videos, are handled separately
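A minimal sketch of using many database instances as one key-value store keyed by a global ID. The modulo placement rule and the `ShardedKV` class are assumptions for illustration; the notes above do not say how Facebook maps IDs to instances, and plain dicts stand in for MySQL/InnoDB servers.

```python
from typing import Dict, List, Optional


class ShardedKV:
    """Key-value store spread over several backends, keyed by a global ID."""

    def __init__(self, num_shards: int) -> None:
        # Each dict stands in for one MySQL/InnoDB instance (assumption).
        self.shards: List[Dict[int, bytes]] = [{} for _ in range(num_shards)]

    def _shard_for(self, global_id: int) -> Dict[int, bytes]:
        # Hypothetical placement rule: simple modulo on the global ID.
        return self.shards[global_id % len(self.shards)]

    def put(self, global_id: int, value: bytes) -> None:
        self._shard_for(global_id)[global_id] = value

    def get(self, global_id: int) -> Optional[bytes]:
        return self._shard_for(global_id).get(global_id)


store = ShardedKV(num_shards=4)
store.put(1001, b"wall post")
store.put(1002, b"comment")
```

Because every object carries a global ID, the caller never needs to know which instance holds it.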
Logging
- Scribe, one instance on each host
- Scribe-HDFS for analytics
Photo
- the first version was NFS-backed storage, served via HTTP
- Haystack, Facebook's object store for photos
- Haystack slides
- Massive CDN to cache/delivery data
- previously NFS-backed, but a traditional POSIX file system incurs overhead that photo serving does not need: directory resolution, file metadata, inodes etc.
- Haystack Store: a server's 10 TB of storage is split into 100 "physical volumes"; physical volumes on different hosts are grouped into "logical volumes", and data is replicated within a logical volume
- a physical volume is simply a very large file (100 GB) mounted at /hay/haystack_/
- Haystack Cache: internal cache
- example of an image's URL: http://<CDN>/<Cache>/<Machine id>/<Logical volume, Photo>
- Haystack Directory: metadata / mapping
- mapping and URL construction
- load balance among logical volumes for write, and load balance among physical volumes (within a specific logical volume) for read.
- XFS works best with Haystack
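A minimal sketch of why a physical volume stored as one large file avoids per-photo file system overhead: an in-memory index maps each photo id to an (offset, size) pair, so a read is a single seek plus a single read. The `PhysicalVolume` class and its layout are assumptions for illustration; the real Haystack on-disk format carries more metadata.

```python
import os
import tempfile
from typing import Dict, Tuple


class PhysicalVolume:
    """One large append-only file plus an in-memory (offset, size) index."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.index: Dict[int, Tuple[int, int]] = {}  # photo_id -> (offset, size)
        open(path, "ab").close()  # create the volume file if it is missing

    def write(self, photo_id: int, data: bytes) -> None:
        # Append the photo bytes and remember where they landed.
        with open(self.path, "ab") as f:
            f.seek(0, os.SEEK_END)
            offset = f.tell()
            f.write(data)
        self.index[photo_id] = (offset, len(data))

    def read(self, photo_id: int) -> bytes:
        # No directory walk or per-photo inode: one seek, one read.
        offset, size = self.index[photo_id]
        with open(self.path, "rb") as f:
            f.seek(offset)
            return f.read(size)


with tempfile.TemporaryDirectory() as tmp:
    vol = PhysicalVolume(os.path.join(tmp, "volume_0"))
    vol.write(1, b"cat.jpg bytes")
    vol.write(2, b"dog.jpg bytes")
    first, second = vol.read(1), vol.read(2)
```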
News Feed
- the system is called multifeed in FB
- Facebook News Feed: Social Data at Scale, and slides
- recent (2015) redesign to News Feed
- What is News Feed
- fetch recent activity from all your friends
- gather it in a central place
- group into stories
- rank stories by relevance etc.
- send back results
- Scale
- 10 billion / day
- 60ms average latency
- Fan-out-on-write vs. Fan-out-on-read
- fan-out-on-write, i.e. push each write to all of your friends
- can cause so-called write amplification
- what Twitter originally did (later optimized for users with many followers, the "Justin Bieber problem")
- fan-out-on-read, i.e. fetch and aggregate at read time; this is what Facebook does
- flexibility on read-time aggregation (like what content to generate, bound the data volume)
- How it works
- an incoming request is sent from the PHP layer to an "aggregator", which figures out which users to query (e.g. a request from me queries all my friends)
- a server called a leaf node holds all activities of a number of users
- there are many leaf nodes for this purpose, with partitioning and possibly replication
- the aggregator loads data from the corresponding leaf nodes, ranks and aggregates it, and sends the stories back
- the PHP layer gets back a list of "action ids" and queries memcached/MySQL to load the content of each action (like a video or a post)
- a "tailer": an input data pipeline that feeds user actions and feedback to a leaf node in real time (e.g. when a user posts a new video)
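The read path described above can be sketched as follows. This is a toy model under stated assumptions: the `Leaf` and `Aggregator` classes, the modulo partitioning, and the newest-first ordering (a stand-in for relevance ranking) are all hypothetical.

```python
from typing import Dict, List, Tuple

Action = Tuple[int, int, float]  # (action_id, author_id, timestamp)


class Leaf:
    """Holds recent actions for the subset of users assigned to it."""

    def __init__(self) -> None:
        self.actions: Dict[int, List[Action]] = {}

    def append(self, action: Action) -> None:
        # This is the call a "tailer" would make as new actions arrive.
        self.actions.setdefault(action[1], []).append(action)

    def recent(self, user_ids: List[int]) -> List[Action]:
        return [a for u in user_ids for a in self.actions.get(u, [])]


class Aggregator:
    def __init__(self, leaves: List[Leaf]) -> None:
        self.leaves = leaves

    def leaf_for(self, user_id: int) -> Leaf:
        # Hypothetical partitioning: modulo on the user id.
        return self.leaves[user_id % len(self.leaves)]

    def feed(self, friend_ids: List[int], limit: int = 10) -> List[int]:
        # Fan-out-on-read: query each friend's leaf and gather the actions.
        gathered: List[Action] = []
        for fid in friend_ids:
            gathered.extend(self.leaf_for(fid).recent([fid]))
        # Stand-in ranking: newest first. Real ranking uses relevance.
        gathered.sort(key=lambda a: a[2], reverse=True)
        # Only action ids go back; the web tier loads the content itself.
        return [action_id for action_id, _, _ in gathered[:limit]]


agg = Aggregator([Leaf(), Leaf()])
for act in [(101, 1, 10.0), (102, 2, 30.0), (103, 3, 20.0)]:
    agg.leaf_for(act[1]).append(act)
action_ids = agg.feed(friend_ids=[1, 2])
```

Nothing is written to the reader's own feed when a friend posts, so there is no write amplification; the cost moves to read time, where the aggregator can bound how much data it gathers.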
Facebook Chat
- Chat Stability and Scalability
- channel server: receives a user's messages and sends them to the user's browser; written in Erlang
- presence server: tracks whether a user is online; the channel server pushes active users to the presence server; written in C++
- lexical_cast causes memory allocation; when the heap is fragmented, malloc() spends considerable CPU time finding free memory
Facebook Search
- Intro to facebook search
- Role: find a specific name/page in Facebook, e.g. a guy named "Bob", a band named "Johny"
- Ranking (relevance indicators)
- personal context;
- social context;
- query itself;
- global popularity
- challenges
- no query cache can be used;
- no locality in index (i.e. no hot index)
- Life of a Typeahead Query
- first attempt: preload the user's friends, pages, groups, applications, and upcoming events into the browser cache, and try to serve the search from there
- request sent to aggregator (similar to News Feed's aggregator), which delegates to several leaf services
- Graph Search on people
- Graph Search on objects
- global objects - an index on all pages and applications on Facebook, no personalization - could be cached
- each leaf service returns some data; the aggregator merges and ranks the results and sends them to the web tier
- results from the aggregator are ids of resources; the web tier loads the data and sends it back to the user's browser
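A minimal sketch of the typeahead flow described above: several leaf services do prefix matching, and an aggregator merges and ranks their hits and returns only ids. The leaf contents, the scores, and the names `make_leaf` and `typeahead` are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Hit = Tuple[int, float]  # (object_id, score)
LeafService = Callable[[str], List[Hit]]


def make_leaf(names: Dict[int, Tuple[str, float]]) -> LeafService:
    """Build a toy leaf service doing case-insensitive prefix matching."""
    def search(prefix: str) -> List[Hit]:
        p = prefix.lower()
        return [(oid, score) for oid, (name, score) in names.items()
                if name.lower().startswith(p)]
    return search


def typeahead(prefix: str, leaves: List[LeafService], limit: int = 5) -> List[int]:
    """Query every leaf, merge the hits, and return the top ids by score."""
    best: Dict[int, float] = {}
    for leaf in leaves:
        for oid, score in leaf(prefix):
            best[oid] = max(score, best.get(oid, float("-inf")))
    ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)
    return [oid for oid, _ in ranked[:limit]]


people = make_leaf({1: ("Bob Smith", 0.9), 2: ("Alice", 0.8)})
pages = make_leaf({10: ("Bob's Burgers", 0.5), 11: ("Boston", 0.7)})
result_ids = typeahead("bo", [people, pages])
```

Returning ids rather than full objects keeps the aggregator response small; the web tier resolves them to displayable data afterwards.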
Graph Search
- Unicorn: A System for Searching the Social Graph
- Under the Hood: Building out the infrastructure for Graph Search
- Under the Hood: Indexing and ranking in Graph Search
- Under the Hood: The natural language interface of Graph Search
- Under the Hood: Building posts search
- history of Facebook search
- keyword based search
- typeahead search, prefix-matching
- Unicorn is an inverted index system for many-to-many mappings. Unlike a typical inverted index, it not only indexes "documents" or entities like users/pages/groups/applications, but also supports search based on the edges (edge types) between nodes
- graph search natural language interface example: employers of my friends who live in New York
- input node: ME
ME --[friend-edge]--> my friends (who live in NY)
- load list of nodes connected by a specific edge-type to the input nodes, here edge-type is "friend-edge"
[MY FRIENDS FROM NY] --[works-at-edge]--> employers
- "apply operator" i.e. "work-at" edge
- Indexing: performed as a combination of map-reduce jobs that collect data from Hive tables, process them and convert into inverted index data structures
- live updates are streamed into the index via a separate live update pipeline.
- Graph Search components (Unicorn) - essentially an in-memory database with a query language interface
- Vertical: a Unicorn instance; different entity types are kept in separate Unicorn verticals, e.g. USER Vertical, PAGES Vertical
- index server: part of a vertical; holds a portion of the index, because the index is too large to fit on a single host
- Vertical Aggregator: broadcasts the query to all index servers in its vertical and ranks the results
- because there are multiple Unicorn instances (verticals), a TOP AGGREGATOR sits on top of all vertical aggregators and runs a blending algorithm to blend the results from each vertical
- Query Rewriting: parse the query into a structured Unicorn retrieval query; handle spelling correction, synonyms, segmentation etc.
- example: "restaurants liked by Facebook employees" gets converted to
273819889375819/places/20531316728/employees/places-liked/intersect
- Scoring to rank result (static ranking); then "Result set scoring" to score the result as a whole, and only return a subset (e.g. "photos of facebook employees" may contain too many photos from Mark Zuckerberg)
- Nested Queries: the structured query may be nested and need to be JOINed, e.g. "restaurants liked by Facebook employees"
- Query Suggestion: relies on an NLP module to identify what kind of entity a term may be (sri as in a name vs. sri as in "people who live in Sri..")
- Machine Learning is used to adjust the "scoring function"
- How to evaluate Search algorithm changes
- CTR - click through rate
- DCG (discounted cumulative gain) - measures the usefulness (gain) of a result set, by considering the gain of each result in the set and the position of the result
- Natural Language Interface to Graph Search
- keywords as an interface is not good: nouns only, while connections in Facebook Graph data are verbs
- quite intensive content, see article
- Building Posts Search
- more than 1 billion posts added every day
- Wormhole to listen on posts from MySQL store of posts
- much larger than other index types - stored in SSD instead of RAM
- trillions of posts, and nobody can read all the results, so optional clauses are added dynamically to bias the results towards what is likely more valuable to the user
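The "employers of my friends who live in New York" example from the Unicorn notes above can be sketched with a toy inverted index whose terms combine an edge type and a node id. The term naming (`friend:me`, `employer-of:ann`), the sample data, and the helper names are hypothetical and not Unicorn's actual query language.

```python
from typing import Dict, Set

# Posting lists keyed by "<edge-type>:<node-id>" (hypothetical term naming).
index: Dict[str, Set[str]] = {
    "friend:me": {"ann", "bo", "cy"},
    "lives-in:new-york": {"ann", "cy", "dee"},
    "employer-of:ann": {"acme"},
    "employer-of:bo": {"globex"},
    "employer-of:cy": {"acme", "initech"},
}


def term(edge: str, node: str) -> Set[str]:
    """Look up the posting list for one edge type and one input node."""
    return index.get(f"{edge}:{node}", set())


def apply_edge(edge: str, nodes: Set[str]) -> Set[str]:
    """Follow `edge` from every node in `nodes` and union the results."""
    out: Set[str] = set()
    for node in nodes:
        out |= term(edge, node)
    return out


# Step 1: intersect two posting lists to get my friends who live in New York.
friends_in_ny = term("friend", "me") & term("lives-in", "new-york")
# Step 2: "apply" the employer edge to that intermediate result set.
employers = apply_edge("employer-of", friends_in_ny)
```

The second step is what distinguishes this from a plain keyword index: the output of one query becomes the set of input nodes for the next edge lookup.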
Facebook Messages
- presentation in Hadoop Summit 2011
- Scaling the Messages Application Back End
- Inside Facebook Messages' Application Server
- The Underlying Technology of Messages
- HBase as main storage
- Database Layer: Master / Backup Master / Region Server [1..n]
- Storage Layer: Name node / secondary name node / Data node [1..n]
- Coordination Service: Zookeeper peers
- A user is sticky to an application server
- Cell: application server + HBase node
- 5 or more racks per cell, 20 servers per rack => 100 or more machines per cell
- controllers (master nodes, zookeeper, name nodes) spread across racks
- User Directory Service: find cell for a given user
- A separate backup system - quick and dirty to me
- Use Scribe
- double logging to reduce loss - merge and dedup
- ability to restore
- quite some effort to make HBase more reliable, fail safe, and support real-time workload.
- action log: any update to a user's mailbox is recorded in the action log, which can be replayed for various purposes
- full text search - use Lucene to extract data and add to HBase, each keyword has its own column
- Testing via Dark Launch - mirror live traffic from Chat and Inbox into a test Messages cluster for about 10% of the users.
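A minimal sketch combining two ideas from the notes above: merging redundant logs with deduplication, and rebuilding mailbox state by replaying an action log. The entry format, the sequence numbers, and the `add` / `delete` operations are assumptions for illustration; the notes do not describe the real log schema.

```python
from typing import Dict, Iterable, List, Tuple

Entry = Tuple[int, str, str, str]  # (sequence_no, op, message_id, body)


def merge_and_dedup(*logs: Iterable[Entry]) -> List[Entry]:
    """Merge redundant logs, dropping duplicates by sequence number."""
    seen: Dict[int, Entry] = {}
    for log in logs:
        for entry in log:
            seen.setdefault(entry[0], entry)
    return [seen[seq] for seq in sorted(seen)]


def replay(entries: Iterable[Entry]) -> Dict[str, str]:
    """Rebuild a mailbox (message_id -> body) by replaying the log in order."""
    mailbox: Dict[str, str] = {}
    for _, op, message_id, body in entries:
        if op == "add":
            mailbox[message_id] = body
        elif op == "delete":
            mailbox.pop(message_id, None)
    return mailbox


# Two copies of the log, each missing one entry the other has.
log_a = [(1, "add", "m1", "hi"), (2, "add", "m2", "lunch?")]
log_b = [(2, "add", "m2", "lunch?"), (3, "delete", "m1", "")]
merged = merge_and_dedup(log_a, log_b)
mailbox = replay(merged)
```

Logging twice means either copy can lose entries without losing the update, as long as the other copy has it.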
Configuration Management
- a 2015 paper on this topic