One of the features merged in the 3.9 development cycle was TCP and UDP support for the SO_REUSEPORTsocket option; that support was implemented in a series of patches by Tom Herbert. The new socket option allows multiple sockets on the same host to bind to the same port, and is intended to improve the performance of multithreaded network server applications running on top of multicore systems.

The basic concept of SO_REUSEPORT is simple enough. Multiple servers (processes or threads) can bind to the same port if they each set the option as follows:

    int sfd = socket(domain, socktype, 0);

    int optval = 1;
setsockopt(sfd, SOL_SOCKET, SO_REUSEPORT, &optval, sizeof(optval)); bind(sfd, (struct sockaddr *) &addr, addrlen);

So long as the first server sets this option before binding its socket, then any number of other servers can also bind to the same port if they also set the option beforehand. The requirement that the first server must specify this option prevents port hijacking—the possibility that a rogue application binds to a port already used by an existing server in order to capture (some of) its incoming connections or datagrams. To prevent unwanted processes from hijacking a port that has already been bound by a server using SO_REUSEPORT, all of the servers that later bind to that port must have an effective user ID that matches the effective user ID used to perform the first bind on the socket.

SO_REUSEPORT can be used with both TCP and UDP sockets. With TCP sockets, it allows multiple listening sockets—normally each in a different thread—to be bound to the same port. Each thread can then accept incoming connections on the port by calling accept(). This presents an alternative to the traditional approaches used by multithreaded servers that accept incoming connections on a single socket.

The first of the traditional approaches is to have a single listener thread that accepts all incoming connections and then passes these off to other threads for processing. The problem with this approach is that the listening thread can become a bottleneck in extreme cases. Inearly discussions on SO_REUSEPORT, Tom noted that he was dealing with applications that accepted 40,000 connections per second. Given that sort of number, it's unsurprising to learn that Tom works at Google.

The second of the traditional approaches used by multithreaded servers operating on a single port is to have all of the threads (or processes) perform an accept() call on a single listening socket in a simple event loop of the form:

    while (1) {
new_fd = accept(...);
process_connection(new_fd);
}

The problem with this technique, as Tom pointed out, is that when multiple threads are waiting in the accept() call, wake-ups are not fair, so that, under high load, incoming connections may be distributed across threads in a very unbalanced fashion. At Google, they have seen a factor-of-three difference between the thread accepting the most connections and the thread accepting the fewest connections; that sort of imbalance can lead to underutilization of CPU cores. By contrast, the SO_REUSEPORT implementation distributes connections evenly across all of the threads (or processes) that are blocked in accept() on the same port.

As with TCP, SO_REUSEPORT allows multiple UDP sockets to be bound to the same port. This facility could, for example, be useful in a DNS server operating over UDP. With SO_REUSEPORT, each thread could use recv() on its own socket to accept datagrams arriving on the port. The traditional approach is that all threads would compete to perform recv() calls on a single shared socket. As with the second of the traditional TCP scenarios described above, this can lead to unbalanced loads across the threads. By contrast, SO_REUSEPORTdistributes datagrams evenly across all of the receiving threads.

Tom noted that the traditional SO_REUSEADDR socket option already allows multiple UDP sockets to be bound to, and accept datagrams on, the same UDP port. However, by contrast with SO_REUSEPORTSO_REUSEADDR does not prevent port hijacking and does not distribute datagrams evenly across the receiving threads.

There are two other noteworthy points about Tom's patches. The first of these is a useful aspect of the implementation. Incoming connections and datagrams are distributed to the server sockets using a hash based on the 4-tuple of the connection—that is, the peer IP address and port plus the local IP address and port. This means, for example, that if a client uses the same socket to send a series of datagrams to the server port, then those datagrams will all be directed to the same receiving server (as long as it continues to exist). This eases the task of conducting stateful conversations between the client and server.

The other noteworthy point is that there is a defect in the current implementation of TCP SO_REUSEPORT. If the number of listening sockets bound to a port changes because new servers are started or existing servers terminate, it is possible that incoming connections can be dropped during the three-way handshake. The problem is that connection requests are tied to a specific listening socket when the initial SYN packet is received during the handshake. If the number of servers bound to the port changes, then the SO_REUSEPORT logic might not route the final ACK of the handshake to the correct listening socket. In this case, the client connection will be reset, and the server is left with an orphaned request structure. A solution to the problem is still being worked on, and may consist of implementing a connection request table that can be shared among multiple listening sockets.

The SO_REUSEPORT option is non-standard, but available in a similar form on a number of other UNIX systems (notably, the BSDs, where the idea originated). It seems to offer a useful alternative for squeezing the maximum performance out of network applications running on multicore systems, and thus is likely to be a welcome addition for some application developers.

socket层分流思想:监听同一端口的场景下,所有线程都拥有一个独立的socket fd,而不是共用一个,从而提高性能!这也是引入SO_REUSEPORT socket option的原因

The SO_REUSEPORT socket option的更多相关文章

  1. Java Socket Option

    选项 public final static int TCP_NODELAY = 0x0001; public final static int SO_REUSEADDR = 0x04; public ...

  2. socket - option编程:SO_REUSEADDR

    网友vmstat多次提出了这个问题:SO_REUSEADDR有什么用处和怎么使用.而且很多网友在编写网络程序时也会遇到这个问题.所以特意写了这么一篇文章,希望能够解答一些人的疑难. 其实这个问题在Ri ...

  3. Nginx实现内参:为什么架构很重要?

    Nginx在web开发者眼中就是高并发高性能的代名词,其基于事件的架构也被众多开发者效仿.我从Nginx的网站找到一篇技术文章将Nginx是怎样实现的,文章是Nginx的产品老大Owen Garret ...

  4. nginx指令

    Directives(指令) Syntax(语法): aio on | off | threads[=pool]; Default: aio off; Context: http, server, l ...

  5. Oracle:使用nginx做为代理访问

    nginx 必须启用 启用 --with-stream 模块. 可下载源码编译. nginx.conf的配置: worker_processes ; events { worker_connectio ...

  6. Inside NGINX: How We Designed for Performance & Scale

    NGINX leads the pack in web performance, and it’s all due to the way the software is designed. Where ...

  7. 现在的 Linux 内核和 Linux 2.6 的内核有多大区别?

    作者:larmbr宇链接:https://www.zhihu.com/question/35484429/answer/62964898来源:知乎著作权归作者所有.商业转载请联系作者获得授权,非商业转 ...

  8. 【wiki】红帽linux

    Red Hat Enterprise Linux From Wikipedia, the free encyclopedia wiki 上面红帽的版本信息. https://en.wikipedia. ...

  9. [译]深入 NGINX: 为性能和扩展所做之设计

    来自:http://ifeve.com/inside-nginx-how-we-designed-for-performance-scale/ 这篇文章写了nginx的设计,写的很仔细全面, 同时里面 ...

随机推荐

  1. 前端开发 - JavaScript - 总结

    一.JavaScript的特征 javaScript是一种web前端的描述语言,也是一种基于对象(object)和事件驱动(Event Driven)的.安全性好的脚本语言.它运行在客户端从而减轻服务 ...

  2. AndroidStudio修改常用快捷键

    近期公司开发工具要从eclipse转向Androidstudio,安装好as后当然迫不及待地要将快捷键修改为eclipse中的快捷键啦,下面是个人的一些小的总结. 1.首先当然要打开快捷键的设置界面啦 ...

  3. 在django中实现支付宝支付(支付宝接口调用)

    支付宝支付 正式环境:用营业执照,申请商户号,appid 测试环境:沙箱环境:https://openhome.alipay.com/platform/appDaily.htm?tab=info 支付 ...

  4. 对Numpy数组按axis运算的理解

    Python的Numpy数组运算中,有时会出现按axis进行运算的情况,如 >>> x = np.array([[1, 1], [2, 2]]) >>> x arr ...

  5. Upsource——对已签入的代码进行分享、讨论和审查代码

    Upsource 一.Upsource简介 Upsource ,这是一个专门为软件开发团队所设计的源代码协作工具.Upsource能够与多种版本控制工具进行集成,包括Git.Mercurial.Sub ...

  6. 4.ubuntu实现linux与windows的互相复制与粘贴

    为了能够在linux和windows之间直接进行互相复制粘贴,给出下面的解决办法. 系统环境: ubuntu12.04(linux), win7系统 以下指令都是在超级用户的执行权限下执行的. 要解决 ...

  7. 安装memcached扩展

    1.wget http://pecl.php.net/package/memcache  下载相对应的扩展包 2. tar -zxvf memcache-2.2.7.tgz 3.  cd memcac ...

  8. XDU 1001 又是苹果(状态压缩)

    #include<cstdio> #include<cstring> ; using namespace std; int r[maxn],c[maxn]; char pic[ ...

  9. $digest / $apply digest in progress报错

    有的时候出于某种原因,如jq操作了model.或者$watch.setTimeout等函数改变了model,导致最后没有脏数据检测.所以我没就手动调用了$apply( )等.但是第一次运行的时候ang ...

  10. HDU 3081 Marriage Match II (二分+并查集+最大流)

    题意:N个boy和N个girl,每个女孩可以和与自己交友集合中的男生配对子;如果两个女孩是朋友,则她们可以和对方交友集合中的男生配对子;如果女生a和女生b是朋友,b和c是朋友,则a和c也是朋友.每一轮 ...