Understanding How to Use epoll Through a Complete Example

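Network servers are traditionally implemented with a separate process or thread per connection. For high-performance applications that must handle a very large number of clients simultaneously, this approach does not work well, because factors such as resource usage and context-switching time limit how many clients can be handled at once. An alternative is to perform non-blocking I/O in a single thread, together with some readiness notification method that tells you when more data can be read from or written to a socket.

This article is an introduction to Linux's epoll(7) facility, which is the best readiness notification facility in Linux. We will write sample code in C for a complete TCP server. I assume you have some C programming experience, know how to compile and run programs on Linux, and can read the manual pages of the various C functions that are used.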

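epoll was introduced in Linux 2.6 and is not available on other UNIX-like operating systems. It provides functionality similar to the select(2) and poll(2) functions:

select(2) can monitor at most FD_SETSIZE descriptors at a time, and FD_SETSIZE is typically a small number fixed when libc is compiled.
poll(2) has no fixed limit on the number of descriptors it can monitor at a time, but apart from other things, after every call we have to scan all the passed descriptors linearly to check for readiness notifications, which is O(n) and slow.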

epoll has no such fixed limits and does not perform any linear scans. Hence it can perform better and handle a larger number of events.

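An epoll instance is created by epoll_create(2) or epoll_create1(2) (they take different arguments), which return a file descriptor referring to the new epoll instance. epoll_ctl(2) is used to add descriptors to, or remove them from, the set watched by the epoll instance. epoll_wait(2) waits for events on the watched descriptors, blocking until events are available. See the respective manual pages for more information.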

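When descriptors are added to an epoll instance, they can be added in two modes: level-triggered and edge-triggered (translator's note: the terms are borrowed from electronics). In level-triggered mode, when data is available for reading, epoll_wait(2) always returns with a ready event. If you do not read all the data and call epoll_wait(2) again on the epoll instance watching the descriptor, it returns again with a ready event because data is still available. In edge-triggered mode, you get the readiness notification only once. If you do not read all the data and call epoll_wait(2) again, it blocks until new data arrives, because the ready event has already been delivered.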

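The epoll event structure that you pass to epoll_ctl(2) is shown below. With every descriptor being watched, you can associate either an integer or a pointer as user data.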

typedef union epoll_data
{
  void *ptr;
  int fd;
  __uint32_t u32;
  __uint64_t u64;
} epoll_data_t;

struct epoll_event
{
  __uint32_t events; /* Epoll events */
  epoll_data_t data; /* User data variable */
};

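Let's write code now. We will implement a tiny TCP server that prints everything sent to the socket on standard output. We begin by writing a function create_and_bind() which creates and binds a TCP socket: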

/* Headers needed by the complete program */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/epoll.h>
#include <errno.h>

static int
create_and_bind (char *port)
{
  struct addrinfo hints;
  struct addrinfo *result, *rp;
  int s, sfd;

  memset (&hints, 0, sizeof (struct addrinfo));
  hints.ai_family = AF_UNSPEC;     /* Return IPv4 and IPv6 choices */
  hints.ai_socktype = SOCK_STREAM; /* We want a TCP socket */
  hints.ai_flags = AI_PASSIVE;     /* All interfaces */

  s = getaddrinfo (NULL, port, &hints, &result);
  if (s != 0)
    {
      fprintf (stderr, "getaddrinfo: %s\n", gai_strerror (s));
      return -1;
    }

  for (rp = result; rp != NULL; rp = rp->ai_next)
    {
      sfd = socket (rp->ai_family, rp->ai_socktype, rp->ai_protocol);
      if (sfd == -1)
        continue;

      s = bind (sfd, rp->ai_addr, rp->ai_addrlen);
      if (s == 0)
        {
          /* We managed to bind successfully! */
          break;
        }

      close (sfd);
    }

  if (rp == NULL)
    {
      fprintf (stderr, "Could not bind\n");
      freeaddrinfo (result); /* Do not leak the list on failure */
      return -1;
    }

  freeaddrinfo (result);

  return sfd;
}

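create_and_bind() contains a standard code block for a portable way of getting an IPv4 or IPv6 socket. It takes a port argument as a string, which can be passed from argv[1]. The getaddrinfo(3) function returns a linked list of addrinfo structures in result that are compatible with the hints passed in the hints argument. The addrinfo structure looks like this: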

struct addrinfo
{
  int ai_flags;
  int ai_family;
  int ai_socktype;
  int ai_protocol;
  socklen_t ai_addrlen;
  struct sockaddr *ai_addr;
  char *ai_canonname;
  struct addrinfo *ai_next;
};

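We walk through the structures one by one and try creating sockets using them, until we are able to both create and bind a socket. If we succeed, create_and_bind() returns the socket descriptor. If unsuccessful, it returns -1.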
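
To see what that list looks like on a given machine, the walk can be done without creating any sockets. This is a small sketch; the helper name count_candidates and the port string "8080" are arbitrary choices for the illustration.

```c
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: print and count the candidate addresses that
   getaddrinfo(3) offers for a passive TCP socket on the given port.
   Returns the number of candidates, or -1 on error. */
static int
count_candidates (const char *port)
{
  struct addrinfo hints, *result, *rp;
  int s, n = 0;

  memset (&hints, 0, sizeof hints);
  hints.ai_family = AF_UNSPEC;     /* IPv4 and IPv6 */
  hints.ai_socktype = SOCK_STREAM; /* TCP */
  hints.ai_flags = AI_PASSIVE;     /* wildcard address, for bind(2) */

  s = getaddrinfo (NULL, port, &hints, &result);
  if (s != 0)
    {
      fprintf (stderr, "getaddrinfo: %s\n", gai_strerror (s));
      return -1;
    }

  /* The results form a singly linked list chained through ai_next. */
  for (rp = result; rp != NULL; rp = rp->ai_next)
    {
      const char *family = rp->ai_family == AF_INET ? "IPv4"
                         : rp->ai_family == AF_INET6 ? "IPv6" : "other";
      printf ("candidate %d: %s\n", n, family);
      n++;
    }

  freeaddrinfo (result);
  return n;
}

int
main (void)
{
  return count_candidates ("8080") > 0 ? 0 : 1;
}
```

How many candidates appear, and in which order, depends on the system configuration, which is why create_and_bind() tries each one in turn instead of assuming the first will work.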

Next, let's write a function to make a socket non-blocking. make_socket_non_blocking() sets the O_NONBLOCK flag on the descriptor passed in the sfd argument:

static int
make_socket_non_blocking (int sfd)
{
  int flags, s;

  flags = fcntl (sfd, F_GETFL, 0);
  if (flags == -1)
    {
      perror ("fcntl");
      return -1;
    }

  flags |= O_NONBLOCK;
  s = fcntl (sfd, F_SETFL, flags);
  if (s == -1)
    {
      perror ("fcntl");
      return -1;
    }

  return 0;
}

Now, on to the main() function of the program, which contains the event loop. This is the bulk of the program:

#define MAXEVENTS 64

int
main (int argc, char *argv[])
{
  int sfd, s;
  int efd;
  struct epoll_event event;
  struct epoll_event *events;

  if (argc != 2)
    {
      fprintf (stderr, "Usage: %s [port]\n", argv[0]);
      exit (EXIT_FAILURE);
    }

  sfd = create_and_bind (argv[1]);
  if (sfd == -1)
    abort ();

  s = make_socket_non_blocking (sfd);
  if (s == -1)
    abort ();

  s = listen (sfd, SOMAXCONN);
  if (s == -1)
    {
      perror ("listen");
      abort ();
    }

  efd = epoll_create1 (0);
  if (efd == -1)
    {
      perror ("epoll_create");
      abort ();
    }

  event.data.fd = sfd;
  event.events = EPOLLIN | EPOLLET;
  s = epoll_ctl (efd, EPOLL_CTL_ADD, sfd, &event);
  if (s == -1)
    {
      perror ("epoll_ctl");
      abort ();
    }

  /* Buffer where events are returned */
  events = calloc (MAXEVENTS, sizeof event);
  if (events == NULL)
    {
      perror ("calloc");
      abort ();
    }

  /* The event loop */
  while (1)
    {
      int n, i;

      n = epoll_wait (efd, events, MAXEVENTS, -1);
      for (i = 0; i < n; i++)
        {
          if ((events[i].events & EPOLLERR) ||
              (events[i].events & EPOLLHUP) ||
              (!(events[i].events & EPOLLIN)))
            {
              /* An error has occurred on this fd, or the socket is not
                 ready for reading (why were we notified then?) */
              fprintf (stderr, "epoll error\n");
              close (events[i].data.fd);
              continue;
            }

          else if (sfd == events[i].data.fd)
            {
              /* We have a notification on the listening socket, which
                 means one or more incoming connections. */
              while (1)
                {
                  struct sockaddr in_addr;
                  socklen_t in_len;
                  int infd;
                  char hbuf[NI_MAXHOST], sbuf[NI_MAXSERV];

                  in_len = sizeof in_addr;
                  infd = accept (sfd, &in_addr, &in_len);
                  if (infd == -1)
                    {
                      if ((errno == EAGAIN) ||
                          (errno == EWOULDBLOCK))
                        {
                          /* We have processed all incoming
                             connections. */
                          break;
                        }
                      else
                        {
                          perror ("accept");
                          break;
                        }
                    }

                  s = getnameinfo (&in_addr, in_len,
                                   hbuf, sizeof hbuf,
                                   sbuf, sizeof sbuf,
                                   NI_NUMERICHOST | NI_NUMERICSERV);
                  if (s == 0)
                    {
                      printf ("Accepted connection on descriptor %d "
                              "(host=%s, port=%s)\n", infd, hbuf, sbuf);
                    }

                  /* Make the incoming socket non-blocking and add it to the
                     list of fds to monitor. */
                  s = make_socket_non_blocking (infd);
                  if (s == -1)
                    abort ();

                  event.data.fd = infd;
                  event.events = EPOLLIN | EPOLLET;
                  s = epoll_ctl (efd, EPOLL_CTL_ADD, infd, &event);
                  if (s == -1)
                    {
                      perror ("epoll_ctl");
                      abort ();
                    }
                }
              continue;
            }
          else
            {
              /* We have data on the fd waiting to be read. Read and
                 display it. We must read whatever data is available
                 completely, as we are running in edge-triggered mode
                 and won't get a notification again for the same
                 data. */
              int done = 0;

              while (1)
                {
                  ssize_t count;
                  char buf[512];

                  count = read (events[i].data.fd, buf, sizeof buf);
                  if (count == -1)
                    {
                      /* If errno == EAGAIN, that means we have read all
                         data. So go back to the main loop. */
                      if (errno != EAGAIN)
                        {
                          perror ("read");
                          done = 1;
                        }
                      break;
                    }
                  else if (count == 0)
                    {
                      /* End of file. The remote has closed the
                         connection. */
                      done = 1;
                      break;
                    }

                  /* Write the buffer to standard output */
                  s = write (1, buf, count);
                  if (s == -1)
                    {
                      perror ("write");
                      abort ();
                    }
                }

              if (done)
                {
                  printf ("Closed connection on descriptor %d\n",
                          events[i].data.fd);

                  /* Closing the descriptor will make epoll remove it
                     from the set of descriptors which are monitored. */
                  close (events[i].data.fd);
                }
            }
        }
    }

  free (events);

  close (sfd);

  return EXIT_SUCCESS;
}

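main() first calls create_and_bind(), which sets up the socket. It then makes the socket non-blocking and calls listen(2). Next it creates an epoll instance in efd, to which it adds the listening socket sfd to watch for input events in edge-triggered mode.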

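The outer while loop is the main event loop. It calls epoll_wait(2), where the thread remains blocked waiting for events. When events are available, epoll_wait(2) returns them in the events argument, which is an array of epoll_event structures.

The epoll instance in efd is continuously updated in the event loop as we add new incoming connections to watch and remove existing connections when they die.

When events are available, they can be of three types: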

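Errors: When an error condition occurs, or the event is not a notification about data available for reading, we simply close the associated descriptor. Closing the descriptor automatically removes it from the watch list of the epoll instance efd (provided no duplicate of the descriptor remains open).
New connections: When the listening descriptor sfd is ready for reading, it means one or more new connections have arrived. For each connection we accept(2) it, print a message about it, make the incoming socket non-blocking, and add it to the watch list of the epoll instance efd.
Client data: When data is available for reading on any of the client descriptors, we use read(2) to read the data in pieces of 512 bytes in an inner while loop. This is because we have to read all the data that is available now, as we will not get further events about it while the descriptor is watched in edge-triggered mode. The data read is written to standard output (fd=1) using write(2). If read(2) returns 0, it means EOF and we can close the client's connection. If it returns -1 with errno set to EAGAIN, it means all the data for this event has been read and we can go back to the main loop.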

That's that. It goes around and around in a loop, adding and removing descriptors in the watch list.

Download the epoll-example.c program.

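Update 1: The definitions of level-triggered and edge-triggered notification were erroneously reversed (though the code was correct). This was noticed by Reddit user bodski. The article has been corrected now. I should have proofread it before posting. Apologies, and thank you for pointing out the mistake. :)

Update 2: The code has been modified to call accept(2) until it reports that it would block, so that if multiple connections have arrived, we accept all of them. This was suggested by Reddit user pitchford. Thank you for your comments. :)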
