1. Scatter/Gather I/O

a single system call  to  read or write data between single data stream and multiple buffers
This type of I/O is so named because the data is scattered into or gathered from the given vector of buffers
Scatter/Gather I/O 相比于 C标准I/O有如下优势:
(1)更自然的编码模式
如果数据是自然分隔的(预先定义的结构体字段),那么 Scatter/Gather I/O  更直观
(2)效率高
A single vectored I/O operation can replace multiple linear I/O operations
(3)性能好
减少了系统调用数
(4)原子性
Scatter/Gather I/O 是原子的,而 multiple linear I/O operations 则有多个进程交替I/O的风险
 
#include <sys/uio.h>

struct iovec {
void *iov_base; /* pointer to start of buffer */
size_t iov_len; /* size of buffer in bytes */
};
Both functions always operate on the segments in order, starting with iov[0], then iov[1], and so on, through iov[count-1].
/* The readv() function reads count segments from the file descriptor fd into the buffers described by iov */
ssize_t readv (int fd, const struct iovec *iov, int count);
/* The writev() function writes at most count segments from the buffers described by iov into the file descriptor fd */
ssize_t writev (int fd, const struct iovec *iov, int count);

注意:在Scatter/Gather I/O操作过程中,内核必须分配内部数据结构来表示每个buffer分段,正常情况下,是根据分段数count进行动态内存分配的,

但是当分段数count较小时(一般<=8),内核直接在内核栈上分配,这显然比在堆中动态分配要快

#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <string.h>
#include <sys/uio.h> int main(int argc, char* argv[])
{
struct iovec iov[];
char* buf[] = {
"The term buccaneer comes from the word boucan.\n",
"A boucan is a wooden frame used for cooking meat.\n",
"Buccaneer is the West Indies name for a pirate.\n" }; int fd = open("wel.txt", O_WRONLY | O_CREAT | O_TRUNC);
if (fd == -) {
fprintf(stderr, "open error\n");
return ;
} /* fill out three iovec structures */
for (int i = ; i < ; ++i) {
iov[i].iov_base = buf[i];
iov[i].iov_len = strlen(buf[i]) + ;
} /* with a single call, write them out all */
ssize_t nwrite = writev(fd, iov, );
if (nwrite == -) {
fprintf(stderr, "writev error\n");
return ;
}
fprintf(stdout, "wrote %d bytes\n", nwrite);
if (close(fd)) {
fprintf(stdout, "close error\n");
return ;
} return ;
}
#include <stdio.h>
#include <unistd.h>
#include <string.h>
#include <fcntl.h>
#include <sys/uio.h>
#include <sys/stat.h> int main(int argc, char* argv[])
{
char foo[], bar[], baz[];
struct iovec iov[];
int fd = open("wel.txt", O_RDONLY);
if (fd == -) {
fprintf(stderr, "open error\n");
return ;
} /* set up our iovec structures */
iov[].iov_base = foo;
iov[].iov_len = sizeof(foo);
iov[].iov_base = bar;
iov[].iov_len = sizeof(bar);
iov[].iov_base = baz;
iov[].iov_len = sizeof(baz); /* read into the structures with a single call */
ssize_t nread = readv(fd, iov, );
if (nread == -) {
fprintf(stderr, "readv error\n");
return ;
} for (int i = ; i < ; ++i) {
fprintf(stdout, "%d: %s", i, (char*)iov[i].iov_base);
}
if (close(fd)) {
fprintf(stderr, "close error\n");
return ;
} return ;
}

writev的简单实现:

#include <unistd.h>
#include <sys/uio.h> ssize_t my_writev(int fd, const struct iovec* iov, int count)
{
ssize_t ret = ;
for (int i = ; i < count; ++i) {
ssize_t nr = write(fd, iov[i].iov_base, iov[i].iov_len);
if (nr == -) {
if (errno == EINTR)
continue;
ret -= ;
break;
}
ret += nr;
}
return nr;
}

In fact, all I/O inside the Linux kernel is vectored; read() and write() are implemented as vectored I/O with a vector of only one segment

2. epoll

poll 和 select 每次调用都需要监测整个文件描述符集,当描述符集增大时,每次调用对描述符集全局扫描就成了性能瓶颈
 
(1) creating a new epoll instance
/* A successful call to epoll_create1() instantiates a new epoll instance and returns a file descriptor associated with the instance */
#include <sys/epoll.h>
int epoll_create(int size);

parameter size used to provide a  hint about the number of file descriptors to be watched;

nowadays the kernel dynamically sizes the required data structures and this parameter just needs to be greater than zero

(2) controling epoll

/* The epoll_ctl() system call can be used to add file descriptors to and remove file descriptors from a given epoll context */
#include <sys/epoll.h>
int epoll_ctl(int epfd, int op, int fd, struct epoll_event* event); struct epoll_event {
__u32 events; /* events */
union {
void* ptr;
int fd;
__u32 u32;
__u64 u64;
} data;
};

a. op parameter

EPOLL_CTL_ADD   // Add a monitor on the file associated with the file descriptor fd to the epoll instance associated with epfd
EPOLL_CTL_DEL // Remove a monitor on the file associated with the file descriptor fd from the epoll instance associated with epfd
EPOLL_CTL_MOD // Modify an existing monitor of fd with the updated events specified by event

b. event parameter

EPOLLET     // Enables edge-triggered behavior for the monitor of the file ,The default behavior is level-triggered
EPOLLIN // The file is available to be read from without blocking
EPOLLOUT // The file is available to be written to without blocking

对于结构体struct epoll_event 里的data成员,通常做法是将data联合体里的fd设置为第二个参数fd,即 event.data.fd = fd

To add a new watch on the file associated with fd to the epoll instance  epfd :

#include <sys/epoll.h>

struct epoll_event event;
event.data.fd = fd;
event.events = EPOLLIN | EPOLLOUT int ret = epll_ctl(epfd, EPOLL_CTL_ADD, fd, &event);
if (ret) {
fprintf(stderr, "epll_ctl error\n");
}

To modify an existing event on the file associated with fd on the epoll instance epfd :

#include <sys/epoll.h>

struct epoll_event event;
event.data.fd = fd;
event.events = EPOLLIN; int ret = epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &event);
if (ret) {
fprintf(stderr, "epoll_ctl error\n");
}

To remove an existing event on the file associated with fd from the epoll instance epfd :

#include <sys/epoll.h>

struct epoll_event event;

int ret = epoll_ctl(epfd, EPOLL_CTL_DEL, fd, &event);
if (ret) {
fprintf(stderr, "epoll_ctl error\n");
}

(3) waiting for events with epoll

#include <sys/epoll.h>
int epoll_wait(int epfd, struct epoll_event* events, int maxevents, int timeout);

The return value is the number of events, or −1 on error

#include <sys/epoll.h>

#define MAX_EVENTS 64

struct epoll_event* events = malloc(sizeof(struct epoll_event) * MAX_EVENTS);
if (events == NULL) {
fprintf(stdout, "malloc error\n");
return ;
} int nready = epoll_wait(epfd, events, MAX_EVENTS, -);
if (nready < ) {
fprintf(stderr, "epoll_wait error\n");
free(events);
return ;
} for (int i = ; i < nready; ++i) {
fprintf(stdout, "event=%ld on fd=%d\n", events[i].events, events[i].data.fd);
/* we now can operate on events[i].data.fd without blocking */
}
free(events);
(4) Level-Triggered  vs. Edge-Triggered
考虑一个生产者和一个消费者通过Unix 管道通信:
a. 生产者向管道写 1KB数据
b. 消费者在管道上执行 epoll_wait, 等待管道中数据可读
如果是 水平触发,epoll_wait 调用会立即返回,表明管道已经 读操作就绪
如果是 边缘触发,epoll_wait 调用不会立即返回,直到 步骤a 发生,也就是说,即使管道在eoll_wait调用的时候是可读的,该调用也不会立即返回直至数据已经写到管道。
 
select和epoll的默认工作方式是 水平触发,边缘触发通常是应用于 non-blocking I/O, 并且必须小心检查EAGAIN

3. Mapping Files into Memory

/* A call to mmap() asks the kernel to map len bytes of the object represented by the file  descriptor fd, 
starting at offset bytes into the file, into memory
*/
#include <sys/mman.h>
void* mmap(void* addr, size_t len, int prot, int flags, int fd, off_t offset);
void* ptr = mmap(, len, PROT_READ, MAP_SHARED, fd, );

参数addr和offset必须是内存页大小的整数倍
如果文件大小不是内存页大小的整数倍,那么就存在填补为0的空洞,读这段区域将返回0,写这段区域将不会影响到被映射的文件内容
Because addr and offset are usually 0, this requirement is not overly difficult to meet
 
int munmap (void *addr, size_t len);

munmap() removes any mappings that contain pages located anywhere in the process address space starting at addr,

which must be page-aligned, and continuing for len bytes

#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>
#include <fcntl.h> int main(int argc, char* argv[])
{
if (argc < ) {
fprintf(stderr, "usage:%s <file>\n", argv[]);
return ;
} int fd = open(argv[], O_RDONLY);
if (fd == -) {
fprintf(stderr, "open error\n");
return ;
} struct stat sbuf;
if (fsat(fd, &sbuf) == -) {
fprintf(stderr, "fstat error\n");
return ;
} if (!S_ISREG(sbuf.st_mode)) {
fprintf(stderr, "%s is not a file\n", argv[]);
return ;
}
void* ptr = mmap(, sbuf.st_size, PROT_READ, MAP_SHARED, fd, );
if (ptr == MAP_FAILED) {
fprintf(stderr, "mmap error\n");
return ;
} if (close(fd)) {
fprintf(stderr, "close error\n");
return ;
} for (int i = ; i < sbuf.st_size; ++i) {
fputc(ptr[i], stdout);
} if (munmap(ptr, sbuf.st_size) == -) {
fprintf(stderr, "munmap error\n");
return ;
}
return ;
}
mmap的优点:
(1) 读写内存映射文件 可以避免 read 和 write 系统调用发生的拷贝
(2) 除了潜在的内存页错误,读写内存映射文件不会有 read 和 write 系统调用时的上下文切换开销
(3) 当多个进程映射相同内存时,内存数据可以在多个进程之间共享
(4) 内存映射定位时只需要平常的指针操作,而不需要 lseek 系统调用
 
mmap的缺点:
(1) 内存映射必须是页大小的整数倍,通常在 调用返回的内存页 和 被映射的实际文件之间存在 剩余空间,例如内存页大小是4KB,那么映射7Byte文件将浪费4089Byte内存
(2) 内存映射必须适合进程的地址空间,32位地址空间,如果有大量的内存映射,将产生碎片,将很难寻找到连续的大区域
(3) 创建和维护 内存映射和内核相关数据结构 有一定开销
 
文件和映射内存之间的同步:
#include <sys/mman.h>
int msync (void *addr, size_t len, int flags);
msync() flushes back to disk any changes made to a file mapped via mmap(), synchronizing the mapped file with the mapping 
Without invocation of msync(), there is no guarantee that a dirty mapping will be written back to disk until the file is unmapped
 

4. 同步 异步

 
 
  
 

5. I/O调度和I/O性能

a single disk seek can average over 8 milliseconds,25 million times  longer than a single processor cycle!
I/O schedulers perform two basic operations: merging and sorting
Merging is the process of taking two or more adjacent I/O requests and combining them into a single request
假设一个请求读磁盘块5,另一个请求读磁盘块6和7,调度器可以合并成单一请求读磁盘块5到7,这将减少一半的I/O操作数
Sorting is the process of arranging pending I/O requests in ascending block order (尽可能保证顺序读写,而不是随机读写)
 
If an I/O scheduler always sorted new requests by the order of insertion,it would be possible to starve requests to
far-off blocks indefinitely (I/O调度算法,避免饥饿现象)
使用电梯调度算法避免某个I/O请求长时间得不到响应
 
Linus Elevator I/O scheduler
The Deadline I/O Scheduler
The Anticipatory I/O Scheduler
The CFQ I/O Scheduler
The Noop I/O Scheduler
Sorting by inode number is the most commonly used method for scheduling I/O requests in user space

Linux System Programming 学习笔记(四) 高级I/O的更多相关文章

  1. Linux System Programming 学习笔记(十一) 时间

    1. 内核提供三种不同的方式来记录时间 Wall time (or real time):actual time and date in the real world Process time:the ...

  2. Linux System Programming 学习笔记(七) 线程

    1. Threading is the creation and management of multiple units of execution within a single process 二 ...

  3. Linux System Programming 学习笔记(六) 进程调度

    1. 进程调度 the process scheduler is the component of a kernel that selects which process to run next. 进 ...

  4. Linux System Programming 学习笔记(二) 文件I/O

    1.每个Linux进程都有一个最大打开文件数,默认情况下,最大值是1024 文件描述符不仅可以引用普通文件,也可以引用套接字socket,目录,管道(everything is a file) 默认情 ...

  5. Linux System Programming 学习笔记(一) 介绍

    1. Linux系统编程的三大基石:系统调用.C语言库.C编译器 系统调用:内核向用户级程序提供服务的唯一接口.在i386中,用户级程序执行软件中断指令 INT n 之后切换至内核空间 用户程序通过寄 ...

  6. Linux System Programming 学习笔记(十) 信号

    1. 信号是软中断,提供处理异步事件的机制 异步事件可以是来源于系统外部(例如用户输入Ctrl-C)也可以来源于系统内(例如除0)   内核使用以下三种方法之一来处理信号: (1) 忽略该信号.SIG ...

  7. Linux System Programming 学习笔记(九) 内存管理

    1. 进程地址空间 Linux中,进程并不是直接操作物理内存地址,而是每个进程关联一个虚拟地址空间 内存页是memory management unit (MMU) 可以管理的最小地址单元 机器的体系 ...

  8. Linux System Programming 学习笔记(八) 文件和目录管理

    1. 文件和元数据 每个文件都是通过inode引用,每个inode索引节点都具有文件系统中唯一的inode number 一个inode索引节点是存储在Linux文件系统的磁盘介质上的物理对象,也是L ...

  9. Linux System Programming 学习笔记(五) 进程管理

    1. 进程是unix系统中两个最重要的基础抽象之一(另一个是文件) A process is a running program A thread is the unit of activity in ...

随机推荐

  1. NIOP 膜你题

    NOIp膜你题   Day1 duliu 出题人:ZAY    1.大美江湖(mzq.cpp/c) [题目背景] 细雪飘落长街,枫叶红透又一年不只为故友流连,其实我也恋长安听门外足音慢,依稀见旧时容颜 ...

  2. OI算法复习汇总

    各大排序 图论: spfa floyd dijkstra *拉普拉斯矩阵 hash表 拓扑排序 哈夫曼算法 匈牙利算法 分块法 二分法 费马小定理: a^(p-1) ≡1(mod p) 网络流 二分图 ...

  3. MHA

    MHA 1. MHA简介 1.1 MHA工作原理总结为如下 1.2 MHA工具包介绍 2. 部署MHA 2.1 环境介绍 2.2 一主两从复制搭建 2.3 配置互信 2.4 下载MHA 2.5 安装M ...

  4. C++输入密码不显示明文

    之前有遇到需求说输入密码不显示明文,但同时会有一些其他问题,暂时没做,如今经过尝试可以实现,但是得先知道要输入的是密码.主要利用的getch()函数的不回显特点.需要注意的是这个函数不是标准函数,而且 ...

  5. Python中类的声明,使用,属性,实例属性,计算属性及继承,重写

    Python中的类的定义以及使用: 类的定义: 定义类 在Python中,类的定义使用class关键字来实现 语法如下: class className: "类的注释" 类的实体 ...

  6. 如何固定电脑IP

    百度经验里有:http://jingyan.baidu.com/article/2f9b480d579fc041cb6cc297.html 但是就关于如何填写DNS时,就不知道咋办了,特意问了一下IT ...

  7. &与&&有什么区别?

    一.简要说明 按位与:a&b是把a和b都转换成二进制数然后再进行与的运算: 逻辑与:a&&b就是当且仅当两个操作数均为 true时,其结果才为 true:只要有一个为零,a&a ...

  8. UVa 10118 记忆化搜索 Free Candies

    假设在当前状态我们第i堆糖果分别取了cnt[i]个,那么篮子里以及口袋里糖果的个数都是可以确定下来的. 所以就可以使用记忆化搜索. #include <cstdio> #include & ...

  9. 反射的妙用-类名方法名做参数进行方法调用实例demo

    首先声明一点,大家都会说反射的效率低下,但是大多数的框架能少了反射吗?当反射能为我们带来代码上的方便就可以用,如有不当之处还望大家指出 1,项目结构图如下所示:一个ClassLb类库项目,一个为测试用 ...

  10. 缓存淘汰算法之LRU实现

    Java中最简单的LRU算法实现,就是利用 LinkedHashMap,覆写其中的removeEldestEntry(Map.Entry)方法即可 如果你去看LinkedHashMap的源码可知,LR ...