How to reduce the inbound softirq ratio on Linux

Recently I ran into a problem: while receiving TCP traffic, our servers showed a very high inbound softirq ratio.

We know that NAPI mode lowers the softirq share of packet reception. Can NAPI itself be tuned further? The analysis below is based on the 2.6.32-358 kernel:

static void net_rx_action(struct softirq_action *h)
{
    struct list_head *list = &__get_cpu_var(softnet_data).poll_list;
    unsigned long time_limit = jiffies + 2;
    int budget = netdev_budget;    /* <- tunable via /proc/sys/net/core/netdev_budget, default 300 */
    void *have;
    int select;
    struct rps_remote_softirq_cpus *rcpus;

    local_irq_disable();

    while (!list_empty(list)) {    /* <- keep looping as long as the poll list is not empty */
        struct napi_struct *n;
        int work, weight;

        /* If softirq window is exhuasted then punt.
         * Allow this to run for 2 jiffies since which will allow
         * an average latency of 1.5/HZ.
         */
        /* <- leave the loop once the time window or the total budget is used up */
        if (unlikely(budget <= 0 || time_after(jiffies, time_limit)))
            goto softnet_break;

        local_irq_enable();

        /* Even though interrupts have been re-enabled, this
         * access is safe because interrupts can only add new
         * entries to the tail of this list, and only ->poll()
         * calls can remove this head entry from the list.
         */
        n = list_first_entry(list, struct napi_struct, poll_list);

        have = netpoll_poll_lock(n);

        /* <- the NAPI weight: the quota for one ->poll() call. Together with the
         * total budget above it limits receiving; it is set in netif_napi_add(). */
        weight = n->weight;

        /* This NAPI_STATE_SCHED test is for avoiding a race
         * with netpoll's poll_napi().  Only the entity which
         * obtains the lock and sees NAPI_STATE_SCHED set will
         * actually make the ->poll() call.  Therefore we avoid
         * accidently calling ->poll() when NAPI is not scheduled.
         */
        work = 0;
        if (test_bit(NAPI_STATE_SCHED, &n->state)) {
            /* <- driver callback; each vendor has its own, e.g. Intel's ixgbe_poll() */
            work = n->poll(n, weight);
            trace_napi_poll(n);
        }

        WARN_ON_ONCE(work > weight);

        budget -= work;

        local_irq_disable();

        /* Drivers must not modify the NAPI state if they
         * consume the entire weight.  In such cases this code
         * still "owns" the NAPI instance and therefore can
         * move the instance around on the list at-will.
         */
        if (unlikely(work == weight)) {
            if (unlikely(napi_disable_pending(n))) {
                local_irq_enable();
                napi_complete(n);
                local_irq_disable();
            } else
                list_move_tail(&n->poll_list, list);
        }

        netpoll_poll_unlock(have);
    }
out:
    rcpus = &__get_cpu_var(rps_remote_softirq_cpus);
    select = rcpus->select;
    rcpus->select ^= 1;

    local_irq_enable();

    net_rps_action(&rcpus->mask[select]);

#ifdef CONFIG_NET_DMA
    /*
     * There may not be any more sk_buffs coming right now, so push
     * any pending DMA copies to hardware
     */
    dma_issue_pending_all();
#endif

    return;

softnet_break:
    __get_cpu_var(netdev_rx_stat).time_squeeze++;
    __raise_softirq_irqoff(NET_RX_SOFTIRQ);
    goto out;
}

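The code shows that a single call to net_rx_action is bounded only by time (2 jiffies) and by netdev_budget. If we raise netdev_budget, can one call receive more packets, so that the softirq is triggered fewer times? The answer is yes.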
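The budget arithmetic can be made concrete with a small user-space model of the accounting in net_rx_action() above. This is a sketch under simplifying assumptions, not kernel code:

- the 2-jiffy time limit is ignored;
- every queue with a backlog is assumed to be on the poll list when a run starts;
- no new packets arrive while draining.

The names count_softirq_runs, model_net_rx_action, model_poll and MAX_QUEUES are hypothetical. Because the budget is checked before each poll, a run can overshoot netdev_budget by up to one weight.

```c
#include <stddef.h>

/* Hypothetical cap on NAPI instances in this sketch. */
#define MAX_QUEUES 64

/* Stand-in for a driver ->poll(): consume up to `weight` packets. */
static int model_poll(int *backlog, int weight)
{
    int work = *backlog < weight ? *backlog : weight;
    *backlog -= work;
    return work;
}

/* One simulated net_rx_action() run; returns packets processed. */
static int model_net_rx_action(int *backlog, size_t n, int netdev_budget, int weight)
{
    size_t list[MAX_QUEUES];            /* circular poll list of queue indices */
    size_t head = 0, len = 0;
    int budget = netdev_budget, total = 0;

    /* Assumption: every queue with a backlog has been scheduled. */
    for (size_t i = 0; i < n; i++)
        if (backlog[i] > 0)
            list[len++] = i;

    /* The budget is checked before each poll, as in the kernel loop. */
    while (len > 0 && budget > 0) {
        size_t q = list[head];
        head = (head + 1) % MAX_QUEUES;
        len--;

        int work = model_poll(&backlog[q], weight);
        budget -= work;
        total += work;

        /* work == weight: instance stays scheduled (list_move_tail);
         * otherwise the driver completed NAPI and it leaves the list. */
        if (work == weight) {
            list[(head + len) % MAX_QUEUES] = q;
            len++;
        }
    }
    /* Leftover backlog needs another run; in the kernel,
     * softnet_break re-raises NET_RX_SOFTIRQ. */
    return total;
}

/* Number of net_rx_action() runs needed to drain every backlog. */
int count_softirq_runs(int *backlog, size_t n, int netdev_budget, int weight)
{
    int runs = 0;

    if (n > MAX_QUEUES)
        n = MAX_QUEUES;                 /* the sketch only models MAX_QUEUES queues */
    if (netdev_budget <= 0 || weight <= 0)
        return -1;                      /* would never make progress */

    for (;;) {
        int pending = 0;
        for (size_t i = 0; i < n; i++)
            if (backlog[i] > 0)
                pending = 1;
        if (!pending)
            return runs;
        model_net_rx_action(backlog, n, netdev_budget, weight);
        runs++;
    }
}
```

Take a single queue with 3000 packets backlogged:

- With the defaults (netdev_budget=300, weight=64) the model needs 10 runs.
- With netdev_budget=600 it needs 5 runs.
- With weight=256 it needs 6 runs.

Each run does ceil(netdev_budget/weight) full polls, so raising either knob cuts the number of softirq invocations for the same backlog.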

With the default values, netdev_budget (300) is larger than the NAPI weight, which ixgbe sets to 64:

/* initialize NAPI */
netif_napi_add(adapter->netdev, &q_vector->napi, ixgbe_poll, 64);    /* <- weight 64 is passed in */

void netif_napi_add(struct net_device *dev, struct napi_struct *napi,
                    int (*poll)(struct napi_struct *, int), int weight)
{
    INIT_LIST_HEAD(&napi->poll_list);
    napi->gro_count = 0;
    napi->gro_list = NULL;
    napi->skb = NULL;
    napi->poll = poll;
    if (weight > NAPI_POLL_WEIGHT)
        /* <- only logs an error once; the weight is not clamped */
        pr_err_once("netif_napi_add() called with weight %d on device %s\n",
                    weight, dev->name);
    napi->weight = weight;
    list_add(&napi->dev_list, &dev->napi_list);
    napi->dev = dev;
#ifdef CONFIG_NETPOLL
    spin_lock_init(&napi->poll_lock);
    napi->poll_owner = -1;
#endif
    set_bit(NAPI_STATE_SCHED, &napi->state);
}

We raised the 64 passed in to 256. In ixgbe_poll this value arrives as the budget parameter and controls how many packets one poll may receive.

int ixgbe_poll(struct napi_struct *napi, int budget)
{
    struct ixgbe_q_vector *q_vector =
                container_of(napi, struct ixgbe_q_vector, napi);
    struct ixgbe_adapter *adapter = q_vector->adapter;
    struct ixgbe_ring *ring;
    int per_ring_budget;
    bool clean_complete = true;

#ifdef CONFIG_IXGBE_DCA
    if (adapter->flags & IXGBE_FLAG_DCA_ENABLED)
        ixgbe_update_dca(q_vector);
#endif

    /* <- walk the TX rings; not relevant here, this article is about receiving */
    ixgbe_for_each_ring(ring, q_vector->tx)
        clean_complete &= !!ixgbe_clean_tx_irq(q_vector, ring);

    if (!ixgbe_qv_lock_napi(q_vector))
        return budget;

    /* attempt to distribute budget to each queue fairly, but don't allow
     * the budget to go below 1 because we'll exit polling */
    /* <- raising budget from 64 to 256 raises the number of packets received per poll */
    if (q_vector->rx.count > 1)
        per_ring_budget = max(budget/q_vector->rx.count, 1);
    else
        per_ring_budget = budget;

    /* <- walk the RX rings */
    ixgbe_for_each_ring(ring, q_vector->rx)
        clean_complete &= (ixgbe_clean_rx_irq(q_vector, ring,
                   per_ring_budget) < per_ring_budget);

    ixgbe_qv_unlock_napi(q_vector);

    /* If all work not completed, return budget and keep polling */
    if (!clean_complete)
        return budget;

    /* all work done, exit the polling mode */
    napi_complete(napi);
    if (adapter->rx_itr_setting & 1)
        ixgbe_set_itr(q_vector);
    if (!test_bit(__IXGBE_DOWN, &adapter->state))
        ixgbe_irq_enable_queues(adapter, ((u64)1 << q_vector->v_idx));

    return 0;
}
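Two rules in ixgbe_poll() matter for tuning:

- The budget is split across the q_vector's RX rings.
- The function returns the full budget whenever any ring did not finish. net_rx_action() then sees work == weight and keeps the instance on the poll list.

The sketch below restates both rules in plain C. model_ixgbe_poll_ret is a hypothetical name. The model ignores TX cleaning and the ixgbe_qv_lock_napi() early return.

```c
#include <stdbool.h>

/* Same rule as ixgbe_poll(): split the budget across RX rings,
 * but never below 1 per ring. */
int per_ring_budget(int budget, int rx_count)
{
    if (rx_count > 1) {
        int b = budget / rx_count;
        return b > 1 ? b : 1;
    }
    return budget;
}

/* Hypothetical model of ixgbe_poll()'s return value (RX only).
 * cleaned[i] is what ixgbe_clean_rx_irq() processed on ring i. */
int model_ixgbe_poll_ret(const int *cleaned, int rx_count, int budget)
{
    int prb = per_ring_budget(budget, rx_count);
    bool clean_complete = true;

    for (int i = 0; i < rx_count; i++)
        clean_complete &= (cleaned[i] < prb);   /* ring used its full quota: not done */

    /* Not done: report the whole budget so NAPI keeps polling.
     * Done: the driver calls napi_complete() and reports 0. */
    return clean_complete ? 0 : budget;
}
```

With one RX ring per q_vector, per_ring_budget(256, 1) is 256. Raising the weight passed to netif_napi_add() therefore directly raises how many packets ixgbe_clean_rx_irq() may clean per call. Because ixgbe_poll() returns either 0 or the full budget, every unfinished poll is charged a full weight against netdev_budget.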

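After raising the budget (the NAPI weight) from 64 to 256, the softirq share under the same inbound traffic dropped from 25% to 23%, a clear improvement.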

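Can it be raised without limit? Clearly not. For one thing, a larger value makes it hard to keep receiving balanced across queues. Each queue returns only after it has received its share of packets (unless its ring runs empty), so interrupts stay off for too long.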

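Some may ask: raising the number of packets per poll should not change the total time spent processing packets in softirq, so why does the share drop? Consider the extreme case. NAPI was created precisely to fix the old problem of needing one interrupt per packet. Receiving more packets per poll spreads the fixed cost of each softirq invocation over more packets, so it should help.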
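A rough cost model makes this argument explicit. As an assumption for illustration only:

- each net_rx_action() run carries a fixed overhead C_0 (softirq entry and exit, irq disable/enable, poll-list handling, re-arming the queue interrupt);
- each packet costs c;
- P is the average number of packets handled per run.

Receiving N packets then costs roughly:

```latex
T_{\text{softirq}} \approx \frac{N}{P}\,C_0 + N\,c
```

Raising the weight or netdev_budget increases P and shrinks only the first term. The per-packet term N c is unchanged, so the achievable gain is bounded by the share of the fixed overhead.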

Besides this, raising the NIC's rx-usecs interrupt-coalescing setting also lowers the softirq share.

ethtool -c eth0
Coalesce parameters for eth0:
Adaptive RX: off  TX: off
stats-block-usecs:
sample-interval:
pkt-rate-low:
pkt-rate-high:
rx-usecs: 1          <- default is 1; change it to 512

ethtool -C eth0 rx-usecs 512

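This reduces how often hardware interrupts fire, but it obviously adds latency. If your system has strict real-time requirements, you may need to lower this value instead.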
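Here is a back-of-envelope check on this trade-off. Assume the NIC enforces at least rx-usecs microseconds between interrupts on a queue (a static interrupt throttle). Then:

- the interrupt rate per queue is capped at 10^6/rx-usecs;
- under steady load, each interrupt covers about pps × rx-usecs / 10^6 packets;
- a packet may wait up to rx-usecs before it is signaled.

The helper names are hypothetical. The model does not describe rx-usecs=1, which, as far as I know, ixgbe treats as its adaptive (dynamic ITR) mode rather than a literal 1 µs interval. Real hardware granularity and limits also differ.

```c
/* Hypothetical helpers for a static interrupt-throttle model. */

/* Maximum hardware interrupts per second on one queue; -1 for invalid input. */
long max_irqs_per_sec(long rx_usecs)
{
    return rx_usecs > 0 ? 1000000L / rx_usecs : -1;
}

/* Packets coalesced per interrupt when traffic keeps the queue busy
 * and interrupts fire exactly at the throttle interval. */
double pkts_per_irq(double pps, long rx_usecs)
{
    return pps * (double)rx_usecs / 1e6;
}
```

With rx-usecs=512, for example, the cap is 1953 interrupts per second per queue. At 1 Mpps each interrupt would cover about 512 packets, at the price of up to 512 µs of added delay.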

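The third method is to enable GRO. With GRO on, when packets of the same flow arrive and the NIC has the GRO feature enabled, the protocol's GRO callback tries to merge them. However, the macro that bounds the held flows (MAX_GRO_SKBS) currently allows at most 8 flows; packets of any further flow are passed straight up the stack without merging. On a server that mainly sends, GRO actually does more harm than good: the flow limit is too restrictive, and for TCP the ACK packets are not merged either.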
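The effect of the flow limit can be sketched with a toy model of one NAPI batch. The model assumes:

- every packet can merge with an earlier packet of the same flow (no size limit, no PSH or ACK-only flushes);
- a flow is identified by a single integer;
- held packets are delivered when the batch ends.

The 8-entry cap mirrors MAX_GRO_SKBS. MODEL_MAX_GRO_SKBS and gro_model_delivered are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_GRO_SKBS 8   /* mirrors MAX_GRO_SKBS (8) */

/* Count skbs handed to the protocol stack for one NAPI batch. */
size_t gro_model_delivered(const unsigned *flow, size_t n)
{
    unsigned held[MODEL_MAX_GRO_SKBS];
    size_t nheld = 0, passed = 0;

    for (size_t i = 0; i < n; i++) {
        bool same = false;
        for (size_t j = 0; j < nheld; j++) {
            if (held[j] == flow[i]) {
                same = true;                 /* merged into the held packet */
                break;
            }
        }
        if (same)
            continue;
        if (nheld < MODEL_MAX_GRO_SKBS)
            held[nheld++] = flow[i];         /* start holding a new flow */
        else
            passed++;                        /* list full: delivered unmerged */
    }
    return passed + nheld;                   /* held packets flushed at batch end */
}
```

For 16 interleaved flows with 2 packets each (32 packets), the model holds the first 8 flows and passes the other 16 packets through unmerged, delivering 24 skbs. A single flow of 4 packets collapses into 1 skb.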

So on the server side, consider customizing GRO: for example, manage flows with hash chains and coalesce ACKs to cut softirq cost.
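The hash-chain idea can be sketched the same way. Instead of a short list capped at 8 entries, held flows live in a chained hash table, so lookups stay cheap and many more flows can be merged per batch. This is a hypothetical design illustration, not kernel code. The bucket count, the flow cap, the modulo hash and all names are assumptions, and ACK coalescing is not modeled.

```c
#include <stddef.h>

#define NBUCKETS  64     /* hypothetical bucket count */
#define MAX_FLOWS 1024   /* hypothetical cap on held flows per batch */

struct flow_node {
    unsigned flow;       /* stand-in for the TCP 4-tuple */
    unsigned segs;       /* segments merged into this held packet */
    int next;            /* next node in the bucket chain, -1 = end */
};

struct flow_table {
    int bucket[NBUCKETS];            /* head node index per bucket, -1 = empty */
    struct flow_node node[MAX_FLOWS];
    size_t used;
};

static void ft_init(struct flow_table *t)
{
    for (size_t i = 0; i < NBUCKETS; i++)
        t->bucket[i] = -1;
    t->used = 0;
}

/* Returns 1 if the packet was merged or held, 0 if it must go up unmerged. */
static int ft_receive(struct flow_table *t, unsigned flow)
{
    unsigned b = flow % NBUCKETS;    /* toy hash; a real one would hash the 4-tuple */

    for (int i = t->bucket[b]; i != -1; i = t->node[i].next) {
        if (t->node[i].flow == flow) {
            t->node[i].segs++;       /* same flow: merge */
            return 1;
        }
    }
    if (t->used == MAX_FLOWS)
        return 0;                    /* table full: deliver as-is */

    struct flow_node *nd = &t->node[t->used];
    nd->flow = flow;
    nd->segs = 1;
    nd->next = t->bucket[b];         /* push onto the head of the chain */
    t->bucket[b] = (int)t->used++;
    return 1;
}

/* skbs handed to the stack for one batch: unmerged ones plus one per held flow. */
size_t hashed_gro_delivered(const unsigned *flow, size_t n)
{
    struct flow_table t;
    size_t passed = 0;

    ft_init(&t);
    for (size_t i = 0; i < n; i++)
        if (!ft_receive(&t, flow[i]))
            passed++;
    return passed + t.used;
}
```

For 16 interleaved flows with 2 packets each, this table holds all 16 flows and delivers 16 skbs.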
