ContextSwitch 学习与使用


说明

github上面有一个简单的测试系统调用以及上下文切换的工具.
contextswitch.
下载之后直接make就可以进行简单的测试 需要注意的是 部分arm环境没有:
-mno-avx
这个参数, 需要去掉一下.

官方文档以及说明

Little micro-benchmarks to assess the performance overhead of context
switching. timesyscall: Benchmarks the overhead of a system call.
timectxsw: Benchmarks the overhead of context switching between 2 processes.
timetctxsw: Benchmarks the overhead of context switching between 2 threads.
timectxswws: Benchmarks the overhead of context switching between 2 processes
using a working set of the size specified in argument.
timetctxsw2: Benchmarks the overhead of context switching between 2 threads,
by using a shed_yield() method.
If you do taskset -a 1, all threads should be scheduled on the
same processor, so you are really doing thread context switch.
Then to be sure that you are really doing it, just do:
strace -ff -tt -v taskset -a 1 ./timetctxsw2
Now why sched_yield() is enough for testing ? Because, it place
the current thread at the end of the ready queue. So the next
ready thread will be scheduled.
I also added sched_setscheduler(SCHED_FIFO) to get the best
performances.
From: https://github.com/tsuna/contextswitch

脚本说明

runbench() {
$* ./timesyscall
$* ./timectxsw
$* ./timetctxsw
$* ./timetctxsw2
}
每一组测试内的内容分别为: 1. 系统调用的时间.
2. 2个进程之间的上下文切换的时间.
3. 同一进程内的连个线程切换的时间.
4. shed_yield() method 方法的切换时间 (不太了解) 一共分为三组
第一组不进行设置
第二组绑定CPU但是在两个核心上
第三组绑定到同一个CPU核心上面.

测试结果说明

在我所有的测试环境内:
1. AMD 9T34 无可争议的排第一
2. 相同硬件不同操作系统的差异比较大, 如果比较必须使用相同的操作系统来进行.
3. 国产里面与SPECJVM和SPECCPU的结果完全一样.飞腾<海光<鲲鹏<阿里倚天
阿里倚天无可争议的王者.
4. 十年前的CPU的确不如现在新的CPU. 必须更新换代,性能更好,速度更快.
5. CPU绑核非常有用途,需要进行优化.
6. 协程,轻量级线程是未来. 只有这样性能才会好.

结果图表-1


结果图表-2


E5-2620 2.0Ghz

2 physical CPUs, 6 cores/CPU, 2 hardware threads/core = 24 hw threads total
-- No CPU affinity --
10000000 system calls in 11841646290ns (1184.2ns/syscall)
2000000 process context switches in 6039748545ns (3019.9ns/ctxsw)
2000000 thread context switches in 6745297188ns (3372.6ns/ctxsw)
sched_setscheduler(): Operation not permitted
2000000 thread context switches in 755823488ns (377.9ns/ctxsw)
-- With CPU affinity --
10000000 system calls in 14343751134ns (1434.4ns/syscall)
2000000 process context switches in 16353343542ns (8176.7ns/ctxsw)
2000000 thread context switches in 13617487377ns (6808.7ns/ctxsw)
sched_setscheduler(): Operation not permitted
2000000 thread context switches in 2363107269ns (1181.6ns/ctxsw)
-- With CPU affinity to CPU 0 --
10000000 system calls in 11929472188ns (1192.9ns/syscall)
2000000 process context switches in 6915983386ns (3458.0ns/ctxsw)
2000000 thread context switches in 6837489882ns (3418.7ns/ctxsw)
sched_setscheduler(): Operation not permitted
2000000 thread context switches in 795652256ns (397.8ns/ctxsw)

Intel(R) Xeon(R) Gold 5118 CPU @ 2.30GHz 云海OS虚拟机

1 physical CPUs, 8 cores/CPU, 1 hardware threads/core = 8 hw threads total
-- No CPU affinity --
10000000 system calls in 2841917410ns (284.2ns/syscall)
2000000 process context switches in 7404178178ns (3702.1ns/ctxsw)
2000000 thread context switches in 7502081647ns (3751.0ns/ctxsw)
sched_setscheduler(): Operation not permitted
2000000 thread context switches in 222130514ns (111.1ns/ctxsw)
-- With CPU affinity --
10000000 system calls in 2835862084ns (283.6ns/syscall)
2000000 process context switches in 4990890087ns (2495.4ns/ctxsw)
2000000 thread context switches in 4311646652ns (2155.8ns/ctxsw)
sched_setscheduler(): Operation not permitted
2000000 thread context switches in 870608240ns (435.3ns/ctxsw)
-- With CPU affinity to CPU 0 --
10000000 system calls in 2844931708ns (284.5ns/syscall)
2000000 process context switches in 7601947691ns (3801.0ns/ctxsw)
2000000 thread context switches in 7914561498ns (3957.3ns/ctxsw)
sched_setscheduler(): Operation not permitted
2000000 thread context switches in 247057805ns (123.5ns/ctxsw)

Intel(R) Xeon(R) Gold 5118 CPU @ 2.30GHz 云海OS物理机

2 physical CPUs, 12 cores/CPU, 2 hardware threads/core = 48 hw threads total
-- No CPU affinity --
10000000 system calls in 5769760409ns (577.0ns/syscall)
2000000 process context switches in 7245677219ns (3622.8ns/ctxsw)
2000000 thread context switches in 7069213271ns (3534.6ns/ctxsw)
sched_setscheduler(): Operation not permitted
2000000 thread context switches in 475086926ns (237.5ns/ctxsw)
-- With CPU affinity --
10000000 system calls in 5762431985ns (576.2ns/syscall)
2000000 process context switches in 8692364627ns (4346.2ns/ctxsw)
2000000 thread context switches in 6572286258ns (3286.1ns/ctxsw)
sched_setscheduler(): Operation not permitted
2000000 thread context switches in 1304249661ns (652.1ns/ctxsw)
-- With CPU affinity to CPU 0 --
10000000 system calls in 5774310295ns (577.4ns/syscall)
2000000 process context switches in 6869635514ns (3434.8ns/ctxsw)
2000000 thread context switches in 6927117249ns (3463.6ns/ctxsw)
sched_setscheduler(): Operation not permitted
2000000 thread context switches in 473255745ns (236.6ns/ctxsw)

飞腾S2500-物理机器-NFSV3

2 physical CPUs, 128 cores/CPU, 1 hardware threads/core = 256 hw threads total
-- No CPU affinity --
10000000 system calls in 3838470070ns (383.8ns/syscall)
2000000 process context switches in 10913991269ns (5457.0ns/ctxsw)
2000000 thread context switches in 10987973614ns (5494.0ns/ctxsw)
sched_setscheduler(): Operation not permitted
2000000 thread context switches in 354962539ns (177.5ns/ctxsw)
-- With CPU affinity --
10000000 system calls in 3851009222ns (385.1ns/syscall)
2000000 process context switches in 10500204985ns (5250.1ns/ctxsw)
2000000 thread context switches in 8605107251ns (4302.6ns/ctxsw)
sched_setscheduler(): Operation not permitted
2000000 thread context switches in 1694906366ns (847.5ns/ctxsw)
-- With CPU affinity to CPU 0 --
10000000 system calls in 3871134715ns (387.1ns/syscall)
2000000 process context switches in 8211223439ns (4105.6ns/ctxsw)
2000000 thread context switches in 8915611368ns (4457.8ns/ctxsw)
sched_setscheduler(): Operation not permitted
2000000 thread context switches in 362941497ns (181.5ns/ctxsw)

飞腾S2500-物理机器-银河麒麟V10

model name : HUAWEI,Kunpeng 920
2 physical CPUs, 128 cores/CPU, 1 hardware threads/core = 256 hw threads total
-- No CPU affinity --
10000000 system calls in 1104251960ns (110.4ns/syscall)
2000000 process context switches in 5502095280ns (2751.0ns/ctxsw)
2000000 thread context switches in 5057680610ns (2528.8ns/ctxsw)
2000000 thread context switches in 159336010ns (79.7ns/ctxsw)
-- With CPU affinity --
10000000 system calls in 1104213220ns (110.4ns/syscall)
2000000 process context switches in 3157105260ns (1578.6ns/ctxsw)
2000000 thread context switches in 2749304460ns (1374.7ns/ctxsw)
2000000 thread context switches in 520588690ns (260.3ns/ctxsw)
-- With CPU affinity to CPU 0 --
10000000 system calls in 1104361790ns (110.4ns/syscall)
2000000 process context switches in 2554260900ns (1277.1ns/ctxsw)
2000000 thread context switches in 2501093900ns (1250.5ns/ctxsw)
2000000 thread context switches in 159835540ns (79.9ns/ctxsw)

飞腾S2500-KVM虚拟机

10000000 system calls in 2016128780ns (201.6ns/syscall)
2000000 process context switches in 20813179318ns (10406.6ns/ctxsw)
2000000 thread context switches in 21270077053ns (10635.0ns/ctxsw)
2000000 thread context switches in 283497350ns (141.7ns/ctxsw)
-- With CPU affinity --
10000000 system calls in 2003773606ns (200.4ns/syscall)
2000000 process context switches in 7149973534ns (3575.0ns/ctxsw)
2000000 thread context switches in 6041671015ns (3020.8ns/ctxsw)
2000000 thread context switches in 1184706267ns (592.4ns/ctxsw)
-- With CPU affinity to CPU 0 --
10000000 system calls in 1996452026ns (199.6ns/syscall)
2000000 process context switches in 20093433102ns (10046.7ns/ctxsw)
2000000 thread context switches in 20838253803ns (10419.1ns/ctxsw)
2000000 thread context switches in 284723964ns (142.4ns/ctxsw)

海光机器

model name : Hygon C86 7285 32-core Processor
pgrep: cannot allocate 4611686018427387903 bytes
2 physical CPUs, 32 cores/CPU, 2 hardware threads/core = 128 hw threads total
-- No CPU affinity --
10000000 system calls in 1188373575ns (118.8ns/syscall)
2000000 process context switches in 7182741168ns (3591.4ns/ctxsw)
2000000 thread context switches in 5057264353ns (2528.6ns/ctxsw)
2000000 thread context switches in 218741918ns (109.4ns/ctxsw)
-- With CPU affinity --
10000000 system calls in 1199538092ns (120.0ns/syscall)
2000000 process context switches in 4926579090ns (2463.3ns/ctxsw)
2000000 thread context switches in 4116607893ns (2058.3ns/ctxsw)
2000000 thread context switches in 877003690ns (438.5ns/ctxsw)
-- With CPU affinity to CPU 0 --
10000000 system calls in 1207213049ns (120.7ns/syscall)
2000000 process context switches in 4803238321ns (2401.6ns/ctxsw)
2000000 thread context switches in 5033478360ns (2516.7ns/ctxsw)
2000000 thread context switches in 218102516ns (109.1ns/ctxsw)

鲲鹏机器

2 physical CPUs, 128 cores/CPU, 1 hardware threads/core = 256 hw threads total
-- No CPU affinity --
10000000 system calls in 1628256836ns (162.8ns/syscall)
2000000 process context switches in 3567828849ns (1783.9ns/ctxsw)
2000000 thread context switches in 3366796751ns (1683.4ns/ctxsw)
2000000 thread context switches in 208056729ns (104.0ns/ctxsw)
-- With CPU affinity --
10000000 system calls in 3957162873ns (395.7ns/syscall)
2000000 process context switches in 66176473553ns (33088.2ns/ctxsw)
2000000 thread context switches in 64858764678ns (32429.4ns/ctxsw)
2000000 thread context switches in 9224336984ns (4612.2ns/ctxsw)
-- With CPU affinity to CPU 0 --
10000000 system calls in 1658580824ns (165.9ns/syscall)
2000000 process context switches in 4162672768ns (2081.3ns/ctxsw)
2000000 thread context switches in 3930988507ns (1965.5ns/ctxsw)
2000000 thread context switches in 206905930ns (103.5ns/ctxsw)

Intel 8369HB 3.3Ghz

10000000 system calls in 2039800553ns (204.0ns/syscall)
2000000 process context switches in 3484116193ns (1742.1ns/ctxsw)
2000000 thread context switches in 3504345370ns (1752.2ns/ctxsw)
sched_setscheduler(): Operation not permitted
2000000 thread context switches in 163336302ns (81.7ns/ctxsw)
-- With CPU affinity --
10000000 system calls in 2042749498ns (204.3ns/syscall)
2000000 process context switches in 3512477901ns (1756.2ns/ctxsw)
2000000 thread context switches in 3037479215ns (1518.7ns/ctxsw)
sched_setscheduler(): Operation not permitted
2000000 thread context switches in 589604636ns (294.8ns/ctxsw)
-- With CPU affinity to CPU 0 --
10000000 system calls in 2037861063ns (203.8ns/syscall)
2000000 process context switches in 3543912186ns (1772.0ns/ctxsw)
2000000 thread context switches in 3575216872ns (1787.6ns/ctxsw)
sched_setscheduler(): Operation not permitted
2000000 thread context switches in 164079529ns (82.0ns/ctxsw)

阿里倚天710

1 physical CPUs, 8 cores/CPU, 1 hardware threads/core = 8 hw threads total
-- No CPU affinity --
10000000 system calls in 672626352ns (67.3ns/syscall)
2000000 process context switches in 3586487130ns (1793.2ns/ctxsw)
2000000 thread context switches in 3228362627ns (1614.2ns/ctxsw)
sched_setscheduler(): Operation not permitted
2000000 thread context switches in 102817391ns (51.4ns/ctxsw)
-- With CPU affinity --
10000000 system calls in 672290182ns (67.2ns/syscall)
2000000 process context switches in 1990312435ns (995.2ns/ctxsw)
2000000 thread context switches in 1682598464ns (841.3ns/ctxsw)
sched_setscheduler(): Operation not permitted
2000000 thread context switches in 328222163ns (164.1ns/ctxsw)
-- With CPU affinity to CPU 0 --
10000000 system calls in 672409838ns (67.2ns/syscall)
2000000 process context switches in 3347526340ns (1673.8ns/ctxsw)
2000000 thread context switches in 3100110717ns (1550.1ns/ctxsw)
sched_setscheduler(): Operation not permitted
2000000 thread context switches in 102631615ns (51.3ns/ctxsw)

AMD 9T34

model name : AMD EPYC 9T34 64-Core Processor
1 physical CPUs, 8 cores/CPU, 2 hardware threads/core = 16 hw threads total
-- No CPU affinity --
10000000 system calls in 553414290ns (55.3ns/syscall)
2000000 process context switches in 1963917388ns (982.0ns/ctxsw)
2000000 thread context switches in 2131473467ns (1065.7ns/ctxsw)
2000000 thread context switches in 115396178ns (57.7ns/ctxsw)
-- With CPU affinity --
10000000 system calls in 554322086ns (55.4ns/syscall)
2000000 process context switches in 2730693871ns (1365.3ns/ctxsw)
2000000 thread context switches in 2559121196ns (1279.6ns/ctxsw)
2000000 thread context switches in 550724648ns (275.4ns/ctxsw)
-- With CPU affinity to CPU 0 --
10000000 system calls in 553295602ns (55.3ns/syscall)
2000000 process context switches in 2011838005ns (1005.9ns/ctxsw)
2000000 thread context switches in 2027328701ns (1013.7ns/ctxsw)
2000000 thread context switches in 114914625ns (57.5ns/ctxsw)

ContextSwitch 学习与使用的更多相关文章

  1. 从直播编程到直播教育:LiveEdu.tv开启多元化的在线学习直播时代

    2015年9月,一个叫Livecoding.tv的网站在互联网上引起了编程界的注意.缘于Pingwest品玩的一位编辑在上网时无意中发现了这个网站,并写了一篇文章<一个比直播睡觉更奇怪的网站:直 ...

  2. Angular2学习笔记(1)

    Angular2学习笔记(1) 1. 写在前面 之前基于Electron写过一个Markdown编辑器.就其功能而言,主要功能已经实现,一些小的不影响使用的功能由于时间关系还没有完成:但就代码而言,之 ...

  3. ABP入门系列(1)——学习Abp框架之实操演练

    作为.Net工地搬砖长工一名,一直致力于挖坑(Bug)填坑(Debug),但技术却不见长进.也曾热情于新技术的学习,憧憬过成为技术大拿.从前端到后端,从bootstrap到javascript,从py ...

  4. 消息队列——RabbitMQ学习笔记

    消息队列--RabbitMQ学习笔记 1. 写在前面 昨天简单学习了一个消息队列项目--RabbitMQ,今天趁热打铁,将学到的东西记录下来. 学习的资料主要是官网给出的6个基本的消息发送/接收模型, ...

  5. js学习笔记:webpack基础入门(一)

    之前听说过webpack,今天想正式的接触一下,先跟着webpack的官方用户指南走: 在这里有: 如何安装webpack 如何使用webpack 如何使用loader 如何使用webpack的开发者 ...

  6. Unity3d学习 制作地形

    这周学习了如何在unity中制作地形,就是在一个Terrain的对象上盖几座小山,在山底种几棵树,那就讲一下如何完成上述内容. 1.在新键得项目的游戏的Hierarchy目录中新键一个Terrain对 ...

  7. 《Django By Example》第四章 中文 翻译 (个人学习,渣翻)

    书籍出处:https://www.packtpub.com/web-development/django-example 原作者:Antonio Melé (译者注:祝大家新年快乐,这次带来<D ...

  8. 菜鸟Python学习笔记第一天:关于一些函数库的使用

    2017年1月3日 星期二 大一学习一门新的计算机语言真的很难,有时候连函数拼写出错查错都能查半天,没办法,谁让我英语太渣. 关于计算机语言的学习我想还是从C语言学习开始为好,Python有很多语言的 ...

  9. 多线程爬坑之路-学习多线程需要来了解哪些东西?(concurrent并发包的数据结构和线程池,Locks锁,Atomic原子类)

    前言:刚学习了一段机器学习,最近需要重构一个java项目,又赶过来看java.大多是线程代码,没办法,那时候总觉得多线程是个很难的部分很少用到,所以一直没下决定去啃,那些年留下的坑,总是得自己跳进去填 ...

  10. node.js学习(三)简单的node程序&&模块简单使用&&commonJS规范&&深入理解模块原理

    一.一个简单的node程序 1.新建一个txt文件 2.修改后缀 修改之后会弹出这个,点击"是" 3.运行test.js 源文件 使用node.js运行之后的. 如果该路径下没有该 ...

随机推荐

  1. 华为云发布CodeArts Inspector漏洞管理服务,守护产品研发安全

    本文分享自华为云社区<华为云发布CodeArts Inspector漏洞管理服务,守护产品研发安全>,作者: 华为云头条. 2023年9月7日,华为云正式发布CodeArts Inspec ...

  2. 干货时间:聊聊DevOps下的技术系列之契约测试

    摘要:本期和大家简单聊聊在服务交互场景下使用服务契约的重要性,以及契约管理的必要性,最后简单介绍了下契约测试. 1.服务交互带来的问题 在上一篇文章中,我们系统的列举了DevOps各个流程中常用的测试 ...

  3. Intellij Java JNI 调用 C++

    也可以用 JNA,但性能没有 JNI 好.JNA的Demo没有做,可以参考(https://www.bilibili.com/video/BV1xU4y1F7Ep/?spm_id_from=autoN ...

  4. RabbitMQ--工作模式

    单一模式 即单机不做集群 普通模式 即默认模式,对于消息队列载体,消息实体只存在某个节点中,每个节点仅有 相同的元数据,即队列的结构 当消息进入A节点的消息队列载体后,消费 者从B节点消费时,rabb ...

  5. 神经网络优化篇:详解RMSprop

    RMSprop 知道了动量(Momentum)可以加快梯度下降,还有一个叫做RMSprop的算法,全称是root mean square prop算法,它也可以加速梯度下降,来看看它是如何运作的. 回 ...

  6. Codeforces Round #667 (Div. 3) A - D题题解

    Codeforces Round #667 (Div. 3) A - D Problem A - Yet Another Two Integers Problem https://codeforces ...

  7. 2022 开源之夏 | Serverless Devs 陪你“变得更强”

    Serverless 是近年来云计算领域热门话题,凭借极致弹性.按量付费.降本提效等众多优势受到很多人的追捧,各云厂商也在不断地布局 Serverless 领域.但是随着时间的发展,Serverles ...

  8. 定向减免!函数计算让 ETL 数据加工更简单

    业内较为常见的高频短时 ETL 数据加工场景,即频率高时延短,一般费用大头均在函数调用次数上,推荐方案一般为攒批处理,高额的计算成本往往令用户感到头疼,函数计算推出定向减免方案,让 ETL数据加工更简 ...

  9. Blazor的技术优点

    Blazor是一种使用.NET和C#构建客户端Web应用程序的新兴技术.它允许开发者在浏览器中直接运行.NET代码,而无需依赖JavaScript.Blazor的技术优点主要表现在以下几个方面: 单一 ...

  10. 神经网络优化篇:详解学习率衰减(Learning rate decay)

    学习率衰减 加快学习算法的一个办法就是随时间慢慢减少学习率,将之称为学习率衰减,来看看如何做到,首先通过一个例子看看,为什么要计算学习率衰减. 假设要使用mini-batch梯度下降法,mini-ba ...