ContextSwitch: Learning and Usage


Overview

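GitHub hosts a simple tool for benchmarking system calls and context switches:
contextswitch.
After downloading it, running make is enough to build it and run simple tests. Note that some ARM environments do not support the compiler option
-mno-avx
so it has to be removed from the build flags there.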
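The flag removal described above can be sketched as a small helper. This is only an illustration: the function name portable_cflags, the set X86_ONLY_FLAGS and the sample flag list are hypothetical and are not taken from the project's Makefile.

```python
import platform

# Flags assumed to be x86-only for this sketch; -mno-avx is the one named above.
X86_ONLY_FLAGS = {"-mno-avx"}
X86_MACHINES = {"x86_64", "AMD64", "i386", "i686"}


def portable_cflags(flags, machine=None):
    """Return the flag list unchanged on x86, without x86-only flags elsewhere."""
    machine = machine or platform.machine()
    if machine in X86_MACHINES:
        return list(flags)
    return [flag for flag in flags if flag not in X86_ONLY_FLAGS]
```

On an ARM machine, portable_cflags(["-O2", "-mno-avx"], "aarch64") keeps only "-O2", which corresponds to deleting the option by hand before running make.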

Official documentation and notes

Little micro-benchmarks to assess the performance overhead of context
switching.
timesyscall: Benchmarks the overhead of a system call.
timectxsw: Benchmarks the overhead of context switching between 2 processes.
timetctxsw: Benchmarks the overhead of context switching between 2 threads.
timectxswws: Benchmarks the overhead of context switching between 2 processes
using a working set of the size specified in argument.
timetctxsw2: Benchmarks the overhead of context switching between 2 threads,
by using a sched_yield() method.
If you do taskset -a 1, all threads should be scheduled on the
same processor, so you are really doing thread context switch.
Then to be sure that you are really doing it, just do:
strace -ff -tt -v taskset -a 1 ./timetctxsw2
Now why is sched_yield() enough for testing? Because it places
the current thread at the end of the ready queue, so the next
ready thread will be scheduled.
I also added sched_setscheduler(SCHED_FIFO) to get the best
performance.
From: https://github.com/tsuna/contextswitch
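To make the taskset and sched_yield() remarks above more concrete, here is a minimal Python sketch that pins the calling process to one CPU and times a loop of sched_yield() calls. It is not the project's C code. It runs a single thread, so no second thread is ever scheduled, and the figure it returns is dominated by Python interpreter overhead; it cannot be compared with the C results. The function names are hypothetical, and the calls are Linux-only.

```python
import os
import time


def pin_to_single_cpu():
    """Restrict the caller to one CPU, similar in spirit to running under taskset."""
    cpu = min(os.sched_getaffinity(0))  # lowest CPU this process may use
    os.sched_setaffinity(0, {cpu})
    return cpu


def time_sched_yield(iterations=100_000):
    """Average wall-clock time per os.sched_yield() call in ns, Python overhead included."""
    start = time.perf_counter_ns()
    for _ in range(iterations):
        os.sched_yield()
    return (time.perf_counter_ns() - start) / iterations
```

Calling pin_to_single_cpu() first and time_sched_yield() afterwards mirrors the order used in the README: restrict scheduling to one processor, then yield repeatedly.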

Script notes

runbench() {
$* ./timesyscall
$* ./timectxsw
$* ./timetctxsw
$* ./timetctxsw2
}
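Each test group runs the following, in order:
1. The time of a system call.
2. The context switch time between 2 processes.
3. The switch time between two threads in the same process.
4. The thread switch time using the sched_yield() method (I am not very familiar with this one).
There are three groups in total:
Group 1 sets no CPU affinity.
Group 2 binds to CPUs, but across two cores.
Group 3 binds to a single CPU core.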
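The runbench() function and the three groups above can be mirrored in Python as a rough sketch. Only the "taskset -a 1" prefix comes from the README quoted earlier; the prefix used for the two-core group ("taskset -c 0,1") is an assumption made for illustration, and the names BENCHMARKS, GROUPS, build_commands and run_all are hypothetical.

```python
import subprocess

# The four binaries that runbench() launches, in the same order.
BENCHMARKS = ["./timesyscall", "./timectxsw", "./timetctxsw", "./timetctxsw2"]

# Command prefix per group, playing the role of $* in runbench().
GROUPS = {
    "No CPU affinity": [],
    "With CPU affinity": ["taskset", "-c", "0,1"],  # assumption: any two cores
    "With CPU affinity to CPU 0": ["taskset", "-a", "1"],  # from the README
}


def build_commands(prefix):
    """Return one argv list per benchmark, with the prefix placed in front."""
    return [list(prefix) + [bench] for bench in BENCHMARKS]


def run_all():
    """Run every group; must be called from the directory holding the binaries."""
    for title, prefix in GROUPS.items():
        print(f"-- {title} --", flush=True)
        for argv in build_commands(prefix):
            subprocess.run(argv, check=False)
```

Keeping the command construction in build_commands() separate from run_all() makes it possible to inspect the exact command lines without having the benchmark binaries built.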

Notes on the test results

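Across all of my test environments:
1. The AMD 9T34 is the undisputed No. 1.
2. The same hardware differs considerably across operating systems, so comparisons must be made on the same operating system.
3. Among the domestic (Chinese) CPUs, the ranking matches the SPECjvm and SPEC CPU results exactly: Phytium < Hygon < Kunpeng < Alibaba Yitian.
Alibaba Yitian is the undisputed champion among them.
4. CPUs from ten years ago really are slower than current ones. Upgrading to a newer generation brings better performance and speed.
5. CPU core binding is very useful and needs to be tuned.
6. Coroutines and lightweight threads are the future; only with them will performance be good.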
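The ranking in points 1 and 3 can be checked against the raw output further down. The sketch below copies the "No CPU affinity" thread context switch figures (the timetctxsw line) from the corresponding result sections and sorts them. Picking this particular metric is an assumption; other lines give a different order, for example the system call figures put Hygon (118.8 ns) ahead of Kunpeng (162.8 ns).

```python
# ns per thread context switch, "No CPU affinity" group, copied from the results.
THREAD_CTXSW_NS = {
    "Phytium S2500 (NFSV3)": 5494.0,
    "Hygon C86 7285": 2528.6,
    "Kunpeng": 1683.4,
    "Alibaba Yitian 710": 1614.2,
    "AMD EPYC 9T34": 1065.7,
}


def rank_fastest_first(latencies):
    """Return the machine names ordered from lowest to highest latency."""
    return sorted(latencies, key=latencies.get)


for name in rank_fastest_first(THREAD_CTXSW_NS):
    print(f"{name}: {THREAD_CTXSW_NS[name]} ns/ctxsw")
```

On this metric the order from fastest to slowest is AMD, Yitian, Kunpeng, Hygon, Phytium, which agrees with the ranking stated above.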

Result chart 1


Result chart 2


E5-2620 2.0GHz

2 physical CPUs, 6 cores/CPU, 2 hardware threads/core = 24 hw threads total
-- No CPU affinity --
10000000 system calls in 11841646290ns (1184.2ns/syscall)
2000000 process context switches in 6039748545ns (3019.9ns/ctxsw)
2000000 thread context switches in 6745297188ns (3372.6ns/ctxsw)
sched_setscheduler(): Operation not permitted
2000000 thread context switches in 755823488ns (377.9ns/ctxsw)
-- With CPU affinity --
10000000 system calls in 14343751134ns (1434.4ns/syscall)
2000000 process context switches in 16353343542ns (8176.7ns/ctxsw)
2000000 thread context switches in 13617487377ns (6808.7ns/ctxsw)
sched_setscheduler(): Operation not permitted
2000000 thread context switches in 2363107269ns (1181.6ns/ctxsw)
-- With CPU affinity to CPU 0 --
10000000 system calls in 11929472188ns (1192.9ns/syscall)
2000000 process context switches in 6915983386ns (3458.0ns/ctxsw)
2000000 thread context switches in 6837489882ns (3418.7ns/ctxsw)
sched_setscheduler(): Operation not permitted
2000000 thread context switches in 795652256ns (397.8ns/ctxsw)
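Every result line follows the same pattern: an operation count, the total time in nanoseconds, and the per-operation time in parentheses, which is simply the total divided by the count. A minimal sketch for parsing such lines and recomputing that figure is shown here; the regular expression is written against the output pasted in this post, and the names LINE_RE and parse_line are hypothetical.

```python
import re

# Matches lines such as:
# "10000000 system calls in 11841646290ns (1184.2ns/syscall)"
LINE_RE = re.compile(
    r"^(?P<count>\d+) "
    r"(?P<kind>system calls|process context switches|thread context switches)"
    r" in (?P<total_ns>\d+)ns \((?P<reported>[\d.]+)ns/(?:syscall|ctxsw)\)$"
)


def parse_line(line):
    """Parse one benchmark line; return None for anything else (e.g. error output)."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    count = int(match["count"])
    total_ns = int(match["total_ns"])
    return {
        "kind": match["kind"],
        "count": count,
        "total_ns": total_ns,
        "reported_ns": float(match["reported"]),
        "computed_ns": total_ns / count,  # total time divided by operation count
    }
```

For the first E5-2620 line, 11841646290 ns over 10000000 calls gives about 1184.16 ns per call, which rounds to the reported 1184.2. Lines such as "sched_setscheduler(): Operation not permitted" do not match and are skipped.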

Intel(R) Xeon(R) Gold 5118 CPU @ 2.30GHz, InCloud OS (云海OS) virtual machine

1 physical CPUs, 8 cores/CPU, 1 hardware threads/core = 8 hw threads total
-- No CPU affinity --
10000000 system calls in 2841917410ns (284.2ns/syscall)
2000000 process context switches in 7404178178ns (3702.1ns/ctxsw)
2000000 thread context switches in 7502081647ns (3751.0ns/ctxsw)
sched_setscheduler(): Operation not permitted
2000000 thread context switches in 222130514ns (111.1ns/ctxsw)
-- With CPU affinity --
10000000 system calls in 2835862084ns (283.6ns/syscall)
2000000 process context switches in 4990890087ns (2495.4ns/ctxsw)
2000000 thread context switches in 4311646652ns (2155.8ns/ctxsw)
sched_setscheduler(): Operation not permitted
2000000 thread context switches in 870608240ns (435.3ns/ctxsw)
-- With CPU affinity to CPU 0 --
10000000 system calls in 2844931708ns (284.5ns/syscall)
2000000 process context switches in 7601947691ns (3801.0ns/ctxsw)
2000000 thread context switches in 7914561498ns (3957.3ns/ctxsw)
sched_setscheduler(): Operation not permitted
2000000 thread context switches in 247057805ns (123.5ns/ctxsw)

Intel(R) Xeon(R) Gold 5118 CPU @ 2.30GHz, InCloud OS (云海OS) physical machine

2 physical CPUs, 12 cores/CPU, 2 hardware threads/core = 48 hw threads total
-- No CPU affinity --
10000000 system calls in 5769760409ns (577.0ns/syscall)
2000000 process context switches in 7245677219ns (3622.8ns/ctxsw)
2000000 thread context switches in 7069213271ns (3534.6ns/ctxsw)
sched_setscheduler(): Operation not permitted
2000000 thread context switches in 475086926ns (237.5ns/ctxsw)
-- With CPU affinity --
10000000 system calls in 5762431985ns (576.2ns/syscall)
2000000 process context switches in 8692364627ns (4346.2ns/ctxsw)
2000000 thread context switches in 6572286258ns (3286.1ns/ctxsw)
sched_setscheduler(): Operation not permitted
2000000 thread context switches in 1304249661ns (652.1ns/ctxsw)
-- With CPU affinity to CPU 0 --
10000000 system calls in 5774310295ns (577.4ns/syscall)
2000000 process context switches in 6869635514ns (3434.8ns/ctxsw)
2000000 thread context switches in 6927117249ns (3463.6ns/ctxsw)
sched_setscheduler(): Operation not permitted
2000000 thread context switches in 473255745ns (236.6ns/ctxsw)

Phytium S2500, physical machine, NFSV3

2 physical CPUs, 128 cores/CPU, 1 hardware threads/core = 256 hw threads total
-- No CPU affinity --
10000000 system calls in 3838470070ns (383.8ns/syscall)
2000000 process context switches in 10913991269ns (5457.0ns/ctxsw)
2000000 thread context switches in 10987973614ns (5494.0ns/ctxsw)
sched_setscheduler(): Operation not permitted
2000000 thread context switches in 354962539ns (177.5ns/ctxsw)
-- With CPU affinity --
10000000 system calls in 3851009222ns (385.1ns/syscall)
2000000 process context switches in 10500204985ns (5250.1ns/ctxsw)
2000000 thread context switches in 8605107251ns (4302.6ns/ctxsw)
sched_setscheduler(): Operation not permitted
2000000 thread context switches in 1694906366ns (847.5ns/ctxsw)
-- With CPU affinity to CPU 0 --
10000000 system calls in 3871134715ns (387.1ns/syscall)
2000000 process context switches in 8211223439ns (4105.6ns/ctxsw)
2000000 thread context switches in 8915611368ns (4457.8ns/ctxsw)
sched_setscheduler(): Operation not permitted
2000000 thread context switches in 362941497ns (181.5ns/ctxsw)

Phytium S2500, physical machine, Kylin V10 (银河麒麟V10)

model name : HUAWEI,Kunpeng 920
2 physical CPUs, 128 cores/CPU, 1 hardware threads/core = 256 hw threads total
-- No CPU affinity --
10000000 system calls in 1104251960ns (110.4ns/syscall)
2000000 process context switches in 5502095280ns (2751.0ns/ctxsw)
2000000 thread context switches in 5057680610ns (2528.8ns/ctxsw)
2000000 thread context switches in 159336010ns (79.7ns/ctxsw)
-- With CPU affinity --
10000000 system calls in 1104213220ns (110.4ns/syscall)
2000000 process context switches in 3157105260ns (1578.6ns/ctxsw)
2000000 thread context switches in 2749304460ns (1374.7ns/ctxsw)
2000000 thread context switches in 520588690ns (260.3ns/ctxsw)
-- With CPU affinity to CPU 0 --
10000000 system calls in 1104361790ns (110.4ns/syscall)
2000000 process context switches in 2554260900ns (1277.1ns/ctxsw)
2000000 thread context switches in 2501093900ns (1250.5ns/ctxsw)
2000000 thread context switches in 159835540ns (79.9ns/ctxsw)

Phytium S2500, KVM virtual machine

10000000 system calls in 2016128780ns (201.6ns/syscall)
2000000 process context switches in 20813179318ns (10406.6ns/ctxsw)
2000000 thread context switches in 21270077053ns (10635.0ns/ctxsw)
2000000 thread context switches in 283497350ns (141.7ns/ctxsw)
-- With CPU affinity --
10000000 system calls in 2003773606ns (200.4ns/syscall)
2000000 process context switches in 7149973534ns (3575.0ns/ctxsw)
2000000 thread context switches in 6041671015ns (3020.8ns/ctxsw)
2000000 thread context switches in 1184706267ns (592.4ns/ctxsw)
-- With CPU affinity to CPU 0 --
10000000 system calls in 1996452026ns (199.6ns/syscall)
2000000 process context switches in 20093433102ns (10046.7ns/ctxsw)
2000000 thread context switches in 20838253803ns (10419.1ns/ctxsw)
2000000 thread context switches in 284723964ns (142.4ns/ctxsw)

Hygon machine

model name : Hygon C86 7285 32-core Processor
pgrep: cannot allocate 4611686018427387903 bytes
2 physical CPUs, 32 cores/CPU, 2 hardware threads/core = 128 hw threads total
-- No CPU affinity --
10000000 system calls in 1188373575ns (118.8ns/syscall)
2000000 process context switches in 7182741168ns (3591.4ns/ctxsw)
2000000 thread context switches in 5057264353ns (2528.6ns/ctxsw)
2000000 thread context switches in 218741918ns (109.4ns/ctxsw)
-- With CPU affinity --
10000000 system calls in 1199538092ns (120.0ns/syscall)
2000000 process context switches in 4926579090ns (2463.3ns/ctxsw)
2000000 thread context switches in 4116607893ns (2058.3ns/ctxsw)
2000000 thread context switches in 877003690ns (438.5ns/ctxsw)
-- With CPU affinity to CPU 0 --
10000000 system calls in 1207213049ns (120.7ns/syscall)
2000000 process context switches in 4803238321ns (2401.6ns/ctxsw)
2000000 thread context switches in 5033478360ns (2516.7ns/ctxsw)
2000000 thread context switches in 218102516ns (109.1ns/ctxsw)

Kunpeng machine

2 physical CPUs, 128 cores/CPU, 1 hardware threads/core = 256 hw threads total
-- No CPU affinity --
10000000 system calls in 1628256836ns (162.8ns/syscall)
2000000 process context switches in 3567828849ns (1783.9ns/ctxsw)
2000000 thread context switches in 3366796751ns (1683.4ns/ctxsw)
2000000 thread context switches in 208056729ns (104.0ns/ctxsw)
-- With CPU affinity --
10000000 system calls in 3957162873ns (395.7ns/syscall)
2000000 process context switches in 66176473553ns (33088.2ns/ctxsw)
2000000 thread context switches in 64858764678ns (32429.4ns/ctxsw)
2000000 thread context switches in 9224336984ns (4612.2ns/ctxsw)
-- With CPU affinity to CPU 0 --
10000000 system calls in 1658580824ns (165.9ns/syscall)
2000000 process context switches in 4162672768ns (2081.3ns/ctxsw)
2000000 thread context switches in 3930988507ns (1965.5ns/ctxsw)
2000000 thread context switches in 206905930ns (103.5ns/ctxsw)

Intel 8369HB 3.3GHz

10000000 system calls in 2039800553ns (204.0ns/syscall)
2000000 process context switches in 3484116193ns (1742.1ns/ctxsw)
2000000 thread context switches in 3504345370ns (1752.2ns/ctxsw)
sched_setscheduler(): Operation not permitted
2000000 thread context switches in 163336302ns (81.7ns/ctxsw)
-- With CPU affinity --
10000000 system calls in 2042749498ns (204.3ns/syscall)
2000000 process context switches in 3512477901ns (1756.2ns/ctxsw)
2000000 thread context switches in 3037479215ns (1518.7ns/ctxsw)
sched_setscheduler(): Operation not permitted
2000000 thread context switches in 589604636ns (294.8ns/ctxsw)
-- With CPU affinity to CPU 0 --
10000000 system calls in 2037861063ns (203.8ns/syscall)
2000000 process context switches in 3543912186ns (1772.0ns/ctxsw)
2000000 thread context switches in 3575216872ns (1787.6ns/ctxsw)
sched_setscheduler(): Operation not permitted
2000000 thread context switches in 164079529ns (82.0ns/ctxsw)

Alibaba Yitian 710

1 physical CPUs, 8 cores/CPU, 1 hardware threads/core = 8 hw threads total
-- No CPU affinity --
10000000 system calls in 672626352ns (67.3ns/syscall)
2000000 process context switches in 3586487130ns (1793.2ns/ctxsw)
2000000 thread context switches in 3228362627ns (1614.2ns/ctxsw)
sched_setscheduler(): Operation not permitted
2000000 thread context switches in 102817391ns (51.4ns/ctxsw)
-- With CPU affinity --
10000000 system calls in 672290182ns (67.2ns/syscall)
2000000 process context switches in 1990312435ns (995.2ns/ctxsw)
2000000 thread context switches in 1682598464ns (841.3ns/ctxsw)
sched_setscheduler(): Operation not permitted
2000000 thread context switches in 328222163ns (164.1ns/ctxsw)
-- With CPU affinity to CPU 0 --
10000000 system calls in 672409838ns (67.2ns/syscall)
2000000 process context switches in 3347526340ns (1673.8ns/ctxsw)
2000000 thread context switches in 3100110717ns (1550.1ns/ctxsw)
sched_setscheduler(): Operation not permitted
2000000 thread context switches in 102631615ns (51.3ns/ctxsw)

AMD 9T34

model name : AMD EPYC 9T34 64-Core Processor
1 physical CPUs, 8 cores/CPU, 2 hardware threads/core = 16 hw threads total
-- No CPU affinity --
10000000 system calls in 553414290ns (55.3ns/syscall)
2000000 process context switches in 1963917388ns (982.0ns/ctxsw)
2000000 thread context switches in 2131473467ns (1065.7ns/ctxsw)
2000000 thread context switches in 115396178ns (57.7ns/ctxsw)
-- With CPU affinity --
10000000 system calls in 554322086ns (55.4ns/syscall)
2000000 process context switches in 2730693871ns (1365.3ns/ctxsw)
2000000 thread context switches in 2559121196ns (1279.6ns/ctxsw)
2000000 thread context switches in 550724648ns (275.4ns/ctxsw)
-- With CPU affinity to CPU 0 --
10000000 system calls in 553295602ns (55.3ns/syscall)
2000000 process context switches in 2011838005ns (1005.9ns/ctxsw)
2000000 thread context switches in 2027328701ns (1013.7ns/ctxsw)
2000000 thread context switches in 114914625ns (57.5ns/ctxsw)
