Linux threads, processes, and kernel-mode and user-mode page tables
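The cost of thread and process switching in Linux:
At the operating-system level, Linux implements both processes and threads with the task_struct descriptor. task_struct has a member that refers to the kernel-mode stack. These all live in the kernel part of the virtual address space, the 3-4 GB range on 32-bit x86. The kernel stack is used to save register values such as CS, DS, and SS. OS-level thread and process switches are all performed in kernel mode. Every process has its own unique kernel stack (each process corresponds to one task_struct, and the kernel stack is a member of the struct); the same holds for every thread, since each thread also has its own task_struct.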
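Process context switch: on the way from user mode to kernel mode, the kernel stack saves the user-mode register values, so that on the next return to user mode the instruction and memory addresses can be recovered from those registers. User mode enters kernel mode through an interrupt: the int $0x80 syscall mechanism locates the interrupt handler.
This involves the following: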
The int instruction is a complex multi-step instruction. Here is an explanation of what it does:
1.) It extracts the descriptor from the IDT (the IDT address is stored in a special register) and checks that CPL <= DPL. CPL is the current privilege level, which can be read from the low two bits of the CS register. DPL is stored in the IDT descriptor. As a consequence, you can't generate some exceptions (e.g. a page fault) directly from user space with the int instruction. If you try, you get a general protection exception.
2.) The processor switches to the stack defined in the TSS. The TSS was initialized earlier and already holds the ring-0 values of ESP and SS, which give the kernel stack address. So now ESP points to the kernel stack.
3.) The processor pushes the user-space registers onto the newly switched kernel stack: ss, esp, eflags, cs, eip. They are needed to return to user space after the syscall has been served.
4.) Next, the processor sets CS and EIP from the IDT descriptor. This address defines the exception vector entry point.
5.) Now execution is in the syscall exception vector in the kernel.
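The five steps can be sketched as a toy model. This is only an illustration under stated assumptions, not how the hardware or the kernel is implemented: the names `Gate`, `Cpu`, `software_interrupt`, the `ss0`/`esp0` dictionary keys, and all selector and address values used with them are hypothetical, and the kernel stack is modeled as a plain Python list.

```python
from dataclasses import dataclass, field


class GeneralProtectionFault(Exception):
    """Raised by the toy model when the CPL <= DPL gate check fails."""


@dataclass
class Gate:
    dpl: int          # highest CPL value allowed to invoke this gate via `int`
    handler_cs: int   # selector loaded into CS in step 4
    handler_eip: int  # entry point loaded into EIP in step 4


@dataclass
class Cpu:
    cs: int
    eip: int
    ss: int
    esp: int
    eflags: int
    kernel_stack: list = field(default_factory=list)  # stand-in for stack memory

    @property
    def cpl(self) -> int:
        # CPL is the low two bits of CS.
        return self.cs & 0b11


def software_interrupt(cpu: Cpu, idt: dict, tss: dict, vector: int) -> Cpu:
    gate = idt[vector]                                # step 1: fetch the descriptor
    if cpu.cpl > gate.dpl:                            # step 1: require CPL <= DPL
        raise GeneralProtectionFault(vector)
    user_frame = [cpu.ss, cpu.esp, cpu.eflags, cpu.cs, cpu.eip]
    cpu.ss, cpu.esp = tss["ss0"], tss["esp0"]         # step 2: switch to kernel stack
    cpu.kernel_stack.extend(user_frame)               # step 3: push ss, esp, eflags, cs, eip
    cpu.esp -= 4 * len(user_frame)                    # five 32-bit pushes
    cpu.cs, cpu.eip = gate.handler_cs, gate.handler_eip  # step 4: jump to the handler
    return cpu                                        # step 5: now running in ring 0
```

With a gate of DPL 3 on vector 0x80 and a gate of DPL 0 on vector 14, a ring-3 `Cpu` can enter through 0x80 but raises `GeneralProtectionFault` on vector 14, which mirrors the remark in step 1 about page faults.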
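The above covers the user-to-kernel transition. What about a thread or process switch? The sched_yield system call, for example, goes on to pick a thread to switch to, pops the new thread's kernel stack into the registers, which puts the CPU in the new thread's kernel mode, and then returns to user mode. That completes the switch.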
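So what is the difference? A process switch includes switching the virtual address space. In essence, the switch consists of a CR3 switch (the address-space switch, done in the switch_mm function) plus a register switch (EIP, ESP, and so on, done in the switch_to function). The kernel-mode page tables are identical for every thread and are shared; only the user-mode page tables differ, and threads of the same process share those as well, so switching between them needs no CR3 change. This is the main difference: the page tables, and the TLB invalidation that follows from switching them, which is where the performance cost comes from. The TLB caches recently used page table entries, whereas the page tables themselves live in physical memory, so the TLB reduces the number of page-table lookups.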
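The effect of a CR3 switch on the TLB can be illustrated with a small simulation. This is a deliberately simplified sketch: `ToyTlb`, `load_cr3`, and `count_misses` are hypothetical names, the TLB has unlimited capacity, and the model assumes that every CR3 write discards all cached translations (real CPUs keep global kernel entries and can tag entries per address space, which this model ignores).

```python
class ToyTlb:
    """Caches virtual-page -> physical-frame translations and counts misses."""

    def __init__(self):
        self.entries = {}
        self.misses = 0

    def translate(self, page_table: dict, vpn: int) -> int:
        if vpn not in self.entries:
            # TLB miss: walk the page table in memory and cache the result.
            self.misses += 1
            self.entries[vpn] = page_table[vpn]
        return self.entries[vpn]

    def load_cr3(self):
        # Assumption: switching address spaces discards every cached entry.
        self.entries.clear()


def count_misses(switch_address_space: bool, working_set=range(8)) -> int:
    """Misses seen by the next task after the previous one warmed the TLB."""
    table_a = {vpn: 100 + vpn for vpn in working_set}   # previous task's mappings
    table_b = {vpn: 200 + vpn for vpn in working_set}   # another process's mappings
    tlb = ToyTlb()
    for vpn in working_set:                 # previous task warms the TLB
        tlb.translate(table_a, vpn)
    tlb.misses = 0
    if switch_address_space:                # process switch: new CR3, cold TLB
        tlb.load_cr3()
        next_table = table_b
    else:                                   # thread switch within one process
        next_table = table_a
    for vpn in working_set:
        tlb.translate(next_table, vpn)
    return tlb.misses
```

In this model, `count_misses(False)` stays at zero because the thread reuses the warm entries, while `count_misses(True)` misses once per page in the working set.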
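Why is the user-level thread stack limited to 8 MB by default? Many languages support multithreading, for example C++ with pthreads, and all of the thread stacks live inside the process's address space. The stacks of different threads must not overlap, otherwise the threads overwrite each other's stacks and crash. If the address and size of each stack were not fixed in advance and the stacks could grow without bound, they would eventually overlap. Allocating stacks that are too large, on the other hand, reduces the number of threads that can be created. A user-mode thread switch is essentially just a register switch, which makes it very lightweight.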
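The trade-off between stack size and thread count is simple arithmetic. The sketch below assumes a 32-bit process with 3 GiB of user address space and pretends the whole of it is available for stacks, so the result is only an upper bound; `max_thread_stacks` and `stack_regions` are hypothetical helper names.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


def max_thread_stacks(user_space_bytes: int, stack_bytes: int) -> int:
    """Upper bound on fixed-size stacks that fit in the given address space."""
    return user_space_bytes // stack_bytes


def stack_regions(base: int, stack_bytes: int, count: int) -> list:
    """Lay out `count` fixed-size stacks back to back as (start, end) ranges.

    Because every stack has a size fixed in advance, neighbours never overlap.
    """
    return [(base + i * stack_bytes, base + (i + 1) * stack_bytes)
            for i in range(count)]


eight_mib_limit = max_thread_stacks(3 * GIB, 8 * MIB)   # 384 stacks at most
one_mib_limit = max_thread_stacks(3 * GIB, 1 * MIB)     # 3072 stacks at most
```

On Linux, the soft stack limit of the current process can be read with `resource.getrlimit(resource.RLIMIT_STACK)` from the Python standard library.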
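CPU privilege levels: ring 0 to ring 3. The CS segment selector is simply the value of the CS register; it contains an index and the CPL. The index locates a segment descriptor entry within the segment descriptor table. The segment descriptor contains the segment base address and the DPL, that is, the segment address (a linear address) together with the privilege level of that linear address range. Note that under segmentation the CS, DS, and SS segments are treated as different segments. Modern operating systems have effectively abandoned segmentation, and Intel keeps it only for compatibility. The kernel-mode CS, SS, and DS segments all have their DPL set to 0, which means user-mode instructions cannot operate on them. This is protected mode. So why is RPL needed?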
RPL – Requested Privilege Level
RPL is the lowest two bits of the selector held in the DS, ES, SS, FS, and GS registers. The RPL field is used to harden the CPL check when higher-privileged code is servicing requests from lower-privileged processes.
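The layout of a 16-bit selector (index in bits 15..3, table indicator in bit 2, RPL in bits 1..0) can be decoded with a few bit operations. `Selector` and `decode_selector` are hypothetical helper names; the value 0x73 is used as an example because it is the user code selector on 32-bit Linux.

```python
from typing import NamedTuple


class Selector(NamedTuple):
    index: int   # entry number in the descriptor table
    table: str   # "GDT" or "LDT", from the table-indicator bit
    rpl: int     # requested privilege level (for CS, this is the CPL)


def decode_selector(value: int) -> Selector:
    value &= 0xFFFF                           # selectors are 16 bits wide
    return Selector(
        index=value >> 3,                     # bits 15..3
        table="LDT" if value & 0b100 else "GDT",  # bit 2
        rpl=value & 0b11,                     # bits 1..0
    )


user_cs = decode_selector(0x73)   # index 14 in the GDT, ring 3
```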
Assume a higher-privileged device driver that supports a mechanism by which it can copy data from disk directly into the data segments of lower-privileged processes. A lower-privileged process must pass its data-segment details (selector, address, and size of the data to copy) to the device driver so that the driver can copy the data into the appropriate location.
Since the device driver is higher-privileged, a lower-privileged process can trick it into copying data into higher-privileged data segments simply by passing the wrong selector value. This kind of exploit is called privilege escalation.
How does RPL help solve the privilege escalation problem?
Continuing the example above, whenever the device driver loads the destination segment, it sets the RPL of the destination selector to match the requesting (lower-privileged) process. Since the protection rules for data segments check both CPL <= DPL and RPL <= DPL, the higher-privileged driver gets a protection fault on the RPL <= DPL check.
The point to note is that higher-privileged code, when it provides services to lower-privileged processes, should temporarily reduce its privilege to the requestor's privilege level.
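The two checks can be written as one comparison, because CPL <= DPL and RPL <= DPL together mean max(CPL, RPL) <= DPL. The following sketch replays the driver scenario; `data_segment_load_allowed` and `with_requestor_rpl` are hypothetical names, the selector value 0x68 is only illustrative, and the adjustment is similar in spirit to what the x86 `arpl` instruction does.

```python
def data_segment_load_allowed(cpl: int, rpl: int, dpl: int) -> bool:
    """Data-segment rule: both CPL <= DPL and RPL <= DPL must hold."""
    return max(cpl, rpl) <= dpl


def with_requestor_rpl(selector: int, requestor_cpl: int) -> int:
    """Weaken a selector passed in by a caller to at least the caller's level."""
    rpl = max(selector & 0b11, requestor_cpl)
    return (selector & ~0b11) | rpl


DRIVER_CPL = 0
REQUESTOR_CPL = 3
forged = 0x68   # illustrative: points at a DPL-0 segment, RPL forged as 0

# Without the adjustment, the ring-0 driver would be allowed to write there.
unadjusted_ok = data_segment_load_allowed(DRIVER_CPL, forged & 0b11, dpl=0)

# After stamping the requestor's level into the selector, the load faults.
adjusted = with_requestor_rpl(forged, REQUESTOR_CPL)
adjusted_ok = data_segment_load_allowed(DRIVER_CPL, adjusted & 0b11, dpl=0)

# The requestor's own DPL-3 segment is still reachable with the adjusted RPL.
own_segment_ok = data_segment_load_allowed(DRIVER_CPL, REQUESTOR_CPL, dpl=3)
```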
The CPU's privilege modes protect memory: if user-mode code accesses a protected memory address, a segmentation fault is triggered.
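As for the two-level page table, its fundamental purpose is to avoid needing one large contiguous page table; otherwise every 32-bit process would need a 4 MB page table (assuming 4 KB pages). Since a physical page frame is 4 KB, how does a virtual linear address find its physical address? With a direct, single-level mapping, one page table entry corresponds to one page frame, and 4 GB / 4 KB = 1M, so 1M page table entries are needed. How many bytes does each entry need? Addressing 1M frames takes 20 bits, so an entry needs at least 20 bits, which is 3 bytes, and in practice 4 bytes are used. So without a page directory, each process's page table would take 4 MB of physical memory. A 4 KB page frame covers 2^12 physical addresses, which means that for a 32-bit address the low 12 bits (the offset within the page) need no translation, and only the upper 20 bits have to be looked up.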
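The arithmetic above can be checked directly. The sketch assumes the classic 32-bit x86 layout without PAE, in which the upper 20 bits are split 10/10 into a page-directory index and a page-table index; that split, the helper names, and the sample address are assumptions added for illustration.

```python
PAGE_SIZE = 4 * 1024          # 4 KB page frame, 2**12 addresses
ADDRESS_SPACE = 1 << 32       # 4 GB of virtual addresses
PTE_SIZE = 4                  # 20 bits needed, rounded up to 4 bytes


def flat_table_bytes() -> int:
    """Size of a single-level table that maps every page of the address space."""
    entries = ADDRESS_SPACE // PAGE_SIZE      # 1M entries
    return entries * PTE_SIZE                 # 4 MB per process


def split_address(vaddr: int) -> tuple:
    """Split a 32-bit address into (directory index, table index, offset)."""
    directory = vaddr >> 22                   # top 10 bits
    table = (vaddr >> 12) & 0x3FF             # middle 10 bits
    offset = vaddr & 0xFFF                    # low 12 bits, not translated
    return directory, table, offset


def two_level_bytes(page_tables_in_use: int) -> int:
    """One page directory plus only the page tables that are actually needed."""
    return PAGE_SIZE + page_tables_in_use * PAGE_SIZE


flat = flat_table_bytes()                     # 4194304 bytes
sparse = two_level_bytes(3)                   # 16384 bytes for a small process
parts = split_address(0xC0101234)             # (768, 257, 0x234)
```

A process that touches only a few 4 MB regions needs a handful of 4 KB page tables instead of the full 4 MB table, and none of them has to be contiguous with the others.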
https://blog.csdn.net/displayMessage/article/details/80905810