Linux threads, processes, and kernel-mode and user-mode page tables

The cost of thread and process switches in Linux:

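At the OS level, Linux implements both processes and threads as task_struct descriptors. Each task_struct has a member that refers to the task's kernel-mode stack. Both live in the kernel part of the virtual address space, the 3-4 GB range on 32-bit x86. The kernel stack holds saved register values such as CS, DS, and SS. Thread and process switches at the OS level are always performed in kernel mode. Every task, whether process or thread, has its own unique kernel stack, because each one has its own task_struct and the kernel stack is a member of that struct.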

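Process context, from user mode to kernel mode: the kernel stack saves the user-mode register values so that, on the next return to user mode, the registers again locate the right instruction and memory addresses. User mode enters kernel mode through an interrupt. With the legacy int $0x80 syscall mechanism on 32-bit x86, the CPU finds the interrupt handler as follows.

The int instruction is a complex multi-step instruction. Here is an explanation of what it does: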

1.) Extracts the descriptor from the IDT (the IDT address is stored in a special register, IDTR) and checks that CPL <= DPL. CPL is the current privilege level, which can be read from the low two bits of the CS register. DPL is stored in the IDT descriptor. As a consequence, you cannot raise some exceptions (for example, a page fault) directly from user space with the int instruction. If you try, you get a general protection fault.

2.) Because the privilege level changes, the processor switches to the stack defined in the TSS. The TSS was initialized earlier and already contains the ring 0 values of ESP and SS (ESP0 and SS0), which hold the kernel stack address. So now ESP points to the kernel stack.

3.) The processor pushes the user-space registers onto the newly selected kernel stack: ss, esp, eflags, cs, eip. They are needed to return to user space after the syscall has been served.

4.) Next, the processor loads CS and EIP from the IDT descriptor. This address is the entry point of the handler for that vector.

5.) Execution is now at the syscall entry point in the kernel.
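The five steps above can be sketched as a small Python model. This is an illustration under stated assumptions, not how the hardware or the kernel is implemented: the Gate and Cpu classes, the software_int function, and all addresses are hypothetical, and the selector values (0x73 and 0x7B for user CS and SS, 0x60 and 0x68 for kernel CS and SS) are the ones commonly seen on 32-bit x86 Linux. The model covers only the user-to-kernel case, where a stack switch takes place.

```python
from dataclasses import dataclass, field


class GeneralProtectionFault(Exception):
    """Raised when the CPL <= DPL check of step 1 fails."""


@dataclass
class Gate:
    dpl: int  # privilege level required to invoke this vector with `int`
    cs: int   # code-segment selector of the handler
    eip: int  # entry point of the handler


@dataclass
class Cpu:
    cs: int
    eip: int
    ss: int
    esp: int
    eflags: int
    memory: dict = field(default_factory=dict)  # address -> 32-bit value


def software_int(cpu, idt, vector, tss_ss0, tss_esp0):
    """Toy model of `int <vector>` executed in user mode (hypothetical helper)."""
    # Step 1: fetch the gate and check CPL <= DPL (CPL = low two bits of CS).
    gate = idt[vector]
    cpl = cpu.cs & 3
    if cpl > gate.dpl:
        raise GeneralProtectionFault(vector)

    # Step 2: remember the user values, then switch to the kernel stack
    # whose location comes from the TSS (SS0 and ESP0).
    saved = (cpu.ss, cpu.esp, cpu.eflags, cpu.cs, cpu.eip)
    cpu.ss, cpu.esp = tss_ss0, tss_esp0

    # Step 3: push ss, esp, eflags, cs, eip onto the kernel stack.
    for value in saved:
        cpu.esp -= 4
        cpu.memory[cpu.esp] = value

    # Steps 4 and 5: load CS:EIP from the gate; we are now in the handler.
    cpu.cs, cpu.eip = gate.cs, gate.eip
    return cpu
```

In this model, an IDT entry for vector 0x80 with DPL 3 lets user code (CPL 3) through, while an entry with DPL 0, such as the page-fault vector 14, raises GeneralProtectionFault, matching the remark in step 1.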

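The steps above cover the user-to-kernel transition. What about a thread or process switch? Take the sched_yield system call as an example: once in the kernel, the scheduler picks the next thread to run, switches to that thread's kernel stack, and pops its saved values into the registers. Execution then continues in the new thread's kernel context and returns to user mode, which completes the switch.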

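What is the difference? A process switch also switches the virtual address space. In essence it is a CR3 switch (the address-space switch, done in switch_mm) plus a register switch (EIP, ESP, and so on, done in switch_to). A switch between two threads of the same process keeps the address space and needs only the register switch. The kernel-mode part of the page tables is identical for every thread and is shared; only the user-mode part differs. So the main difference is the page table: reloading CR3 invalidates cached TLB entries, and the TLB misses that follow are the main performance cost. The TLB caches recently used page-table entries, whereas the page tables themselves live in physical memory, so a TLB hit avoids walking the page tables in memory.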
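The difference can be made concrete with a toy model. It is only a sketch: Task, ToyCpu, and the integer mm values are hypothetical stand-ins, and the real switch_mm and switch_to are architecture-specific kernel code. The model assumes that CR3 is reloaded only when the next task uses a different address space, and counts each reload as one round of TLB invalidation.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    mm: int  # stand-in for the address space (page-directory base)


class ToyCpu:
    def __init__(self):
        self.cr3 = None      # currently loaded page-directory base
        self.cr3_loads = 0   # each load stands for a TLB invalidation
        self.current = None

    def context_switch(self, nxt):
        # Address-space part (the role of switch_mm in the text):
        # only needed when the next task has a different address space.
        if self.cr3 != nxt.mm:
            self.cr3 = nxt.mm
            self.cr3_loads += 1
        # Register part (the role of switch_to in the text):
        # modelled here as simply changing the current task.
        self.current = nxt


cpu = ToyCpu()
thread_a1 = Task("A-thread-1", mm=0x1000)
thread_a2 = Task("A-thread-2", mm=0x1000)  # same process, same mm
process_b = Task("B", mm=0x2000)           # different process

for task in (thread_a1, thread_a2, process_b, thread_a1):
    cpu.context_switch(task)
```

Of the four switches, the one from A-thread-1 to A-thread-2 is the only one that leaves CR3 alone, so the model records three CR3 loads.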

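Why is a user-level thread stack limited to 8 MB (the usual default on Linux)? Many languages support multithreading, for example C++ through pthreads, and every thread stack lives in the user part of the same process address space (with glibc, stacks for threads other than the main thread are mmap'd regions rather than part of the main stack area). The stacks of different threads must not overlap, otherwise one thread overwrites another's stack and the process crashes. If the address and size of each stack were not fixed in advance and stacks could grow without bound, they would eventually overlap. Reserving too much per stack, on the other hand, reduces the number of threads that can be created. A switch between user-level threads is essentially just a register switch, which makes it very lightweight.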
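The trade-off between stack size and thread count is simple arithmetic. The sketch below assumes a 3 GB user address space (32-bit Linux, with the kernel in the 3-4 GB range as described earlier) and that every thread reserves its full stack; max_threads and stack_soft_limit are hypothetical helper names. resource.getrlimit is the Python standard-library call (Unix only) that reports the stack limit the current process runs with, the same value that `ulimit -s` shows in kilobytes.

```python
import resource

MB = 1024 * 1024
GB = 1024 * MB


def max_threads(user_space_bytes, stack_bytes):
    """Upper bound on threads if address space were used for stacks only."""
    return user_space_bytes // stack_bytes


def stack_soft_limit():
    """Soft RLIMIT_STACK in bytes (resource.RLIM_INFINITY if unlimited)."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_STACK)
    return soft
```

Under these assumptions max_threads(3 * GB, 8 * MB) gives 384, and shrinking the stack to 1 MB raises the bound to 3072. Real limits are lower, since code, heap, and shared libraries also need address space.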

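CPU privilege levels range from ring 0 to ring 3. The CS segment selector is simply the value held in the CS register. It contains an index and, in its low two bits, the CPL. The index locates one segment-descriptor entry in the segment descriptor table. The segment descriptor contains the segment base address, which is a linear address, and the DPL, which states the privilege level of that segment. Note that under segmentation CS, DS, and SS are treated as different segments. Modern operating systems have effectively abandoned segmentation by using a flat memory model; Intel keeps it only for compatibility. The kernel's CS, SS, and DS descriptors all have their DPL set to 0, which means user-mode instructions cannot use them. This is the protection that protected mode provides. So why is RPL needed?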

RPL – Requested Privilege Level

These are the lowest two bits of the selector loaded into the DS, ES, SS, FS, or GS register. The RPL field is used to harden the CPL check when higher-privileged code services requests from lower-privileged processes.
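As a small illustration, a 16-bit x86 segment selector can be split into its fields: bits 15..3 are the descriptor-table index, bit 2 is the table indicator (0 for the GDT, 1 for an LDT), and bits 1..0 are the privilege field (CPL when read from CS, RPL otherwise). The function name decode_selector is hypothetical, and the sample values 0x73 and 0x60 are assumed here as the user and kernel code selectors commonly seen on 32-bit x86 Linux.

```python
def decode_selector(selector):
    """Split a 16-bit x86 segment selector into its three fields."""
    return {
        "index": selector >> 3,      # entry number in the descriptor table
        "ti": (selector >> 2) & 1,   # 0 = GDT, 1 = LDT
        "rpl": selector & 3,         # privilege field (CPL when read from CS)
    }


user_cs = decode_selector(0x73)    # assumed user code selector
kernel_cs = decode_selector(0x60)  # assumed kernel code selector
```

With these inputs, user_cs carries privilege level 3 and kernel_cs privilege level 0, and both refer to GDT entries (14 and 12 respectively).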

Assume a higher-privileged device driver that can copy data from disk directly into the data segments of lower-privileged processes. A lower-privileged process must pass its data-segment details (selector, address, and size of the data to copy) to the driver so that the driver can copy the data to the right location.

Because the driver is higher-privileged, a lower-privileged process can trick it into copying data into a high-privileged data segment simply by passing the wrong selector value. This kind of exploit is called privilege escalation.

How does RPL help solve the privilege escalation problem?

Continuing the example above, whenever the driver loads the destination segment, it sets the RPL of the destination selector to match the privilege level of the requesting (lower-privileged) process. Because the protection rules for data segments check both CPL <= DPL and RPL <= DPL, the higher-privileged code gets a protection fault when the RPL <= DPL check fails.

The point to note is that higher-privileged code, when it services lower-privileged processes, should temporarily reduce its privilege to the requestor's privilege level.
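The driver scenario can be written out as a minimal check. This is a sketch of the rule stated above (both CPL <= DPL and RPL <= DPL must hold), not of any real driver: the function names are hypothetical, adjust_rpl mimics what the x86 ARPL instruction does to a selector, and the selector values 0x68 (kernel data) and 0x7B (user data) are assumed from common 32-bit x86 Linux setups.

```python
def data_segment_access_ok(cpl, rpl, dpl):
    """Data-segment rule from the text: CPL <= DPL and RPL <= DPL."""
    return cpl <= dpl and rpl <= dpl


def adjust_rpl(selector, requestor_level):
    """Raise the selector's RPL to the requestor's level if it is lower."""
    if (selector & 3) < requestor_level:
        selector = (selector & ~3) | requestor_level
    return selector


DRIVER_CPL = 0
REQUESTOR_LEVEL = 3

# A malicious process passes a kernel data selector (DPL 0, RPL 0).
forged = 0x68
unchecked = data_segment_access_ok(DRIVER_CPL, forged & 3, dpl=0)

# The driver first stamps the requestor's privilege level into the RPL.
stamped = adjust_rpl(forged, REQUESTOR_LEVEL)
checked = data_segment_access_ok(DRIVER_CPL, stamped & 3, dpl=0)

# An honest request for the process's own data segment (DPL 3) still works.
honest = adjust_rpl(0x7B, REQUESTOR_LEVEL)
honest_ok = data_segment_access_ok(DRIVER_CPL, honest & 3, dpl=3)
```

Without the adjustment the forged selector passes the check (unchecked is True). After the adjustment the same request is rejected (checked is False), while the honest request is still allowed.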

The CPU's privilege levels protect memory: if user-mode code accesses a protected memory address, the access faults, and the process sees a segmentation fault.

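As for the two-level page table, its basic purpose is to avoid needing one large, contiguous page table: without it, every 32-bit process would need a 4 MB page table (assuming 4 KB pages). A physical page frame is 4 KB, so how does a virtual (linear) address find its physical address? With a direct, single-level mapping, one page-table entry maps one page frame, so 4 GB / 4 KB = 1 M (2^20) entries are needed. How many bytes does each entry need? Numbering 2^20 frames takes 20 bits, so an entry needs at least 20 bits, which rounds up to 3 bytes; in practice 4 bytes are used. So without a page directory, each process needs 4 MB of physical memory for its page table. A 4 KB page frame covers 2^12 byte addresses, which means that in a 32-bit address the low 12 bits are the offset within the page and need no translation; only the high 20 bits are looked up.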
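The arithmetic above, written out as a short sketch. The constants follow the text (32-bit addresses, 4 KB pages, 4-byte entries); the 10/10/12 split in split_address is the classic non-PAE 32-bit x86 layout and is an assumption added here, as are the helper names and the sample address.

```python
ADDRESS_SPACE = 1 << 32   # 4 GB of virtual address space
PAGE_SIZE = 1 << 12       # 4 KB pages
PTE_SIZE = 4              # bytes per page-table entry

# Single-level table: one entry per page of the whole address space.
entries = ADDRESS_SPACE // PAGE_SIZE     # 2**20 entries
flat_table_bytes = entries * PTE_SIZE    # 4 MB per process


def two_level_bytes(mapped_4mb_regions):
    """One 4 KB page directory plus one 4 KB page table per mapped 4 MB region."""
    return PAGE_SIZE + mapped_4mb_regions * PAGE_SIZE


def split_address(addr):
    """Split a 32-bit linear address into (directory index, table index, offset)."""
    return (addr >> 22, (addr >> 12) & 0x3FF, addr & 0xFFF)


directory_index, table_index, offset = split_address(0xC0101234)
```

A process that maps only three 4 MB regions needs 16 KB of page-table memory in this scheme instead of the flat 4 MB, and each piece is a single page, so no large contiguous allocation is required.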

https://blog.csdn.net/displayMessage/article/details/80905810
