See also: http://blog.chinaunix.net/uid-22695386-id-272098.html

Kernels before Linux 2.4 had a hard limit on the number of processes. The reason: every process had its own TSS (task state segment) and LDT (local descriptor table), and their descriptors had to live in the GDT, which can hold at most 8192 descriptors. After subtracting the 12 descriptors used by the system, the maximum number of processes was (8192 - 12) / 2 = 4090. Since Linux 2.4 all processes share the same TSS; more precisely, there is one TSS per CPU, and all processes running on the same CPU use that CPU's TSS. The TSS is declared in asm-i386/processor.h as follows:
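To make the arithmetic explicit, here is a small illustrative calculation (the constant names below are made up for this sketch; they are not taken from any kernel header):

#define GDT_DESCRIPTORS  8192   /* a GDT holds at most 8192 descriptors    */
#define SYSTEM_RESERVED    12   /* descriptors reserved by the kernel      */
#define PER_TASK_ENTRIES    2   /* one TSS descriptor + one LDT descriptor */

/* (8192 - 12) / 2 = 4090 tasks at most in pre-2.4 kernels */
static const int max_tasks = (GDT_DESCRIPTORS - SYSTEM_RESERVED) / PER_TASK_ENTRIES;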

extern struct tss_struct init_tss[NR_CPUS];

The TSS is initialized and loaded in start_kernel() -> trap_init() -> cpu_init():

void __init cpu_init (void)
{
    int nr = smp_processor_id();            // current CPU id

    struct tss_struct * t = &init_tss[nr];  // the TSS used by this CPU

    t->esp0 = current->thread.esp0;         // update esp0 in the TSS to the current task's esp0
    set_tss_desc(nr,t);
    gdt_table[__TSS(nr)].b &= 0xfffffdff;
    load_TR(nr);                            // load the task register (TSS)
    load_LDT(&init_mm.context);             // load the LDT
}

As we know, a hardware task switch uses the TSS to save all of the registers (before 2.4 the switch was done with a jmp instruction), and when an interrupt occurs the CPU also reads the ring-0 esp0 from the TSS. So if all processes share one TSS, how can task switching work? The answer is that since 2.4 the kernel no longer performs hardware switches but software switches: the registers are no longer saved in the TSS but in task->thread, and only the TSS's esp0 and I/O permission bitmap are still used. Therefore, during a process switch, only esp0 and the io bitmap in the TSS need to be updated. The code is in sched.c:

schedule() -> switch_to() -> __switch_to()

void fastcall __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
{
    struct thread_struct *prev = &prev_p->thread,
                         *next = &next_p->thread;
    struct tss_struct *tss = init_tss + smp_processor_id();  // this CPU's TSS

    /*
     * Reload esp0, LDT and the page table pointer:
     */
    tss->esp0 = next->esp0;  // update tss->esp0 with the next task's esp0

    // copy the next task's io_bitmap into tss->io_bitmap
    if (prev->ioperm || next->ioperm) {
        if (next->ioperm) {
            /*
             * 4 cachelines copy ... not good, but not that
             * bad either. Anyone got something better?
             * This only affects processes which use ioperm().
             * [Putting the TSSs into 4k-tlb mapped regions
             * and playing VM tricks to switch the IO bitmap
             * is not really acceptable.]
             */
            memcpy(tss->io_bitmap, next->io_bitmap,
                   IO_BITMAP_BYTES);
            tss->bitmap = IO_BITMAP_OFFSET;
        } else
            /*
             * a bitmap offset pointing outside of the TSS limit
             * causes a nicely controllable SIGSEGV if a process
             * tries to use a port IO instruction. The first
             * sys_ioperm() call sets up the bitmap properly.
             */
            tss->bitmap = INVALID_IO_BITMAP_OFFSET;
    }
}
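For context, the io_bitmap handling above only matters for processes that have asked for direct port I/O with ioperm(). A minimal user-space sketch (needs root / CAP_SYS_RAWIO; port 0x378 is just an example):

#include <stdio.h>
#include <sys/io.h>     /* ioperm(), outb() -- x86 glibc */

int main(void)
{
    /* ask for access to ports 0x378-0x37a; this is what gives the task an
     * I/O permission bitmap that __switch_to() then has to copy into the TSS */
    if (ioperm(0x378, 3, 1)) {
        perror("ioperm");
        return 1;
    }
    outb(0x00, 0x378);  /* now allowed from ring 3 without a fault */
    return 0;
}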

And here is the corresponding code from a later kernel:

/*
 *    switch_to(x,yn) should switch tasks from x to y.
 *
 * We fsave/fwait so that an exception goes off at the right time
 * (as a call from the fsave or fwait in effect) rather than to
 * the wrong process. Lazy FP saving no longer makes any sense
 * with modern CPU's, and this simplifies a lot of things (SMP
 * and UP become the same).
 *
 * NOTE! We used to use the x86 hardware context switching. The
 * reason for not using it any more becomes apparent when you
 * try to recover gracefully from saved state that is no longer
 * valid (stale segment register values in particular). With the
 * hardware task-switch, there is no way to fix up bad state in
 * a reasonable manner.
 *
 * The fact that Intel documents the hardware task-switching to
 * be slow is a fairly red herring - this code is not noticeably
 * faster. However, there _is_ some room for improvement here,
 * so the performance issues may eventually be a valid point.
 * More important, however, is the fact that this allows us much
 * more flexibility.
 *
 * The return value (in %ax) will be the "prev" task after
 * the task-switch, and shows up in ret_from_fork in entry.S,
 * for example.
 */
__notrace_funcgraph struct task_struct *
__switch_to(struct task_struct *prev_p, struct task_struct *next_p)
{
    struct thread_struct *prev = &prev_p->thread,
                 *next = &next_p->thread;
    int cpu = smp_processor_id();
    struct tss_struct *tss = &per_cpu(init_tss, cpu);
    fpu_switch_t fpu;

    /* never put a printk in __switch_to... printk() calls wake_up*() indirectly */

    fpu = switch_fpu_prepare(prev_p, next_p);

    /*
     * Reload esp0.
     */
    load_sp0(tss, next);

    /*
     * Save away %gs. No need to save %fs, as it was saved on the
     * stack on entry.  No need to save %es and %ds, as those are
     * always kernel segments while inside the kernel.  Doing this
     * before setting the new TLS descriptors avoids the situation
     * where we temporarily have non-reloadable segments in %fs
     * and %gs.  This could be an issue if the NMI handler ever
     * used %fs or %gs (it does not today), or if the kernel is
     * running inside of a hypervisor layer.
     */
    lazy_save_gs(prev->gs);

    /*
     * Load the per-thread Thread-Local Storage descriptor.
     */
    load_TLS(next, cpu);

    /*
     * Restore IOPL if needed.  In normal use, the flags restore
     * in the switch assembly will handle this.  But if the kernel
     * is running virtualized at a non-zero CPL, the popf will
     * not restore flags, so it must be done in a separate step.
     */
    if (get_kernel_rpl() && unlikely(prev->iopl != next->iopl))
        set_iopl_mask(next->iopl);

    /*
     * Now maybe handle debug registers and/or IO bitmaps
     */
    if (unlikely(task_thread_info(prev_p)->flags & _TIF_WORK_CTXSW_PREV ||
             task_thread_info(next_p)->flags & _TIF_WORK_CTXSW_NEXT))
        __switch_to_xtra(prev_p, next_p, tss);

    /*
     * Leave lazy mode, flushing any hypercalls made here.
     * This must be done before restoring TLS segments so
     * the GDT and LDT are properly updated, and must be
     * done before math_state_restore, so the TS bit is up
     * to date.
     */
    arch_end_context_switch(next_p);

    /*
     * Restore %gs if needed (which is common)
     */
    if (prev->gs | next->gs)
        lazy_load_gs(next->gs);

    switch_fpu_finish(next_p, fpu);

    percpu_write(current_task, next_p);

    return prev_p;
}

Let's first look at the comments:

/*
*    switch_to(x,yn) should switch tasks from x to y.
*
* We fsave/fwait so that an exception goes off at the right time
* (as a call from the fsave or fwait in effect) rather than to
* the wrong process. Lazy FP saving no longer makes any sense
* with modern CPU's, and this simplifies a lot of things (SMP
* and UP become the same).
*
* NOTE! We used to use the x86 hardware context switching. The
* reason for not using it any more becomes apparent when you
* try to recover gracefully from saved state that is no longer
* valid (stale segment register values in particular). With the
* hardware task-switch, there is no way to fix up bad state in
* a reasonable manner.
*
* The fact that Intel documents the hardware task-switching to
* be slow is a fairly red herring - this code is not noticeably
* faster. However, there _is_ some room for improvement here,
* so the performance issues may eventually be a valid point.
* More important, however, is the fact that this allows us much
* more flexibility.
*
* The return value (in %ax) will be the "prev" task after
* the task-switch, and shows up in ret_from_fork in entry.S,
* for example.
*/

Roughly, it says that for the sake of flexibility the kernel replaced Intel's hardware task switching with software task switching.

As the article cited at the beginning explains, every process running on a given CPU uses that CPU's single TSS segment:

    int cpu = smp_processor_id();
    struct tss_struct *tss = &per_cpu(init_tss, cpu);

and the only fields in it that still carry meaningful information are esp0 and the io_bitmap.
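Put differently, the parts of the per-CPU TSS that the software-switching kernel still relies on look roughly like this (a simplified sketch, not the kernel's actual tss_struct definition):

#define IO_BITMAP_BITS   65536
#define IO_BITMAP_LONGS  (IO_BITMAP_BITS / 8 / sizeof(long))

struct tss_sketch {
    unsigned long sp0;                              /* esp0: top of the incoming task's kernel stack */
    unsigned long io_bitmap[IO_BITMAP_LONGS + 1];   /* I/O permission bitmap, copied in on demand    */
};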

    /*
     * Reload esp0.
     */
    load_sp0(tss, next);
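On native (non-paravirtualized) hardware, load_sp0() essentially boils down to one store into the per-CPU TSS; a simplified version (the sysenter-related bookkeeping on 32-bit is omitted here):

static inline void native_load_sp0(struct tss_struct *tss,
                                   struct thread_struct *thread)
{
    /* the stack pointer the CPU will load on the next ring3 -> ring0 transition */
    tss->x86_tss.sp0 = thread->sp0;
}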

The per-process context that used to be saved in the TSS is now kept in the following structure (thread_struct):

struct thread_struct *prev = &prev_p->thread,
                 *next = &next_p->thread;

struct thread_struct {
    /* Cached TLS descriptors: */
    struct desc_struct    tls_array[GDT_ENTRY_TLS_ENTRIES];
    unsigned long        sp0;
    unsigned long        sp;
#ifdef CONFIG_X86_32
    unsigned long        sysenter_cs;
#else
    unsigned long        usersp;    /* Copy from PDA */
    unsigned short        es;
    unsigned short        ds;
    unsigned short        fsindex;
    unsigned short        gsindex;
#endif
#ifdef CONFIG_X86_32
    unsigned long        ip;
#endif
#ifdef CONFIG_X86_64
    unsigned long        fs;
#endif
    unsigned long        gs;
    /* Save middle states of ptrace breakpoints */
    struct perf_event    *ptrace_bps[HBP_NUM];
    /* Debug status used for traps, single steps, etc... */
    unsigned long           debugreg6;
    /* Keep track of the exact dr7 value set by the user */
    unsigned long           ptrace_dr7;
    /* Fault info: */
    unsigned long        cr2;
    unsigned long        trap_no;
    unsigned long        error_code;
    /* floating point and extended processor state */
    unsigned long        has_fpu;
    struct fpu        fpu;
#ifdef CONFIG_X86_32
    /* Virtual 86 mode info */
    struct vm86_struct __user *vm86_info;
    unsigned long        screen_bitmap;
    unsigned long        v86flags;
    unsigned long        v86mask;
    unsigned long        saved_sp0;
    unsigned int        saved_fs;
    unsigned int        saved_gs;
#endif
    /* IO permissions: */
    unsigned long        *io_bitmap_ptr;
    unsigned long        iopl;
    /* Max allowed port in the bitmap, in bytes: */
    unsigned        io_bitmap_max;
};

On Linux, the gs register is used to hold the address of the TLS area. (On Windows, the fs register holds the address of the TEB structure.)

See: http://www.linuxidc.com/Linux/2012-06/64079p2.htm

Linux's glibc uses the GS register to access TLS; that is, the segment GS points to is the thread's TEB (to borrow the Windows term), i.e. its TLS. The benefit is that the data stored in TLS can be accessed efficiently without issuing a system call every time (although the system-call route would also work). This is possible because Intel's rules about what the individual segment registers are for are fairly loose, so GS, FS and the other segment registers can be used for almost anything, including direct TLS access. Concretely, when a thread starts, glibc points GS at the sixth GDT entry and relies entirely on the segmentation mechanism for TLS addressing; after that, reading TLS data is as cheap as any other user-space memory access.
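A quick user-space illustration (nothing kernel-specific here): a __thread variable is addressed through the TLS segment, so on 32-bit x86 the compiler emits plain %gs-relative loads and stores rather than any system call:

#include <pthread.h>
#include <stdio.h>

static __thread int counter = 0;    /* one instance per thread, reached via %gs */

static void *worker(void *arg)
{
    (void)arg;
    counter++;                                    /* touches only this thread's copy */
    printf("worker counter = %d\n", counter);     /* prints 1 */
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, worker, NULL);
    pthread_join(t, NULL);
    printf("main counter   = %d\n", counter);     /* still 0 */
    return 0;
}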

The code below writes the current contents of the gs register back into the prev thread structure.

    /*
     * Save away %gs. No need to save %fs, as it was saved on the
     * stack on entry.  No need to save %es and %ds, as those are
     * always kernel segments while inside the kernel.  Doing this
     * before setting the new TLS descriptors avoids the situation
     * where we temporarily have non-reloadable segments in %fs
     * and %gs.  This could be an issue if the NMI handler ever
     * used %fs or %gs (it does not today), or if the kernel is
     * running inside of a hypervisor layer.
     */
    lazy_save_gs(prev->gs);

Next:

    /*
     * Load the per-thread Thread-Local Storage descriptor.
     */
    load_TLS(next, cpu);

This updates the GDT entries that describe the thread's TLS segments.

#define load_TLS(t, cpu)            native_load_tls(t, cpu)

static inline void native_load_tls(struct thread_struct *t, unsigned int cpu)
{
    struct desc_struct *gdt = get_cpu_gdt_table(cpu);
    unsigned int i;

    for (i = 0; i < GDT_ENTRY_TLS_ENTRIES; i++)
        gdt[GDT_ENTRY_TLS_MIN + i] = t->tls_array[i];
}

It first fetches the current CPU's GDT:

struct gdt_page {
    struct desc_struct gdt[GDT_ENTRIES];
} __attribute__((aligned(PAGE_SIZE)));

DECLARE_PER_CPU_PAGE_ALIGNED(struct gdt_page, gdt_page);

static inline struct desc_struct *get_cpu_gdt_table(unsigned int cpu)
{
    return per_cpu(gdt_page, cpu).gdt;
}

The per_cpu mechanism guarantees that every CPU gets its own copy of key data structures.

See: http://www.unixresources.net/linux/clf/linuxK/archive/00/00/47/91/479165.html

In the function discussed there, a private data area is allocated for each CPU and the contents of the .data.percpu section are copied into it, one copy per CPU. Because the data has been moved from __per_cpu_start into each CPU's own area, a per-CPU variable can no longer be accessed at its original address: per_cpu__runqueues, for instance, can no longer be reached as plain per_cpu__runqueues. An offset has to be added, namely the offset of the CPU's private area relative to __per_cpu_start, which for CPU i is __per_cpu_offset[i]. With that, the new address of any variable in the private area is easy to compute; for per_cpu__runqueues it becomes per_cpu__runqueues + __per_cpu_offset[i].
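The addressing scheme described above can be sketched as follows (only an illustration of the idea; the kernel's real per_cpu() macro is more involved):

extern unsigned long __per_cpu_offset[];    /* set up at boot, one entry per CPU */

/* address of CPU cpu's copy of var = original address + that CPU's offset */
#define my_per_cpu(var, cpu) \
    (*(typeof(&(var)))((char *)&(var) + __per_cpu_offset[(cpu)]))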

That is as deep as we will go into load_TLS; back to the main thread.


    /*
     * Now maybe handle debug registers and/or IO bitmaps
     */
    if (unlikely(task_thread_info(prev_p)->flags & _TIF_WORK_CTXSW_PREV ||
             task_thread_info(next_p)->flags & _TIF_WORK_CTXSW_NEXT))
        __switch_to_xtra(prev_p, next_p, tss);

Next come the debug registers and the I/O bitmap:

void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p,
              struct tss_struct *tss)
{
    struct thread_struct *prev, *next;

    prev = &prev_p->thread;
    next = &next_p->thread;

    if (test_tsk_thread_flag(prev_p, TIF_BLOCKSTEP) ^
        test_tsk_thread_flag(next_p, TIF_BLOCKSTEP)) {
        unsigned long debugctl = get_debugctlmsr();

        debugctl &= ~DEBUGCTLMSR_BTF;
        if (test_tsk_thread_flag(next_p, TIF_BLOCKSTEP))
            debugctl |= DEBUGCTLMSR_BTF;

        update_debugctlmsr(debugctl);
    }

    if (test_tsk_thread_flag(prev_p, TIF_NOTSC) ^
        test_tsk_thread_flag(next_p, TIF_NOTSC)) {
        /* prev and next are different */
        if (test_tsk_thread_flag(next_p, TIF_NOTSC))
            hard_disable_TSC();
        else
            hard_enable_TSC();
    }

    if (test_tsk_thread_flag(next_p, TIF_IO_BITMAP)) {
        /*
         * Copy the relevant range of the IO bitmap.
         * Normally this is 128 bytes or less:
         */
        memcpy(tss->io_bitmap, next->io_bitmap_ptr,
               max(prev->io_bitmap_max, next->io_bitmap_max));
    } else if (test_tsk_thread_flag(prev_p, TIF_IO_BITMAP)) {
        /*
         * Clear any possible leftover bits:
         */
        memset(tss->io_bitmap, 0xff, prev->io_bitmap_max);
    }
    propagate_user_return_notify(prev_p, next_p);
}
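As an aside, the TIF_NOTSC branch above is what services prctl(PR_SET_TSC, ...): a process can ask for rdtsc to be forbidden, and that per-task setting has to be re-applied every time the task is switched onto a CPU. A small user-space sketch:

#include <stdio.h>
#include <sys/prctl.h>

int main(void)
{
    /* make rdtsc raise SIGSEGV in this task; the kernel tracks this with
     * TIF_NOTSC and re-applies it in __switch_to_xtra() on every switch */
    if (prctl(PR_SET_TSC, PR_TSC_SIGSEGV, 0, 0, 0)) {
        perror("prctl(PR_SET_TSC)");
        return 1;
    }
    puts("rdtsc is now forbidden for this process");
    return 0;
}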
