目录

. 系统调用简介
. 系统调用跟踪调试
. 系统调用内核源码分析

1. 系统调用简介

关于系统调用的基本原理,请参阅另一篇文章,本文的主要目标是从内核源代码的角度来学习一下系统调用在底层的内核中是如何实现的

Relevant Link:

http://www.cnblogs.com/LittleHann/p/3850653.html
http://www.kerneltravel.net/journal/iv/syscall.htm
http://zjuedward.blog.51cto.com/1445231/465997
http://www.ibm.com/developerworks/cn/linux/l-system-calls/
http://www.cnblogs.com/LittleHann/p/3850655.html

2. 系统调用跟踪调试

我们可以借助一些内核调试工具,来动态跟踪运行路径
code:

#include <syscall.h>
#include <unistd.h>
#include <stdio.h>
#include <sys/types.h> int main(void)
{
long ID;
ID = getpid();
printf ("getpid()=%ld\n", ID);
return();
}

编译:

gcc sys.c -o sys

kdb调试

. 激活KDB(按下pause键,当然你必须已经给内核打了KDB补丁)
. 设置内核断点: "bp sys_getpid"
. 退出kdb: "go"
. 执行./getpid
. 进入内核调试状态,执行路径停止在断点sys_getpid处。
. 在KDB>提示符下,执行bt命令观察堆栈,发现调用的嵌套路径,可以看到在sys_getpid是在内核函数system_call中被嵌套调用的。
. 在KDB>提示符下,执行rd命令查看寄存器中的数值,可以看到eax中存放的getpid调用号: 0x00000014(=).
. 在KDB>提示符下,执行ssb(或ss)命令跟踪内核代码执行路径,可以发现sys_getpid执行后,会返回system_call函数,然后接者转入ret_from_sys_call例程

程序的执行流程大致可归结为以下几个步骤

. 该程序调用libc库的封装函数getpid。该封装函数将系统调用号_NR_getpid(第20个)压入EAX寄存器
. 通过IDT找到0x80中断的地址
. 调用软中断int 0x80 进入内核态
. 在内核中首先执行system_call,接着执行根据系统调用号在调用表中查找到的对应的系统调用服务例程sys_getpid
. 执行sys_getpid服务例程
. 执行完毕后,转入ret_from_sys_call例程,系统调用中返回

Relevant Link:

ftp://oss.sgi.com/projects/kdb/download/v4.4/
https://www.kernel.org/
http://www.cnblogs.com/shineshqw/articles/2359114.html
http://www.drdobbs.com/open-source/linux-kernel-debugging/184406318

3. 系统调用内核源码分析

还是拿文章第2节的getuid()这个小程序作为例如,我们来从源码角度分析一下linux中系统调用的原理

linux-2.6.32.63\arch\x86\kernel

/*
所有系统调用的入口点,参数system_call是所希望激活的系统调用的"调用号"
*/
ENTRY(system_call)
RING0_INT_FRAME # can't unwind into user space anyway
/*
保存orig_eax,这个值就是传入的"系统调用号"
*/
pushl %eax
CFI_ADJUST_CFA_OFFSET
/*
SAVE_ALL的宏定义如下
他的作用是先把所有寄存器的值压栈,然后在system_call返回之前使用RESTORE_ALL把栈从栈中弹出,在这其中system_call可以根据需要子去使用寄存器的值。任何它调用的c函数都可以从栈中查找到所希望的参数,因为
SAVE_ALL已经把所有寄存器的值都压入栈中了
.macro SAVE_ALL
cld
PUSH_GS
pushl %fs
CFI_ADJUST_CFA_OFFSET 4
pushl %es
CFI_ADJUST_CFA_OFFSET 4
pushl %ds
CFI_ADJUST_CFA_OFFSET 4
pushl %eax
CFI_ADJUST_CFA_OFFSET 4
CFI_REL_OFFSET eax, 0
pushl %ebp
CFI_ADJUST_CFA_OFFSET 4
CFI_REL_OFFSET ebp, 0
pushl %edi
CFI_ADJUST_CFA_OFFSET 4
CFI_REL_OFFSET edi, 0
pushl %esi
CFI_ADJUST_CFA_OFFSET 4
CFI_REL_OFFSET esi, 0
pushl %edx
CFI_ADJUST_CFA_OFFSET 4
CFI_REL_OFFSET edx, 0
pushl %ecx
CFI_ADJUST_CFA_OFFSET 4
CFI_REL_OFFSET ecx, 0
pushl %ebx
CFI_ADJUST_CFA_OFFSET 4
CFI_REL_OFFSET ebx, 0
movl $(__USER_DS), %edx
movl %edx, %ds
movl %edx, %es
movl $(__KERNEL_PERCPU), %edx
movl %edx, %fs
SET_KERNEL_GS %edx
.endm
*/
SAVE_ALL
GET_THREAD_INFO(%ebp)
/*
检查系统调用是否正在被跟踪
*/
testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags(%ebp)
jnz syscall_trace_entry
cmpl $(nr_syscalls), %eax
jae syscall_badsys
syscall_call:
/*
调用系统函数
sys_call_table也定义在是一张由指向实现各种系统调用的内核函数的函数指针组成的表:
linux-2.6.32.63\arch\x86\kernel\syscall_table_32.S
ENTRY(sys_call_table)
.long sys_restart_syscall /* 0 - old "setup()" system call, used for restarting */
.long sys_exit
.long ptregs_fork
.long sys_read
.long sys_write
.long sys_open /* 5 */
.long sys_close
.long sys_waitpid
.long sys_creat
.long sys_link
.long sys_unlink /* 10 */
.long ptregs_execve
.long sys_chdir
.long sys_time
.long sys_mknod
.long sys_chmod /* 15 */
.long sys_lchown16
.long sys_ni_syscall /* old break syscall holder */
.long sys_stat
.long sys_lseek
.long sys_getpid /* 20 */
.long sys_mount
.long sys_oldumount
.long sys_setuid16
.long sys_getuid16
.long sys_stime /* 25 */
.long sys_ptrace
.long sys_alarm
.long sys_fstat
.long sys_pause
.long sys_utime /* 30 */
.long sys_ni_syscall /* old stty syscall holder */
.long sys_ni_syscall /* old gtty syscall holder */
.long sys_access
.long sys_nice
.long sys_ni_syscall /* 35 - old ftime syscall holder */
.long sys_sync
.long sys_kill
.long sys_rename
.long sys_mkdir
.long sys_rmdir /* 40 */
.long sys_dup
.long sys_pipe
.long sys_times
.long sys_ni_syscall /* old prof syscall holder */
.long sys_brk /* 45 */
.long sys_setgid16
.long sys_getgid16
.long sys_signal
.long sys_geteuid16
.long sys_getegid16 /* 50 */
.long sys_acct
.long sys_umount /* recycled never used phys() */
.long sys_ni_syscall /* old lock syscall holder */
.long sys_ioctl
.long sys_fcntl /* 55 */
.long sys_ni_syscall /* old mpx syscall holder */
.long sys_setpgid
.long sys_ni_syscall /* old ulimit syscall holder */
.long sys_olduname
.long sys_umask /* 60 */
.long sys_chroot
.long sys_ustat
.long sys_dup2
.long sys_getppid
.long sys_getpgrp /* 65 */
.long sys_setsid
.long sys_sigaction
.long sys_sgetmask
.long sys_ssetmask
.long sys_setreuid16 /* 70 */
.long sys_setregid16
.long sys_sigsuspend
.long sys_sigpending
.long sys_sethostname
.long sys_setrlimit /* 75 */
.long sys_old_getrlimit
.long sys_getrusage
.long sys_gettimeofday
.long sys_settimeofday
.long sys_getgroups16 /* 80 */
.long sys_setgroups16
.long old_select
.long sys_symlink
.long sys_lstat
.long sys_readlink /* 85 */
.long sys_uselib
.long sys_swapon
.long sys_reboot
.long sys_old_readdir
.long old_mmap /* 90 */
.long sys_munmap
.long sys_truncate
.long sys_ftruncate
.long sys_fchmod
.long sys_fchown16 /* 95 */
.long sys_getpriority
.long sys_setpriority
.long sys_ni_syscall /* old profil syscall holder */
.long sys_statfs
.long sys_fstatfs /* 100 */
.long sys_ioperm
.long sys_socketcall
.long sys_syslog
.long sys_setitimer
.long sys_getitimer /* 105 */
.long sys_newstat
.long sys_newlstat
.long sys_newfstat
.long sys_uname
.long ptregs_iopl /* 110 */
.long sys_vhangup
.long sys_ni_syscall /* old "idle" system call */
.long ptregs_vm86old
.long sys_wait4
.long sys_swapoff /* 115 */
.long sys_sysinfo
.long sys_ipc
.long sys_fsync
.long ptregs_sigreturn
.long ptregs_clone /* 120 */
.long sys_setdomainname
.long sys_newuname
.long sys_modify_ldt
.long sys_adjtimex
.long sys_mprotect /* 125 */
.long sys_sigprocmask
.long sys_ni_syscall /* old "create_module" */
.long sys_init_module
.long sys_delete_module
.long sys_ni_syscall /* 130: old "get_kernel_syms" */
.long sys_quotactl
.long sys_getpgid
.long sys_fchdir
.long sys_bdflush
.long sys_sysfs /* 135 */
.long sys_personality
.long sys_ni_syscall /* reserved for afs_syscall */
.long sys_setfsuid16
.long sys_setfsgid16
.long sys_llseek /* 140 */
.long sys_getdents
.long sys_select
.long sys_flock
.long sys_msync
.long sys_readv /* 145 */
.long sys_writev
.long sys_getsid
.long sys_fdatasync
.long sys_sysctl
.long sys_mlock /* 150 */
.long sys_munlock
.long sys_mlockall
.long sys_munlockall
.long sys_sched_setparam
.long sys_sched_getparam /* 155 */
.long sys_sched_setscheduler
.long sys_sched_getscheduler
.long sys_sched_yield
.long sys_sched_get_priority_max
.long sys_sched_get_priority_min /* 160 */
.long sys_sched_rr_get_interval
.long sys_nanosleep
.long sys_mremap
.long sys_setresuid16
.long sys_getresuid16 /* 165 */
.long ptregs_vm86
.long sys_ni_syscall /* Old sys_query_module */
.long sys_poll
.long sys_nfsservctl
.long sys_setresgid16 /* 170 */
.long sys_getresgid16
.long sys_prctl
.long ptregs_rt_sigreturn
.long sys_rt_sigaction
.long sys_rt_sigprocmask /* 175 */
.long sys_rt_sigpending
.long sys_rt_sigtimedwait
.long sys_rt_sigqueueinfo
.long sys_rt_sigsuspend
.long sys_pread64 /* 180 */
.long sys_pwrite64
.long sys_chown16
.long sys_getcwd
.long sys_capget
.long sys_capset /* 185 */
.long ptregs_sigaltstack
.long sys_sendfile
.long sys_ni_syscall /* reserved for streams1 */
.long sys_ni_syscall /* reserved for streams2 */
.long ptregs_vfork /* 190 */
.long sys_getrlimit
.long sys_mmap_pgoff
.long sys_truncate64
.long sys_ftruncate64
.long sys_stat64 /* 195 */
.long sys_lstat64
.long sys_fstat64
.long sys_lchown
.long sys_getuid
.long sys_getgid /* 200 */
.long sys_geteuid
.long sys_getegid
.long sys_setreuid
.long sys_setregid
.long sys_getgroups /* 205 */
.long sys_setgroups
.long sys_fchown
.long sys_setresuid
.long sys_getresuid
.long sys_setresgid /* 210 */
.long sys_getresgid
.long sys_chown
.long sys_setuid
.long sys_setgid
.long sys_setfsuid /* 215 */
.long sys_setfsgid
.long sys_pivot_root
.long sys_mincore
.long sys_madvise
.long sys_getdents64 /* 220 */
.long sys_fcntl64
.long sys_ni_syscall /* reserved for TUX */
.long sys_ni_syscall
.long sys_gettid
.long sys_readahead /* 225 */
.long sys_setxattr
.long sys_lsetxattr
.long sys_fsetxattr
.long sys_getxattr
.long sys_lgetxattr /* 230 */
.long sys_fgetxattr
.long sys_listxattr
.long sys_llistxattr
.long sys_flistxattr
.long sys_removexattr /* 235 */
.long sys_lremovexattr
.long sys_fremovexattr
.long sys_tkill
.long sys_sendfile64
.long sys_futex /* 240 */
.long sys_sched_setaffinity
.long sys_sched_getaffinity
.long sys_set_thread_area
.long sys_get_thread_area
.long sys_io_setup /* 245 */
.long sys_io_destroy
.long sys_io_getevents
.long sys_io_submit
.long sys_io_cancel
.long sys_fadvise64 /* 250 */
.long sys_ni_syscall
.long sys_exit_group
.long sys_lookup_dcookie
.long sys_epoll_create
.long sys_epoll_ctl /* 255 */
.long sys_epoll_wait
.long sys_remap_file_pages
.long sys_set_tid_address
.long sys_timer_create
.long sys_timer_settime /* 260 */
.long sys_timer_gettime
.long sys_timer_getoverrun
.long sys_timer_delete
.long sys_clock_settime
.long sys_clock_gettime /* 265 */
.long sys_clock_getres
.long sys_clock_nanosleep
.long sys_statfs64
.long sys_fstatfs64
.long sys_tgkill /* 270 */
.long sys_utimes
.long sys_fadvise64_64
.long sys_ni_syscall /* sys_vserver */
.long sys_mbind
.long sys_get_mempolicy
.long sys_set_mempolicy
.long sys_mq_open
.long sys_mq_unlink
.long sys_mq_timedsend
.long sys_mq_timedreceive /* 280 */
.long sys_mq_notify
.long sys_mq_getsetattr
.long sys_kexec_load
.long sys_waitid
.long sys_ni_syscall /* 285 */ /* available */
.long sys_add_key
.long sys_request_key
.long sys_keyctl
.long sys_ioprio_set
.long sys_ioprio_get /* 290 */
.long sys_inotify_init
.long sys_inotify_add_watch
.long sys_inotify_rm_watch
.long sys_migrate_pages
.long sys_openat /* 295 */
.long sys_mkdirat
.long sys_mknodat
.long sys_fchownat
.long sys_futimesat
.long sys_fstatat64 /* 300 */
.long sys_unlinkat
.long sys_renameat
.long sys_linkat
.long sys_symlinkat
.long sys_readlinkat /* 305 */
.long sys_fchmodat
.long sys_faccessat
.long sys_pselect6
.long sys_ppoll
.long sys_unshare /* 310 */
.long sys_set_robust_list
.long sys_get_robust_list
.long sys_splice
.long sys_sync_file_range
.long sys_tee /* 315 */
.long sys_vmsplice
.long sys_move_pages
.long sys_getcpu
.long sys_epoll_pwait
.long sys_utimensat /* 320 */
.long sys_signalfd
.long sys_timerfd_create
.long sys_eventfd
.long sys_fallocate
.long sys_timerfd_settime /* 325 */
.long sys_timerfd_gettime
.long sys_signalfd4
.long sys_eventfd2
.long sys_epoll_create1
.long sys_dup3 /* 330 */
.long sys_pipe2
.long sys_inotify_init1
.long sys_preadv
.long sys_pwritev
.long sys_rt_tgsigqueueinfo /* 335 */
.long sys_perf_event_open sys_call_table的三个参数
) 默认数组的基地址,因为这里为空则赋予0,但它需要和偏移地址sys_call_table相加,简单的说是sys_call_table被当作数组的基地址
) 数组索引(系统调用号)
) 大小(每个数组元素中的字节数)
所以这行代码可以重写为:
call sys_call_table[%eax]();
*/
call *sys_call_table(,%eax,)
/*
系统调用返回
它在EAX寄存器中的返回值(这个值同时也是system_call的返回值)被存储了起来。返回值被存储在堆栈中的EAX内,以使得RESTORE_ALL可以迅速地恢复实际的EAX寄存器及其他寄存器的值
*/
movl %eax,PT_EAX(%esp)
syscall_exit:
LOCKDEP_SYS_EXIT
DISABLE_INTERRUPTS(CLBR_ANY) TRACE_IRQS_OFF
movl TI_flags(%ebp), %ecx
testl $_TIF_ALLWORK_MASK, %ecx
/*
退出系统调用
*/
jne syscall_exit_work
.....

Relevant Link:

http://bbs.chinaunix.net/thread-2284015-1-1.html
http://wenku.baidu.com/view/f367a0ccda38376baf1fae0d.html
http://www.ibm.com/developerworks/cn/linux/l-system-calls/

Copyright (c) 2014 LittleHann All rights reserved

Linux Systemcall By INT 0x80、Llinux Kernel Debug Based On Sourcecode的更多相关文章

  1. Linux Systemcall Int0x80方式、Sysenter/Sysexit Difference Comparation

    目录 . 系统调用简介 . Linux系统调用实现方式的演进 . 通过INT 0x80中断方式进入系统调用 . 通过sysenter指令方式直接进入系统调用 . sysenter/sysexit编程示 ...

  2. Linux Kernel - Debug Guide (Linux内核调试指南 )

    http://blog.csdn.net/blizmax6/article/details/6747601 linux内核调试指南 一些前言 作者前言 知识从哪里来 为什么撰写本文档 为什么需要汇编级 ...

  3. Linux平台延时之sleep、usleep、nanosleep、select比较

    Linux平台延时之sleep.usleep.nanosleep.select比较 标签: 嵌入式thread线程cpu多线程 2015-05-05 15:28 369人阅读 评论(0) 收藏 举报 ...

  4. I/O模型之二:Linux IO模式及 select、poll、epoll详解

    目录: <I/O模型之一:Unix的五种I/O模型> <I/O模型之二:Linux IO模式及 select.poll.epoll详解> <I/O模型之三:两种高性能 I ...

  5. Linux设备模型(热插拔、mdev 与 firmware)【转】

    转自:http://www.cnblogs.com/hnrainll/archive/2011/06/10/2077469.html 转自:http://blog.chinaunix.net/spac ...

  6. Linux NIO 系列(04-4) select、poll、epoll 对比

    目录 一.API 对比 1.1 select API 1.2 poll API 1.3 epoll API 二.总结 2.1 支持一个进程打开的 socket 描述符(FD)不受限制(仅受限于操作系统 ...

  7. Linux设备模型(总线、设备、驱动程序和类)

    Linux设备驱动程序学习(13) -Linux设备模型(总线.设备.驱动程序和类)[转] 文章的例子和实验使用<LDD3>所配的lddbus模块(稍作修改). 提示:在学习这部分内容是一 ...

  8. 获得Unix/Linux系统中的IP、MAC地址等信息

    获得Unix/Linux系统中的IP.MAC地址等信息 中高级  |  2010-07-13 16:03  |  分类:①C语言. Unix/Linux. 网络编程 ②手册  |  4,471 次阅读 ...

  9. linux下Mysql 的安装、配置、数据导入导出

    MySQL是一种开放源代码的关系型数据库管理系统(RDBMS),虽然功能未必很强大,但因它的免费开源而广受欢迎. 这次,接着上一篇<CentOs minimal安装和开发环境部署>,讲下L ...

随机推荐

  1. sublime配置全攻略

    大家好,今天给大家分享一款编辑器:sublime text2     我用过很多编辑器, EditPlus.EmEditor.Notepad++.Notepad2.UltraEdit.Editra.V ...

  2. 【夯实Mysql基础】MySQL在Linux系统下配置文件及日志详解

    本文地址 分享提纲: 1. 概述 2. 详解配置文件 3. 详解日志 1.概述 MySQL配置文件在Windows下叫my.ini,在MySQL的安装根目录下:在Linux下叫my.cnf,该文件位于 ...

  3. Delphi7下SuperObject的JSON使用方法

    uses superobject; procedure TForm1.FormCreate(Sender: TObject); var aJson: ISuperObject; aSuperArray ...

  4. 微软职位内部推荐-Principal Development Lead

    微软近期Open的职位: Job Title: Principal Development Lead Work Location: Suzhou, China This is a once in a ...

  5. Protocol Buffer多态

    java中有多态的概念,protobuf本身没有多态的概念,但是它有一个扩展的概念. 以聊天消息为例,先看下面这个类图,基类是ChatMessage,子类TextMessage和ImageMessag ...

  6. acl拒绝访问流量

        interface Ethernet0/0 ip address 12.1.1.2 255.255.255.0 ip access-group 10 in half-duplex   R1# ...

  7. JS原型-语法甘露

    初看原型 JS的所有函数都有一个prototype属性,这个prototype属性本身又是一个object类型的对象. prototype提供了一群同类对象共享属性和方法的机制. 将一个基类的实例作为 ...

  8. 在WebApi中实现Cors访问

    Cors是个比较热的技术,这在蒋金楠的博客里也有体现,Cors简单来说就是“跨域资源访问”的意思,这种访问我们指的是Ajax实现的异步访问,形象点说就是,一个A网站公开一些接口方法,对于B网站和C网站 ...

  9. GEOS库在windows中的编译和测试(vs2012)

    版本:vs2012, geos3.5 一.下载和编译 这类的文章比较,不再具体细说,可以参考 http://blog.csdn.net/wangqinghao/article/details/8201 ...

  10. ultraEdit32 /uedit32 自定义快捷键/自定义注释快捷键

    编辑器一直用vim,但同事写VHDL 用的是utraledit32 ,为了更好的沟通,我也下载了最新破解版本:http://pan.baidu.com/s/1qWCYP2W 刚开始用找不到注释的快捷键 ...