An Assembly Language
BUFFER OVERFLOW 3
An Assembly Language
IntroductionBasic of x86 ArchitectureAssembly LanguageCompiler, Assembler & LinkerFunction OperationStackStack OperationStack based Buffer OverflowShellcode: The PayloadVulnerability & Exploit Examples |
THE ASSEMBLY LANGUAGESome knowledge of assembly is necessary in order to understand the operation of the buffer overflow exploits. There are essentially three kinds of languages:
Assembly is a symbolic language that is assembled into machine language by an assembler. In other words, assembly is a mnemonic statement that corresponds directly to processor-specific instructions. Each type of processor has its own instruction set and thus its own assembly language. Assembly deals directly with the registers of the processor and memory locations. There are some general rules that are typically true for most assembly languages are listed below:
|
|||||||||||||||
Opcodes are the actual instructions that a program performs. Each opcode is represented by one line of code, which contains the opcode and the operands that are used by the opcode. The number of operands varies depending on the opcode. The entire suite of opcodes available to a processor is called an instruction set. Depending on the processor, OS, and disassembler used, the operands may be in reverse order. For example, on Windows:
MOV dst, src
Is equivalent to:
MOV %src, %dst on Linux.
Windows uses Intel assembly whereas Linux uses AT&T assembly. Another one you may find is Mac OS (PowerPC) that is Motorola processor instruction set. High Level Assembly (HLA) also quite popular among programmers. This paper will use both Windows and AT&T assembly. Whatever assembly used, there are several common categories of instructions based on their usages as listed in the following Table.
|
Instruction Category |
Meaning |
Example |
|
Data Transfer |
move from source to destination |
mov, lea, les, push, pop, pushf, popf |
|
Arithmetic |
arithmetic on integers |
add, adc, sub, sbb, mul, imul, div, idiv, cmp, neg, inc, dec, xadd, cmpxchg |
|
Floating point |
arithmetic on floating point |
fadd, fsub, fmul, div, cmp |
|
Logical, Shift, Rotate and Bit |
bitwise logic operations |
and, or, xor, not, shl/sal, shr, sar, shld and shrd, ror, rol, rcr andrcl |
|
Control transfer |
conditional and unconditional jumps, procedure calls |
jmp, jcc, call, ret, int, into, bound. |
|
String |
move, compare, input and output |
movs, lods, stos, scas, cmps, outs, rep, repz, repe, repnz, repne,ins |
|
I/O |
For input and output |
in, out |
|
Conversion |
Provide assembly data types conversion |
movzx, movsx, cbw, cwd, cwde, cdq, bswap, xlat |
|
Miscellaneous |
manipulate individual flags, provide special processor services, or handle privileged mode operations |
clc, stc, cmc, cld, std, cl, sti |
|
Table 2: Assembly instruction set categories. |
||
The following is C source code portion and the assembly equivalent example using Linux/Intel.
|
C code’s portion |
Label |
Mnemonic |
operands |
Comment |
|
if (a > b) |
movl |
a, %eax |
||
|
cmpl |
b, %eax |
#compare, a – b |
||
|
jle |
L1 |
#jump to L1 if a <= b |
||
|
c = a; |
movl |
a, %eax |
#a > b branch |
|
|
movl |
%eax, c |
|||
|
jmp |
L2 |
#finish, jump to L2 |
||
|
L1: |
#a <= b branch |
|||
|
else c = b; |
movl |
b, %eax |
||
|
movl |
%eax, c |
|||
|
L2: |
#Finish |
Figure 1: C and assembly codes.
Compilers available for assembly languages include Macro Assembler (MASM), GNU’s Assembler (GAS wiki, GAS manual), Borland’s TASM, Netwide (NASM) and GoASM. For HLA it is available from Webster at HLA.
COMPILER, ASSEMBLER, LINKER AND LOADER
Normally the C’s program building process involves four stages and utilizes different tools such as a preprocessor, compiler, assembler, and linker. At the end there should be a single executable image that ready to be loaded by loader as a running program. Below are the stages that happen in order regardless of the operating system/compiler and graphically illustrated in Figure 2.
Preprocessing is the first pass of any C compilation. It processes include-files, conditional compilation instructions and macros.
Compilation is the second pass. It takes the output of the preprocessor, and the source code, and generates assembler source code.
Assembly is the third stage of compilation. It takes the assembly source code and produces an assembly listing with offsets. The assembler output is stored in an object file.
Linking is the final stage of compilation. It takes one or more object files or libraries as input and combines them to produce a single (usually executable) file. In doing so, it resolves references to external symbols, assigns final addresses to procedures/functions and variables, and revises code and data to reflect new addresses (a process called relocation).
Loading the executable image for program running.
Bear in mind that if you use the Integrated Development Environment (IDE) type compilers, these processes quite transparent. Now we are going to examine more detail about the process that happens before and after the linking stage. For any given input file, the file name suffix (file extension) determines what kind of compilation is done and the example for gcc is listed in Table 3.
|
File extension |
Description |
|
file_name.c |
C source code which must be preprocessed. |
|
file_name.i |
C source code which should not be preprocessed. |
|
file_name.ii |
C++ source code which should not be preprocessed. |
|
file_name.h |
C header file (not to be compiled or linked). |
|
file_name.cc file_name.cp file_name.cxx file_name.cpp file_name.c++ file_name.C |
C++ source code which must be preprocessed. For file_name.cxx, the xx must both be literally character x and file_name.C, is capital c. |
|
file_name.s |
Assembler code. |
|
file_name.S |
Assembler code which must be preprocessed. |
|
file_name.o |
By default, the object file name for a source file is made by replacing the extension .c, .i, .s etc with .o |
|
Table 3: File suffix. |
|
The following Figure shows the steps involved in the process of building the C program starting from the compilation until the loading of the executable image into the memory for program running.
Figure 2: C program building process.
OBJECT FILES AND EXECUTABLE
After the source code has been assembled, it will produce an object files and then linked, producing an executable files. An object and executable come in several formats such as ELF (Executable and Linking Format) and COFF (Common Object-File Format). For example, ELF is used on Linux systems, while COFF is used on Windows systems. Other object file formats that you may find sometime somewhere is listed in the following Table.
Object File Format |
Description |
|
a.out |
The |
|
COFF |
The COFF (Common Object File Format) format was introduced with System V Release 3 (SVR3) Unix. COFF files may have multiple sections, each prefixed by a header. The number of sections is limited. The COFF specification includes support for debugging but the debugging information was limited. |
|
ECOFF |
A variant of COFF. ECOFF is an Extended COFF originally introduced for Mips and Alpha workstations. |
|
XCOFF |
The IBM RS/6000 running AIX uses an object file format called XCOFF (eXtended COFF). The COFF sections, symbols, and line numbers are used, but debugging symbols are |
|
PE |
Windows 9x and NT use the PE (Portable Executable) format for their executables. PE is basically COFF with additional headers. |
|
ELF |
The ELF (Executable and Linking Format) format came with System V Release 4 (SVR4) Unix. ELF is similar to COFF in being organized into a number of sections, but it removes many of COFF's limitations. ELF used on most modern Unix systems, including GNU/Linux, Solaris and Irix. Also used on many embedded systems. |
|
SOM/ESOM |
SOM (System Object Module) and ESOM (Extended SOM) is HP's object file and debug format (not to be confused with IBM's SOM, which is a cross-language Application Binary Interface - ABI). |
|
Table 4: Object file formats. |
|
|
When we examine the content of these object files there are areas called sections. Depend on the settings of the compilation and linking stages, sections can hold:
Some sections are loaded into the process image and some provide information needed in the building of a process image while still others are used only in linking object files. There are several sections that are common to all executable formats (may be named differently, depending on the compiler/linker) as listed below: |
|
Section |
Description |
|
.text |
This section contains the executable instruction codes and is shared among every process running the same binary. This section usually has READ and EXECUTE permissions only. This section is the one most affected by optimization. |
|
.bss |
BSS stands for ‘Block Started by Symbol’. It holds un-initialized global and static variables. Since the BSS only holds variables that don't have any values yet, it doesn't actually need to store the image of these variables. The size that BSS will require at runtime is recorded in the object file, but the BSS (unlike the data section) doesn't take up any actual space in the object file. |
|
.data |
Contains the initialized global and static variables and their values. It is usually the largest part of the executable. It usually hasREAD/WRITE permissions. |
|
.rdata |
Also known as .rodata (read-only data) section. This contains constants and string literals. |
|
.reloc |
Stores the information required for relocating the image while loading. |
|
Symbol table |
A symbol is basically a name and an address. Symbol table holds information needed to locate and relocate a program’s symbolic definitions and references. A symbol table index is a subscript into this array. Index 0 both designates the first entry in the table and serves as the undefined symbol index. The symbol table contains an array of symbol entries. |
|
Relocation records |
Relocation is the process of connecting symbolic references with symbolic definitions. For example, when a program calls a function, the associated call instruction must transfer control to the proper destination address at execution. Relocatable files must have relocation entries’ which are necessary because they contain information that describes how to modify their section contents, thus allowing executable and shared object files to hold the right information for a process's program image. Simply said relocation records are information used by the linker to adjust section contents. |
|
Table 5: Segments in executable file. |
|
The following is an example of the object file content dumped using readelf program (how to use the command was discussed in GCC & G++ 1 and GCC & G++ 2). Other program can be used is objdump. For Windows, dumpbin utility (coming with Visual C++ compiler) program can be used for the same purpose.
/* testprog1.c */
#include <stdio.h>
static void display(int i, int *ptr);
int main(void)
{
int x = 5;
int *xptr = &x;
printf("In main() program:\n");
printf("x value is %d and is stored at address %p.\n", x, &x);
printf("xptr pointer points to address %p which holds a value of %d.\n", xptr, *xptr);
display(x, xptr);
return 0;
}
void display(int y, int *yptr)
{
char var[7] = "ABCDEF";
printf("In display() function:\n");
printf("y value is %d and is stored at address %p.\n", y, &y);
printf("yptr pointer points to address %p which holds a value of %d.\n", yptr, *yptr);
}
[bodo@bakawali test]$ gcc -c testprog1.c
[bodo@bakawali test]$ readelf -a testprog1.o
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: REL (Relocatable file)
Machine: Intel 80386
Version: 0x1
Entry point address: 0x0
Start of program headers: 0 (bytes into file)
Start of section headers: 672 (bytes into file)
Flags: 0x0
Size of this header: 52 (bytes)
Size of program headers: 0 (bytes)
Number of program headers: 0
Size of section headers: 40 (bytes)
Number of section headers: 11
Section header string table index: 8
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
[ 1] .text PROGBITS 00000000 000034 0000de 00 AX 0 0 4
[ 2] .rel.text REL 00000000 00052c 000068 08 9 1 4
[ 3] .data PROGBIT 00000000 000114 000000 00 WA 0 0 4
[ 4] .bss NOBIT 00000000 000114 000000 00 WA 0 0 4
[ 5] .rodata PROGBITS 00000000 000114 00010a 00 A 0 0 4
[ 6] .note.GNU-stack PROGBITS 00000000 00021e 000000 00 0 0 1
[ 7] .comment PROGBITS 00000000 00021e 000031 00 0 0 1
[ 8] .shstrtab STRTAB 00000000 00024f 000051 00 0 0 1
[ 9] .symtab SYMTAB 00000000 000458 0000b0 10 10 9 4
[10] .strtab STRTAB 00000000 000508 000021 00 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings)
I (info), L (link order), G (group), x (unknown)
O (extra OS processing required) o (OS specific), p (processor specific)
There are no program headers in this file.
Relocation section '.rel.text' at offset 0x52c contains 13 entries:
Offset Info Type Sym.Value Sym. Name
0000002d 00000501 R_386_32 00000000 .rodata
00000032 00000a02 R_386_PC32 00000000 printf
00000044 00000501 R_386_32 00000000 .rodata
00000049 00000a02 R_386_PC32 00000000 printf
0000005c 00000501 R_386_32 00000000 .rodata
00000061 00000a02 R_386_PC32 00000000 printf
0000008c 00000501 R_386_32 00000000 .rodata
0000009c 00000501 R_386_32 00000000 .rodata
000000a1 00000a02 R_386_PC32 00000000 printf
000000b3 00000501 R_386_32 00000000 .rodata
000000b8 00000a02 R_386_PC32 00000000 printf
000000cb 00000501 R_386_32 00000000 .rodata
000000d0 00000a02 R_386_PC32 00000000 printf
There are no unwind sections in this file.
Symbol table '.symtab' contains 11 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 00000000 0 NOTYPE LOCAL DEFAULT UND
1: 00000000 0 FILE LOCAL DEFAULT ABS testprog1.c
2: 00000000 0 SECTION LOCAL DEFAULT 1
3: 00000000 0 SECTION LOCAL DEFAULT 3
4: 00000000 0 SECTION LOCAL DEFAULT 4
5: 00000000 0 SECTION LOCAL DEFAULT 5
6: 00000080 94 FUNC LOCAL DEFAULT 1 display
7: 00000000 0 SECTION LOCAL DEFAULT 6
8: 00000000 0 SECTION LOCAL DEFAULT 7
9: 00000000 128 FUNC GLOBAL DEFAULT 1 main
10: 00000000 0 NOTYPE GLOBAL DEFAULT UND printf
No version information found in this file.
When writing a program using the assembly language it should be compatible with the sections in the assembler directives (x86) and the partial list that is interested to us is listed below:
|
Section |
Description |
|
|
1 |
Text (.section .text) |
Contain code (instructions). Contain the _start label. |
|
2 |
Read-Only Data (.section .rodata) |
Contains pre-initialized constants. |
|
3 |
Read-Write Data (.section .data) |
Contains pre-initialized variables. |
|
4 |
BSS (.section .bss) |
Contains un-initialized data. |
|
Table 6: Some sections in object file. |
||
The assembler directives in assembly programming can be used to identify code and data sections, allocate/initialize memory and making symbols externally visible or invisible. An example of the assembly code with some of the assembler directives (Intel) is shown below:
|
;initializing data |
|||
|
.section |
.data |
||
|
x: |
.byte |
;one byte initialized to 128 |
|
|
y: |
.long |
1,1000,10000 |
;3 long words |
|
;initializing ascii data |
|||
|
.ascii |
"hello" |
;ascii without null character |
|
|
asciz |
"hello" |
;ascii with \0 |
|
|
;allocating memory in bss |
|||
|
.section |
.bss |
||
|
.equ |
BUFFSIZE 1024 |
;define a constant |
|
|
.comm |
z, 4, 4 |
;allocate 4 bytes for x with 4-byte alignment |
|
|
;making symbols externally visible |
|||
|
.section |
.data |
||
|
.globl |
w |
;declare externally visible e.g: int w = 10 |
|
|
.text |
|||
|
.globl |
fool |
;e.g: fool(void) {…} |
|
|
fool: |
|||
|
… |
|||
|
leave |
|||
|
return |
|||
An Assembly Language的更多相关文章
- 1.2 ASSEMBLY LANGUAGE
People are much happier moving up the ladder,socially or even technically.So our profession has move ...
- Notes on <Assembly Language step by step>
By brant-ruan Yeah, I feel very happy When you want to give up, think why you have held on so long. ...
- Calling 64-bit assembly language functions lodged inside the Delphi source code
Code: http://www.atelierweb.com/calling-64-bit-assembly-language-functions-lodged-inside-the-delphi- ...
- PythonStudy——汇编语言 Assembly Language
汇编语言 汇编语言(assembly language)是一种用于电子计算机.微处理器.微控制器或其他可编程器件的低级语言,亦称为符号语言.在汇编语言中,用助记符(Mnemonics)代替机器指令的操 ...
- CS萌新的汇编学习之路02 Learning of Assembly Language
第二节课 寄存器 1. 寄存器的定义: 进行信息储存的器件,是CPU中程序员可以读写的部件,通过改变各种寄存器中的内容来实现对CPU的控制 2. 寄存器的种类: 本节课学习通用寄存器和段寄存器 2. ...
- CS萌新的汇编学习之路(其实是老师作业呵呵哒)Learning of Assembly Language
第一节课学习汇编语言,做笔记,做笔记 1.概念 首先是汇编语言这门课程的定义以及对于学习高级语言.深入理解计算机系统的作用 软硬件接口机器语言 汇编语言 高级语言 关系 机器语言和汇编语言可移植性差 ...
- 汇编语言教材assembly language
https://en.wikipedia.org/wiki/Assembly_language https://baike.baidu.com/item/%E6%B1%87%E7%BC%96%E8%A ...
- 《PC Assembly Language》读书笔记
本书下载地址:pcasm-book. 前言 8086处理器只支持实模式(real mode),不能满足安全.多任务等需求. Q:为什么实模式不安全.不支持多任务?为什么虚模式能解决这些问题? A: 以 ...
- IA-32 Assembly Language Reference Manual
Load Full Pointer (lds,les, lfs, lgs, and lss) lds{wl} mem[32|48], reg[16|32]les{wl} mem[32|48], reg ...
随机推荐
- redhat 网络配置
1. 查看网络 ifconfig 网卡名字(eth0.wlan0) ifconfig -a //查看所有网卡配置 2. 网卡打开\关闭 ifconfig eth0 down ifconfig eth0 ...
- Unity3d 发动机原理详细介绍
Unity3d 发动机原理详细介绍 www.MyException.Cn 发布于:2013-10-08 16:32:36 浏览:46次 0 Unity3d 引擎原理详细介绍 体系结构 ...
- Apache安全和强化的十三个技巧
Apache是一个很受欢迎的web服务器软件,其安全性对于网站的安全运营可谓生死攸关.下面介绍一些可帮助管理员在Linux上配置Apache确保其安全的方法和技巧. 本文假设你知道这些基本知识: 文档 ...
- Amzaon EC2虚拟化技术演进:从 Xen 到 Nitro
今年2月,由光环新网运营的 AWS 中国(北京)区域和由西云数据运营的 AWS 中国 (宁夏)区域发布新的实例类型,新的实例类型包括 C5.C5d.R5.R5d.除了这四种之外,在AWS国外部分区 ...
- webview长按保存图片
private String imgurl = ""; /*** * 功能:长按图片保存到手机 */ @Override public void onC ...
- Java 调用R 方法
JAVA 调用 R 语言 1 简介 R是统计计算的强大工具,而JAVA是做应用系统的主流语言,两者天然具有整合的需要.关于整合,一方面,R中可以创建JAVA对象调用JAVA方法,另一方面, ...
- IE下object元素遮挡div表单
目前遇到这样的一个问题: 我用ActiveX插件做了一个C#的播放器,要将这个插件放到web浏览器中,然后可以通过前台页面来控制视频的播放,暂停还有回放,这个时候发现object的onclick事件无 ...
- 一次 read by other session 的处理过程
一个哥们给我打电话.他说系统中一直出现等待事件 read by other session .而且该等待都是同一个sql引起的.比較紧急,请我帮忙远程看看. 远程过去之后,用脚本把 等待事件给抓 ...
- No breeds found in the signature, a signature update is recommended
cobbler 2.6.11 遇到这个问题,需要 >> cobbler signature update >> and cobblerd restart 转自: https:/ ...
- (总结)RHEL/CentOS 7.x的几点新改变
一.CentOS的Services使用了systemd来代替sysvinit管理 1.systemd的服务管理程序: systemctl是主要的工具,它融合之前service和chkconfig的功能 ...