Position Independent Code (PIC) in shared libraries on x64
E原文地址:http://eli.thegreenplace.net/2011/11/11/position-independent-code-pic-in-shared-libraries-on-x64/
The previous article explained how position independent code (PIC) works, with code compiled for the x86 architecture as an example. I promised to cover PIC on x64 [1] in a separate article, so here we are. This article will go into much less detail, since it assumes an understanding of how PIC works in theory. In general, the idea is similar for both platforms, but some details differ because of unique features of each architecture.
RIP-relative addressing
On x86, while function references (with the call instruction) use relative offsets from the instruction pointer, data references (with the mov instruction) only support absolute addresses. As we’ve seen in the previous article, this makes PIC code somewhat less efficient, since PIC by its nature requires making all offsets IP-relative; absolute addresses and position independence don’t go well together.
x64 fixes that, with a new "RIP-relative addressing mode", which is the default for all 64-bit movinstructions that reference memory (it’s used for other instructions as well, such as lea). A quote from the "Intel Architecture Manual vol 2a":
A new addressing form, RIP-relative (relative instruction-pointer) addressing, is implemented in 64-bit mode. An effective address is formed by adding displacement to the 64-bit RIP of the next instruction.
The displacement used in RIP-relative mode is 32 bits in size. Since it should be useful for both positive and negative offsets, roughly +/- 2GB is the maximal offset from RIP supported by this addressing mode.
x64 PIC with data references – an example
For easier comparison, I will use the same C source as in the data reference example of the previous article:
int myglob = 42; int ml_func(int a, int b)
{ return myglob + a + b;
}
Let’s look at the disassembly of ml_func:
00000000000005ec <ml_func>:
5ec: 55 push rbp
5ed: 48 89 e5 mov rbp,rsp
5f0: 89 7d fc mov DWORD PTR [rbp-0x4],edi
5f3: 89 75 f8 mov DWORD PTR [rbp-0x8],esi
5f6: 48 8b 05 db 09 20 00 mov rax,QWORD PTR [rip+0x2009db]
5fd: 8b 00 mov eax,DWORD PTR [rax]
5ff: 03 45 fc add eax,DWORD PTR [rbp-0x4]
602: 03 45 f8 add eax,DWORD PTR [rbp-0x8]
605: c9 leave
606: c3 ret
The most interesting instruction here is at 0x5f6: it places the address of myglobal into rax, by referencing an entry in the GOT. As we can see, it uses RIP relative addressing. Since it’s relative to the address of the next instruction, what we actually get is 0x5fd + 0x2009db = 0x200fd8. So the GOT entry holding the address of myglob is at 0x200fd8. Let’s check if it makes sense:
$ readelf -S libmlpic_dataonly.so
There are 35 section headers, starting at offset 0x13a8: Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align [...]
[20] .got PROGBITS 0000000000200fc8 00000fc8
0000000000000020 0000000000000008 WA 0 0 8
[...]
GOT starts at 0x200fc8, so myglob is in its third entry. We can also see the relocation inserted for the GOT reference to myglob:
$ readelf -r libmlpic_dataonly.so Relocation section '.rela.dyn' at offset 0x450 contains 5 entries:
Offset Info Type Sym. Value Sym. Name + Addend
[...]
000000200fd8 000500000006 R_X86_64_GLOB_DAT 0000000000201010 myglob + 0
[...]
Indeed, a relocation entry for 0x200fd8 telling the dynamic linker to place the address of myglobinto it once the final address of this symbol is known.
So it should be quite clear how the address of myglob is obtained in the code. The next instruction in the disassembly (at 0x5fd) then dereferences the address to get the value of myglob into eax [2].
x64 PIC with function calls – an example
Now let’s see how function calls work with PIC code on x64. Once again, we’ll use the same example from the previous article:
int myglob = 42; int ml_util_func(int a)
{ return a + 1;
} int ml_func(int a, int b)
{ int c = b + ml_util_func(a);
myglob += c; return b + myglob;
}
Disassembling ml_func, we get:
000000000000064b <ml_func>:
64b: 55 push rbp
64c: 48 89 e5 mov rbp,rsp
64f: 48 83 ec 20 sub rsp,0x20
653: 89 7d ec mov DWORD PTR [rbp-0x14],edi
656: 89 75 e8 mov DWORD PTR [rbp-0x18],esi
659: 8b 45 ec mov eax,DWORD PTR [rbp-0x14]
65c: 89 c7 mov edi,eax
65e: e8 fd fe ff ff call 560 <ml_util_func@plt>
[... snip more code ...]
The call is, as before, to ml_util_func@plt. Let’s see what’s there:
0000000000000560 <ml_util_func@plt>:
560: ff 25 a2 0a 20 00 jmp QWORD PTR [rip+0x200aa2]
566: 68 01 00 00 00 push 0x1
56b: e9 d0 ff ff ff jmp 540 <_init+0x18>
So, the GOT entry holding the actual address of ml_util_func is at 0x200aa2 + 0x566 = 0x201008.
And there’s a relocation for it, as expected:
$ readelf -r libmlpic.so Relocation section '.rela.dyn' at offset 0x480 contains 5 entries:
[...] Relocation section '.rela.plt' at offset 0x4f8 contains 2 entries:
Offset Info Type Sym. Value Sym. Name + Addend
[...]
000000201008 000600000007 R_X86_64_JUMP_SLO 000000000000063c ml_util_func + 0
Performance implications
In both examples, it can be seen that PIC on x64 requires less instructions than on x86. On x86, the GOT address is loaded into some base register (ebx by convention) in two steps – first the address of the instruction is obtained with a special function call, and then the offset to GOT is added. Both steps aren’t required on x64, since the relative offset to GOT is known to the linker and can simply be encoded in the instruction itself with RIP relative addressing.
When calling a function, there’s also no need to prepare the GOT address in ebx for the trampoline, as the x86 code does, since the trampoline just accesses its GOT entry directly through RIP-relative addressing.
So PIC on x64 still requires extra instructions when compared to non-PIC code, but the additional cost is smaller. The indirect cost of tying down a register to use as the GOT pointer (which is painful on x86) is also gone, since no such register is needed with RIP-relative addressing [3]. All in all, x64 PIC results in a much smaller performance hit than on x86, making it much more attractive. So attractive, in fact, that it’s the default method for writing shared libraries for this architecture.
Extra credit: Non-PIC code on x64
Not only does gcc encourage you to use PIC for shared libraries on x64, it requires it by default. For instance, if we compile the first example without -fpic [4] and then try to link it into a shared library (with -shared), we’ll get an error from the linker, something like this:
/usr/bin/ld: ml_nopic_dataonly.o: relocation R_X86_64_PC32 against symbol `myglob' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: final link failed: Bad value
collect2: ld returned 1 exit status
What’s going on? Let’s look at the disassembly of ml_nopic_dataonly.o [5]:
0000000000000000 <ml_func>:
0: 55 push rbp
1: 48 89 e5 mov rbp,rsp
4: 89 7d fc mov DWORD PTR [rbp-0x4],edi
7: 89 75 f8 mov DWORD PTR [rbp-0x8],esi
a: 8b 05 00 00 00 00 mov eax,DWORD PTR [rip+0x0]
10: 03 45 fc add eax,DWORD PTR [rbp-0x4]
13: 03 45 f8 add eax,DWORD PTR [rbp-0x8]
16: c9 leave
17: c3 ret
Note how myglob is accessed here, in instruction at address 0xa. It expects the linker to patch in a relocation to the actual location of myglob into the operand of the instruction (so no GOT redirection is required):
$ readelf -r ml_nopic_dataonly.o Relocation section '.rela.text' at offset 0xb38 contains 1 entries:
Offset Info Type Sym. Value Sym. Name + Addend
00000000000c 000f00000002 R_X86_64_PC32 0000000000000000 myglob - 4
[...]
Here is the R_X86_64_PC32 relocation the linker was complaining about. It just can’t link an object with such relocation into a shared library. Why? Because the displacement of the mov (the part that’s added to rip) must fit in 32 bits, and when a code gets into a shared library, we just can’t know in advance that 32 bits will be enough. After all, this is a full 64-bit architecture, with a vast address space. The symbol may eventually be found in some shared library that’s farther away from the reference than 32 bits will allow to reference. This makes R_X86_64_PC32 an invalid relocation for shared libraries on x64.
But can we still somehow create non-PIC code on x64? Yes! We should be instructing the compiler to use the "large code model", by adding the -mcmodel=large flag. The topic of code models is interesting, but explaining it would just take us too far from the real goal of this article [6]. So I’ll just say briefly that a code model is a kind of agreement between the programmer and the compiler, where the programmer makes a certain promise to the compiler about the size of offsets the program will be using. In exchange, the compiler can generate better code.
It turns out that to make the compiler generate non-PIC code on x64 that actually pleases the linker, only the large code model is suitable, because it’s the least restrictive. Remember how I explained why the simple relocation isn’t good enough on x64, for fear of an offset which will get farther than 32 bits away during linking? Well, the large code model basically gives up on all offset assumptions and uses the largest 64-bit offsets for all its data references. This makes load-time relocations always safe, and enables non-PIC code generation on x64. Let’s see the disassembly of the first example compiled without -fpic and with -mcmodel=large:
0000000000000000 <ml_func>:
0: 55 push rbp
1: 48 89 e5 mov rbp,rsp
4: 89 7d fc mov DWORD PTR [rbp-0x4],edi
7: 89 75 f8 mov DWORD PTR [rbp-0x8],esi
a: 48 b8 00 00 00 00 00 mov rax,0x0
11: 00 00 00
14: 8b 00 mov eax,DWORD PTR [rax]
16: 03 45 fc add eax,DWORD PTR [rbp-0x4]
19: 03 45 f8 add eax,DWORD PTR [rbp-0x8]
1c: c9 leave
1d: c3 ret
The instruction at address 0xa places the address of myglob into eax. Note that its argument is currently 0, which tells us to expect a relocation. Note also that it has a full 64-bit address argument. Moreover, the argument is absolute and not RIP-relative [7]. Note also that two instructions are actually required here to get the value of myglob into eax. This is one reason why the large code model is less efficient than the alternatives.
Now let’s see the relocations:
$ readelf -r ml_nopic_dataonly.o Relocation section '.rela.text' at offset 0xb40 contains 1 entries:
Offset Info Type Sym. Value Sym. Name + Addend
00000000000c 000f00000001 R_X86_64_64 0000000000000000 myglob + 0
[...]
Note the relocation type has changed to R_X86_64_64, which is an absolute relocation that can have a 64-bit value. It’s acceptable by the linker, which will now gladly agree to link this object file into a shared library.
Some judgmental thinking may bring you to ponder why the compiler generated code that isn’t suitable for load-time relocation by default. The answer to this is simple. Don’t forget that code also tends to get directly linked into executables, which don’t require load-time relocations at all. Therefore, by default the compiler assumes the small code model to generate the most efficient code. If you know your code is going to get into a shared library, and you don’t want PIC, then just tell it to use the large code model explicitly. I think gcc‘s behavior makes sense here.
Another thing to think about is why there are no problems with PIC code using the small code model. The reason is that the GOT is always located in the same shared library as the code that references it, and unless a single shared library is big enough for a 32-bit address space, there should be no problems addressing the PIC with 32-bit RIP-relative offsets. Such huge shared libraries are unlikely, but in case you’re working on one, the AMD64 ABI has a "large PIC code model" for this purpose.
Conclusion
This article complements its predecessor by showing how PIC works on the x64 architecture. This architecture has a new addressing mode that helps PIC code be faster, and thus makes it more desirable for shared libraries than on x86, where the cost is higher. Since x64 is currently the most popular architecture used in servers, desktops and laptops, this is important to know. Therefore, I tried to focus on additional aspects of compiling code into shared libraries, such as non-PIC code. If you have any questions and/or suggestions on future directions to explore, please let me know in the comments or by email.
[1] | As always, I’m using x64 as a convenient short name for the architecture known as x86-64, AMD64 or Intel 64. |
[2] | Into eax and not rax because the type of myglob is int, which is still 32-bit on x64. |
[3] | By the way, it would be much less "painful" to tie down a register on x64, since it has twice as many GPRs as x86. |
[4] | It also happens if we explicitly specify we don’t want PIC by passing -fno-pic to gcc. |
[5] | Note that unlike other disassembly listings we’ve been looking at in this and the previous article, this is an object file, not a shared library or executable. Therefore it will contain some relocations for the linker. |
[6] | For some good information on this subject, take a look at the AMD64 ABI, and man gcc. |
[7] | Some assemblers call this instruction movabs to distinguish it from the other mov instructions that accept a relative argument. The Intel architecture manual, however, keeps naming it just mov. Its opcode format is REX.W + B8 + rd. |
Related posts:
- Position Independent Code (PIC) in shared libraries
- Load-time relocation of shared libraries
- Understanding the x64 code models
- Where the top of the stack is on x86
- Finding out the mouse click position on a canvas with Javascript
阅读(343) | 评论(0) | 转发(0) |
-->
Position Independent Code (PIC) in shared libraries on x64的更多相关文章
- Position Independent Code (PIC) in shared libraries
E原文地址:http://eli.thegreenplace.net/2011/11/03/position-independent-code-pic-in-shared-libraries/下一文: ...
- qt开发ROS gui 遇到:global.h:1087:4: error: #error "You must build your code with position independent code if Qt was built with -reduce-relocations. "......
具体错误如下: 一共出现38个错误 这个错误是在导入cmakelists.txt时产生的,其实不是工程本身的问题,是因为我卸载ros,再重新安装ros的过程中把qtcreator的部分包给删除了,导致 ...
- Load-time relocation of shared libraries
E原文地址:http://eli.thegreenplace.net/2011/08/25/load-time-relocation-of-shared-libraries/ This article ...
- Shared libraries with GCC on Linux
转:http://www.cprogramming.com/tutorial/shared-libraries-linux-gcc.html By anduril462 Libraries are a ...
- Shared Libraries with Eclipse on 86_64 (64 bits) systems[z]
If you followed my previous post http://linuxtortures.blogspot.com/2012/02/shared-libraries-with-ecl ...
- Shared libraries
Computer Systems A Programmer's Perspective Second Edition Shared libraries are modern innovations t ...
- [持续交付实践] pipeline使用:Shared Libraries
前言 随着pipeline交付流水线在团队中的推广,使用pipeline脚本的job也迅速增加.虽然我们已经基于公司的技术栈特点做了一个尽可能通用的pipeline脚本样例,让搭建者只需要修改几个赋值 ...
- 运行 puppeteer 报错 chrome: error while loading shared libraries: libpangocairo-1.0.so.0: cannot open shared object file: No such file or directory
运行 puppeteer 报错 chrome: error while loading shared libraries: libpangocairo-1.0.so.0: cannot open sh ...
- linux使用wkhtmltopdf报错error while loading shared libraries:
官网提示 linux需要这些动态库.depends on: zlib, fontconfig, freetype, X11 libs (libX11, libXext, libXrender) 在li ...
随机推荐
- SublimeText3常用插件安装与使用
packagecontroller的安装 https://packagecontrol.io/ 安装了它就可以更好的进行插件的安装和管理 复制代码,打开控制面板[ctrl+·]将代码拷贝,即可进行安装 ...
- 在TreeView 控件上,如果双击任何一个节点的checkbox 只会收到一次After_Check事件 但是check属性变化两次(从false到true 再从true到false),请问该如何解决,谢谢!
在TreeView 控件上,如果双击任何一个节点的checkbox 只会收到一次After_Check事件 但是check属性变化两次(从false到true 再从true到false),请问该如何解 ...
- 图像处理笔记(1): bmp文件结构处理与显示
1.1图和调色板的概念 如今Windows(3.x以及95,98,NT)系列已经成为绝大多数用户使用的操作系统,它比DOS成功的一个重要因素是它可视化的漂亮界面.那么Windows是如何显示图象的呢? ...
- appstore 上传需要的icon
<key>CFBundleIconFiles</key><array> <string>icon@2x.png</string> <s ...
- 移植LWIP(ENC28J60)
上图就是整个移植的基本思路,非常清晰的三个层次.其实想想,本质上就是收发数据,只是LWIP协议通过对数据的封装可以实现网络传输.从图中我们就可以看到这里首先需要ENC28J60的驱动,这个驱动需 ...
- 【转】配置Jmeter的自定义参数
配置Jmeter的自定义参数 User Defined Variables 在这个控件中,定义你所需要的参数,如 在对应的需要使用参数的位置,使用${host}替代. 应用场景: 当测试环境变化时,我 ...
- js中如何将字符串转化为时间,并计算时间差
在前台页面开发时通常会用到计算两个时间的时间差,先在此附上实现方法 //结束时间 end_str = ("2014-01-01 10:15:00").replace(/-/g,&q ...
- python开发mysql:视图、触发器、事务、存储过程、函数
一 视图 视图是一个虚拟表(非真实存在),其本质是[根据SQL语句获取动态的数据集,并为其命名],用户使用时只需使用[名称]即可获取结果集,可以将该结果集当做表来使用. 使用视图我们可以把查询过程中的 ...
- python开发mysql:mysql安装(windows)&密码找回&存储引擎简介&库表的增删改查
一,mysql安装 下载地址 https://dev.mysql.com/downloads/file/?id=471342 解压后,将目录C:\mysql-5.7.19-winx64\bin添加到计 ...
- 关于object-c类目的理解
类目:为已知的类增加新的方法: 一.类目: 1. 类目方法的应用: 对现有类进行扩展:比如:可以扩展Cocoa touch框架中的类,在类目中增加的方法会被子类继承,而且在运行时跟其他的方法没有区别. ...