[Repost] Data Structure Recovery using PIN and PyGraphviz
Source:http://v0ids3curity.blogspot.com/2015/04/data-structure-recovery-using-pin-and.html
--------------------------------
Tracing Size and Address of Allocations
Size of Allocation
Right now, we track the libc functions malloc, realloc, calloc, sbrk, mmap and free. Each of these routines is instrumented with RTN_InsertCall to fetch the size of the requested allocation.
For example, to trace malloc:
- RTN_InsertCall( rtn,
- IPOINT_BEFORE,
- (AFUNPTR)AllocBefore,
- IARG_ADDRINT,
- funcname,
- IARG_G_ARG0_CALLEE,
- IARG_RETURN_IP,
- IARG_END);
We fetch the size of the requested allocation using IARG_G_ARG0_CALLEE at IPOINT_BEFORE. We also want to trace only the malloc calls made from the main executable. To do this, we use IARG_RETURN_IP to check whether the return address of the call lies in the main executable; if it does not, the allocation is not traced.
Address of Allocation
IARG_RETURN_IP is valid only at the function entry point, so we cannot use it with IPOINT_AFTER. As a workaround, we save the return address at IPOINT_BEFORE. Then, in the instruction trace, when the instruction pointer equals the saved return address of an allocation call, we read the value of EAX (RAX on 64-bit), which holds the address of the allocation.
- if(insaddr == retaddress){
- INS_InsertCall( ins,
- IPOINT_BEFORE,
- (AFUNPTR)AllocAfter,
- #ifdef __i386__
- IARG_REG_VALUE, LEVEL_BASE::REG_EAX,
- #else
- IARG_REG_VALUE, LEVEL_BASE::REG_RAX,
- #endif
- IARG_END);
- }
Now we have both the address and the size of the allocation. These are stored in a dictionary as address : size pairs. We do not remove an address when free is called on it. Instead, if an allocation call returns an address that already exists in the dictionary (i.e. the memory was reused), we simply update the stored size to that of the new request.
- if(allocations.count(address)==0){
- allocations.insert(std::make_pair(address, allocsize));
- }
- else{
- std::map<ADDRINT, ADDRINT>::iterator it = allocations.find(address);
- it->second = allocsize;
- }
.data and .bss sections
The .data and .bss sections are also added to the dictionary. The address and size of these sections are read from the main executable and recorded as allocations.
- if(!strcmp(sec_name.c_str(),".bss")||!strcmp(sec_name.c_str(),".data")){
- ADDRINT addr = SEC_Address(sec);
- USIZE size = SEC_Size(sec);
- if(allocations.count(addr)==0){
- allocations.insert(std::make_pair(addr, size));
- }
- }
Tracing Memory Writes
We trace instructions that write into allocated memory. As of now, only instructions of class XED_ICLASS_MOV are traced. For each XED_ICLASS_MOV instruction, we check that it is a memory write using INS_IsMemoryWrite and that it is part of the main executable.
In this case, we fetch the destination address of the write using IARG_MEMORYWRITE_EA and check whether it falls inside any allocation. If it does, the instruction is traced.
- for(it = allocations.begin(); it != allocations.end(); it++){
- if((des_addr >= it->first) && (des_addr < it->first + it->second)) return true;
- }
Sample Trace
- .data[0x804b02c,0x8]
- .bss[0x804b040,0xfc4]
- 0x8048560@sbrk[0x420]
- ret[0x98de000]
- 0x8048565@mov dword ptr [0x804c000], eax : WRREG MEM[0x804c000] VAL[0x98de000]
- 0x8048575@mov dword ptr [eax+0x8],0x0 : WRIMM MEM[0x98de008] VAL[0]
- 0x804857f@mov dword ptr [edx+0x4], eax : WRREG MEM[0x98de004] VAL[0]
- 0x8048587@mov dword ptr [eax],0x10 : WRIMM MEM[0x98de000] VAL[0x10]
- 0x80485a0@mov dword ptr [eax+0x4], edx : WRREG MEM[0x98de004] VAL[0x98de010]
- 0x80485ac@mov dword ptr [eax+0x8], edx : WRREG MEM[0x98de018] VAL[0x98de000]
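The trace format above is regular enough to parse with a few regular expressions. The following is a minimal sketch, not the author's parser: the function name `parse_line` and the tuple layouts are assumptions made for illustration, and the patterns are inferred only from the sample lines shown here.

```python
import re

# Hex value with an optional 0x prefix (the trace prints a zero value as "0").
HEX = r"(?:0x)?[0-9a-fA-F]+"

SECTION_RE = re.compile(rf"^(\.\w+)\[({HEX}),({HEX})\]$")
ALLOC_RE = re.compile(rf"^({HEX})@(\w+)\[({HEX})\]$")
RET_RE = re.compile(rf"^ret\[({HEX})\]$")
WRITE_RE = re.compile(
    rf"^({HEX})@(.+?)\s*:\s*(WRREG|WRIMM)\s+MEM\[({HEX})\]\s+VAL\[({HEX})\]$"
)


def parse_line(line):
    """Classify one trace line; return a tuple, or None if unrecognised."""
    text = line.strip()
    if text.startswith("- "):  # tolerate list-style prefixes
        text = text[2:].strip()

    m = WRITE_RE.match(text)
    if m:
        ip, insn, kind, mem, val = m.groups()
        return ("write", int(ip, 16), insn, kind, int(mem, 16), int(val, 16))

    m = ALLOC_RE.match(text)
    if m:
        ip, func, size = m.groups()
        return ("alloc", int(ip, 16), func, int(size, 16))

    m = RET_RE.match(text)
    if m:
        return ("ret", int(m.group(1), 16))

    m = SECTION_RE.match(text)
    if m:
        name, addr, size = m.groups()
        return ("section", name, int(addr, 16), int(size, 16))

    return None
```

An `alloc` record followed by a `ret` record together give the size and address of one allocation; `write` records carry the destination address and the written value.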
Graphing
Node Create
For each allocation in the trace file generated by the PIN tool, a new node is created in the graph. Each node is uniquely identified by a node id, which is assigned sequentially. An ordered dictionary is maintained whose key is the node id and whose value is a dictionary holding the address and size of the allocation. New allocations are added to the start of the ordered dictionary.
An edge count is associated with each node. It is used later to prune nodes that have no edges.
Separate nodes are created for the .bss and .data sections, but this is optional.
Example
Say a structure is allocated on the heap using malloc. This is what the node looks like:
- 0x80488c5@malloc[0x20]
- ret[0x8fcf030]
- -------------------
- |[0]0x8fcf030 |
- -------------------
[0] is the node id, which reflects the order of allocation. Every new allocator call gets a new id, irrespective of the return address.
0x8fcf030 is the address returned by the allocator call.
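The node bookkeeping described in this section can be sketched as follows. This is an illustration only: the class name `NodeTable` and the field names are hypothetical. It also assumes that new allocations go to the front of the ordered dictionary so that a lookup reaches the most recent allocation first when an address has been reused; the original post does not state the reason.

```python
from collections import OrderedDict


class NodeTable:
    """Sequential node ids mapped to allocation records, newest first."""

    def __init__(self):
        self.nodes = OrderedDict()
        self.next_id = 0

    def add(self, address, size):
        """Create a node for a new allocation and return its id."""
        node_id = self.next_id
        self.next_id += 1
        self.nodes[node_id] = {"address": address, "size": size, "edges": 0}
        # Move the new entry to the start of the ordered dictionary.
        self.nodes.move_to_end(node_id, last=False)
        return node_id

    def find(self, address):
        """Return the id of the newest node containing address, or None."""
        for node_id, node in self.nodes.items():
            base = node["address"]
            if base <= address < base + node["size"]:
                return node_id
        return None
```

With this layout, a second allocator call that returns an already seen address still gets a fresh id, and `find` resolves addresses in that range to the newer node.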
Node Update
For each instruction, fetch the target address of the write. If the target address is part of an allocation, update the node to which it belongs by creating a new port in the record node.
A new port represents one element of an allocation, for example a member of a structure.
Then check whether the source value is part of any allocation. If it is, we treat the source value as an address and update the node to which that address belongs. This operation can be interpreted as a pointer assignment (or link creation).
- 0x80488c5@malloc[0x20]
- ret[0x8fcf030]
- 0x8048957@mov byte ptr [eax+edx*1],0x0 : WRIMM MEM[0x8fcf031] VAL[0]
- 0x80489bb@mov dword ptr [eax+0x14], edx : WRREG MEM[0x8fcf044] VAL[0x8fcf058]
- 0x8048a40@mov dword ptr [eax+0x18], edx : WRREG MEM[0x8fcf048] VAL[0x8fcf008]
- 0x8048a4e@mov dword ptr [eax+0x1c], edx : WRREG MEM[0x8fcf04c] VAL[0x8fcf008]
- -------------------------------------------------------------------------
- |[0]0x8fcf030 |0x8fcf031 |0x8fcf044 | 0x8fcf048 | 0x8fcf04c |
- -------------------------------------------------------------------------
The first field, [0] 0x8fcf030, is the metadata for the chunk, i.e. the node id and the address returned by the allocator call. The remaining 4 fields represent 4 individual writes into this allocation (for example, 4 elements of a structure).
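In Graphviz, a node with `shape=record` takes a label whose fields are separated by `|`, and each field can carry a port name written as `<name>`. The sketch below builds such a label for the example node. The helper names (`port_name`, `add_port`, `record_label`) and the port naming scheme are assumptions for illustration, not taken from the original tool.

```python
def port_name(address):
    """Port identifier for a written address, e.g. 0x8fcf044 -> 'p8fcf044'."""
    return f"p{address:x}"


def add_port(ports, address):
    """Remember a written address once, preserving first-write order."""
    if address not in ports:
        ports.append(address)


def record_label(node_id, base, ports):
    """Build a Graphviz record label: metadata field first, then one field per port."""
    fields = [f"<head> [{node_id}] {base:#x}"]
    fields += [f"<{port_name(p)}> {p:#x}" for p in ports]
    return " | ".join(fields)


ports = []
for written in (0x8fcf031, 0x8fcf044, 0x8fcf048, 0x8fcf04c):
    add_port(ports, written)
label = record_label(0, 0x8fcf030, ports)
```

The resulting string can be used as the `label` attribute of a record-shaped node, and the port names can then be referenced when edges are attached to individual fields.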
Create Link
If the destination address and the source value are both valid addresses that belong to traced allocations, we link the corresponding ports of the two nodes. Whenever a link is created, the edge counts of the source and destination nodes are incremented.
Similarly, when a linked field is overwritten, the edge is removed and the edge counts are decremented.
Example,
- 0x804882a@malloc[0x20]
- ret[0x8fcf008]
- …...
- 0x80488c5@malloc[0x20]
- ret[0x8fcf030]
- …...
- 0x80489bb@mov dword ptr [eax+0x14], edx : WRREG MEM[0x8fcf044] VAL[0x8fcf058]
- 0x8048a40@mov dword ptr [eax+0x18], edx : WRREG MEM[0x8fcf048] VAL[0x8fcf008]
- 0x8048a4e@mov dword ptr [eax+0x1c], edx : WRREG MEM[0x8fcf04c] VAL[0x8fcf008]
Above is a series of pointer writes into the memory allocated at 0x8fcf030. The last two writes store 0x8fcf008, the address of another allocation, so we link the two nodes.
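The link and edge-count rules can be sketched as below, using the two allocations from the example. The class name `LinkTracker` and its layout are hypothetical. The sketch assumes that a write to a field that already holds a link first removes the old edge, which is one reading of the overwrite rule.

```python
class LinkTracker:
    """Track pointer links between allocations and per-node edge counts."""

    def __init__(self, allocations):
        # allocations: list of (node_id, base_address, size)
        self.allocations = allocations
        self.edge_count = {node_id: 0 for node_id, _, _ in allocations}
        self.links = {}  # field address -> (src_node, dst_node, target address)

    def owner(self, address):
        """Return the id of the allocation containing address, or None."""
        for node_id, base, size in self.allocations:
            if base <= address < base + size:
                return node_id
        return None

    def _unlink(self, field):
        old = self.links.pop(field, None)
        if old is not None:
            src, dst, _ = old
            self.edge_count[src] -= 1
            self.edge_count[dst] -= 1

    def write(self, field, value):
        """Process one memory write; return True if a link was created."""
        src = self.owner(field)
        if src is None:
            return False
        self._unlink(field)  # overwriting a linked field drops its old edge
        dst = self.owner(value)
        if dst is None:
            return False
        self.links[field] = (src, dst, value)
        self.edge_count[src] += 1
        self.edge_count[dst] += 1
        return True


tracker = LinkTracker([(0, 0x8fcf008, 0x20), (1, 0x8fcf030, 0x20)])
results = [
    tracker.write(0x8fcf044, 0x8fcf058),  # value is outside both allocations
    tracker.write(0x8fcf048, 0x8fcf008),  # link node 1 -> node 0
    tracker.write(0x8fcf04c, 0x8fcf008),  # second link node 1 -> node 0
]
```

In this run only the last two writes produce links, because 0x8fcf058 lies outside both 0x20-byte allocations known to the tracker.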
Prune Node
Finally, after parsing all instructions, remove the nodes that don't have any edges: if the edge count of a node is 0, the node is removed.
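Pruning is a single filter over the node table. A minimal sketch, assuming nodes and edge counts are kept in plain dictionaries keyed by node id (the function name `prune` is hypothetical):

```python
def prune(nodes, edge_count):
    """Return only the nodes whose edge count is greater than zero."""
    return {
        node_id: node
        for node_id, node in nodes.items()
        if edge_count.get(node_id, 0) > 0
    }


nodes = {0: {"address": 0x8fcf008}, 1: {"address": 0x8fcf030}, 2: {"address": 0x8fcf058}}
kept = prune(nodes, {0: 2, 1: 2, 2: 0})
```

Here node 2 never took part in a link, so it is dropped before the graph is drawn.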
Other Options
By default, we consider only the first non-NULL write to an address for node update and link creation; any later writes to that address are skipped. This may be good enough to reveal some data structures. The relink option makes the tool consider more than a single write per address. This is useful when links are rewritten, as in a circular linked list.
NULL writes can also be enabled as an option. This might be useful along with relink.
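One way to read these options is as a filter over the stream of writes. The sketch below is an interpretation, not the tool's actual logic: the function name `select_writes` and the flag names `relink` and `null_writes` are assumptions, and the behaviour of NULL writes without relink is a guess.

```python
def select_writes(writes, relink=False, null_writes=False):
    """Pick the (address, value) writes that should reach the graph.

    Default: only the first non-NULL write to each address.
    relink: later writes to an already seen address are kept too.
    null_writes: writes of the value 0 are kept as well.
    """
    seen = set()
    selected = []
    for mem, val in writes:
        if val == 0 and not null_writes:
            continue
        if mem in seen and not relink:
            continue
        seen.add(mem)
        selected.append((mem, val))
    return selected


writes = [(0x10, 0), (0x10, 0xA0), (0x10, 0xB0), (0x14, 0xC0)]
default = select_writes(writes)
relinked = select_writes(writes, relink=True)
everything = select_writes(writes, relink=True, null_writes=True)
```

With the default settings the initial NULL store and the later rewrite of 0x10 are both ignored; enabling both options passes every write through.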
The tool itself doesn't have the intelligence to say which data structure is used, but it can graph the allocations and links so that someone can understand the data structure from the revealed shape.
Example - Singly Linked List

Example - Binary Tree

Example - HackIM Mixme Circular Doubly Linked List

The POC code which I use for CTFs is available here. To repeat, this works on small binaries; as things get complex, the graph might make less sense. There is a lot of scope for improvement, though.
References
AES Whitebox Unboxing: No Such Problem. This served as an excellent reference for using the PIN tool and pygraphviz to visualize memory accesses.
Thanks to Danny K, for help with Intel PIN Framework.