Source:http://v0ids3curity.blogspot.com/2015/04/data-structure-recovery-using-pin-and.html

--------------------------------

Data Structure Recovery using PIN and PyGraphviz

 
This is a simple proof-of-concept PIN tool for recovering data structures in dynamically linked, stripped executables; it is mainly meant for analyzing small programs. The PIN tool keeps track of heap allocations made by the executable and traces all write operations to the allocated heap memory. The resulting trace file, which contains the allocations and write operations, is then used to generate a graph with pygraphviz.

Tracing Size and Address of Allocations

Size of Allocation

Right now, we track the libc functions malloc, realloc, calloc, sbrk, mmap and free. All of these routines are instrumented using RTN_InsertCall to fetch the size of the requested allocation.

For example, for tracing malloc:

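    RTN_InsertCall(rtn,
                   IPOINT_BEFORE,            // run AllocBefore at routine entry
                   (AFUNPTR)AllocBefore,
                   IARG_ADDRINT, funcname,   // name of the traced routine
                   IARG_G_ARG0_CALLEE,       // first argument: requested size
                   IARG_RETURN_IP,           // return address of the call
                   IARG_END);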

We fetch the size of the requested allocation using IARG_G_ARG0_CALLEE at IPOINT_BEFORE. We also need to restrict tracing to malloc calls made from the main executable. To do this, we use IARG_RETURN_IP to check whether the return address of the call lies within the main executable; if it does not, the allocation is not traced.

Address of Allocation

IARG_RETURN_IP is valid only at the function entry point, so we cannot use IPOINT_AFTER along with IARG_RETURN_IP. As a workaround, we save the return address during IPOINT_BEFORE. Then, in the instruction trace, if the instruction pointer equals the return address of an allocation call, we fetch the value of EAX (RAX on 64-bit builds). This gives the address of the allocation.

    if (insaddr == retaddress) {
        INS_InsertCall(ins,
                       IPOINT_BEFORE,
                       (AFUNPTR)AllocAfter,
    #ifdef __i386__
                       IARG_REG_VALUE, LEVEL_BASE::REG_EAX,
    #else
                       IARG_REG_VALUE, LEVEL_BASE::REG_RAX,
    #endif
                       IARG_END);
    }

Now we have both the address and the size of the allocation. These details are stored in a map as address : size pairs. We do not remove an address when free is called on it. Instead, if an allocation call returns an address that already exists in the map (that is, the memory has been reallocated), we simply update the size of the existing entry to match the new request.

    if (allocations.count(address) == 0) {
        allocations.insert(std::make_pair(address, allocsize));
    }
    else {
        std::map<ADDRINT, ADDRINT>::iterator it = allocations.find(address);
        it->second = allocsize;
    }
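The same bookkeeping can be modeled in a few lines of Python. This is only an illustrative sketch: the names `allocations` and `record_allocation` are made up here and are not taken from the tool.

```python
# Hypothetical model of the address -> size table (names are illustrative).
allocations = {}

def record_allocation(address, size):
    """Insert a new allocation, or overwrite the size if the address is reused."""
    allocations[address] = size

record_allocation(0x8fcf008, 0x20)
record_allocation(0x8fcf030, 0x20)
# free() is never tracked; when the allocator hands out the same address again,
# only the recorded size changes.
record_allocation(0x8fcf008, 0x40)
```

Because entries are never removed on free, a stale entry can outlive the chunk it describes until the address is handed out again.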

.data and .bss sections

The .data and .bss sections are also added to the map. Their addresses and sizes are fetched from the main executable and stored alongside the heap allocations:

    if (!strcmp(sec_name.c_str(), ".bss") || !strcmp(sec_name.c_str(), ".data")) {
        ADDRINT addr = SEC_Address(sec);
        USIZE size = SEC_Size(sec);
        if (allocations.count(addr) == 0) {
            allocations.insert(std::make_pair(addr, size));
        }
    }

Tracing Memory Writes

We trace instructions that write into the allocated memory. As of now, only instructions of class XED_ICLASS_MOV are traced. For every XED_ICLASS_MOV instruction, we check whether it is a memory write using INS_IsMemoryWrite and whether it is part of the main executable.

If so, we fetch the destination address of the write operation using IARG_MEMORYWRITE_EA. We then check whether the destination address falls inside any tracked allocation; if it does, the instruction is traced.

    for (it = allocations.begin(); it != allocations.end(); it++) {
        if ((des_addr >= it->first) && (des_addr < it->first + it->second)) return true;
    }
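For experimenting with this range check outside of PIN, the sketch below mirrors the loop in Python. The function name `find_allocation` is hypothetical, and the table is filled with the addresses and sizes from the sample trace in this post.

```python
def find_allocation(allocations, addr):
    """Return the base address of the allocation containing addr, or None."""
    # Iterate in ascending base order, like std::map iteration in the PIN tool.
    for base in sorted(allocations):
        if base <= addr < base + allocations[base]:
            return base
    return None

# .data, .bss and one sbrk() region, as address -> size
allocations = {0x804b02c: 0x8, 0x804b040: 0xfc4, 0x98de000: 0x420}

heap_hit = find_allocation(allocations, 0x98de008)   # inside the sbrk region
bss_hit = find_allocation(allocations, 0x804c000)    # inside .bss
miss = find_allocation(allocations, 0x98de420)       # one byte past the end
```

The upper bound is exclusive, so an address equal to base + size is not considered part of the allocation.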

Sample Trace

    .data[0x804b02c,0x8]
    .bss[0x804b040,0xfc4]
    0x8048560@sbrk[0x420]
    ret[0x98de000]
    0x8048565@mov dword ptr [0x804c000], eax   : WRREG MEM[0x804c000] VAL[0x98de000]
    0x8048575@mov dword ptr [eax+0x8],0x0      : WRIMM MEM[0x98de008] VAL[0]
    0x804857f@mov dword ptr [edx+0x4], eax     : WRREG MEM[0x98de004] VAL[0]
    0x8048587@mov dword ptr [eax],0x10         : WRIMM MEM[0x98de000] VAL[0x10]
    0x80485a0@mov dword ptr [eax+0x4], edx     : WRREG MEM[0x98de004] VAL[0x98de010]
    0x80485ac@mov dword ptr [eax+0x8], edx     : WRREG MEM[0x98de018] VAL[0x98de000]
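A trace in this format is straightforward to parse. The following is a minimal sketch that assumes the four line shapes seen in the sample above (section, allocator call, return value, memory write); the regular expressions and the `parse_line` helper are illustrative and not taken from the original script.

```python
import re

HEX = r"(?:0x)?[0-9a-fA-F]+"
SECTION = re.compile(rf"^(\.\w+)\[({HEX}),({HEX})\]$")
CALL = re.compile(rf"^({HEX})@(\w+)\[({HEX})\]$")
RET = re.compile(rf"^ret\[({HEX})\]$")
WRITE = re.compile(
    rf"^({HEX})@(.*?)\s*:\s*(WRREG|WRIMM) MEM\[({HEX})\] VAL\[({HEX})\]$"
)

def parse_line(line):
    """Turn one trace line into a tuple tagged with its kind, or None."""
    line = line.strip()
    m = WRITE.match(line)
    if m:
        ip, asm, kind, mem, val = m.groups()
        return ("write", int(ip, 16), asm, kind, int(mem, 16), int(val, 16))
    m = SECTION.match(line)
    if m:
        return ("section", m.group(1), int(m.group(2), 16), int(m.group(3), 16))
    m = CALL.match(line)
    if m:
        return ("call", int(m.group(1), 16), m.group(2), int(m.group(3), 16))
    m = RET.match(line)
    if m:
        return ("ret", int(m.group(1), 16))
    return None

event = parse_line(
    "0x8048575@mov dword ptr [eax+0x8],0x0      : WRIMM MEM[0x98de008] VAL[0]"
)
```

Values are parsed with base 16 whether or not they carry a `0x` prefix, since the trace prints a bare `0` for NULL writes.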

Graphing

Node Create

For each allocation in the trace file generated by the PIN tool, a new node is created in the graph. Each node is uniquely identified by a node id, which is assigned sequentially. An ordered dictionary is maintained in which the key is the node id and the value is a dictionary holding the address and size of the allocation. New allocations are added to the start of the ordered dictionary.

An edge count is associated with each created node. It is used later for pruning away nodes without any edges.

Separate nodes can also be created for the .bss and .data sections, but this is optional.
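Node creation as described above might look roughly like this. It is a sketch under assumptions: the field names (`address`, `size`, `edges`) and the `create_node` helper are invented for illustration, and ids are derived from the current number of nodes, which only works while no node has been pruned yet.

```python
from collections import OrderedDict

def create_node(nodes, address, size):
    """Add a node for one allocator call and return its sequential id."""
    node_id = len(nodes)                      # assumes nothing was pruned yet
    nodes[node_id] = {"address": address, "size": size, "edges": 0}
    nodes.move_to_end(node_id, last=False)    # newest allocation goes first
    return node_id

nodes = OrderedDict()
first = create_node(nodes, 0x8fcf008, 0x20)
second = create_node(nodes, 0x8fcf030, 0x20)
order = list(nodes)                           # newest id first
```

Keeping the newest allocation at the front means a front-to-back search sees the most recent allocation first when an address has been handed out more than once.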

Example

Say a structure is allocated on the heap using malloc; this is what the node will look like:

    0x80488c5@malloc[0x20]
    ret[0x8fcf030]

    -------------------
    |[0]0x8fcf030   |
    -------------------

[0] is the node id, which also reflects the order of allocation. Every new allocator call gets a new id, irrespective of the address it returns.

0x8fcf030 is the address returned by the allocator call.

Node Update

For each traced instruction, fetch the target address of the write operation. If the target address is part of any allocation, update the node to which the target address belongs. In practice, this means creating a new port in the record node.

A new port represents one element of an allocation, for example a member of a structure.

Then check whether the source value is part of any allocation. If it is, we treat the source value as an address and update the node to which that address belongs. This operation can be interpreted as a pointer assignment (or link creation).

    0x80488c5@malloc[0x20]
    ret[0x8fcf030]
    0x8048957@mov byte ptr [eax+edx*1],0x0     : WRIMM MEM[0x8fcf031] VAL[0]
    0x80489bb@mov dword ptr [eax+0x14], edx    : WRREG MEM[0x8fcf044] VAL[0x8fcf058]
    0x8048a40@mov dword ptr [eax+0x18], edx    : WRREG MEM[0x8fcf048] VAL[0x8fcf008]
    0x8048a4e@mov dword ptr [eax+0x1c], edx    : WRREG MEM[0x8fcf04c] VAL[0x8fcf008]

    -------------------------------------------------------------------------
    |[0]0x8fcf030   |0x8fcf031  |0x8fcf044  |  0x8fcf048  |  0x8fcf04c  |
    -------------------------------------------------------------------------

The first field, [0] 0x8fcf030, is the metadata for the chunk, that is, the node id and the address returned by the allocator call. The remaining 4 fields represent 4 individual write operations into this allocation (for example, 4 elements of a structure).
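The port bookkeeping for a single node can be sketched as follows, using the four write targets from the trace above. The helpers `add_port` and `record_label` and the `ports` field are hypothetical names; the label simply joins the fields with `|` in the way a record-shaped node lays them out.

```python
def add_port(node, target):
    """Record a write to `target` as a port, one port per distinct address."""
    if target not in node["ports"]:
        node["ports"].append(target)

def record_label(node_id, node):
    """Build the record label: metadata field first, then one field per port."""
    fields = [f"[{node_id}] {node['address']:#x}"]
    fields += [f"{port:#x}" for port in node["ports"]]
    return " | ".join(fields)

node = {"address": 0x8fcf030, "size": 0x20, "ports": []}
for target in (0x8fcf031, 0x8fcf044, 0x8fcf048, 0x8fcf04c):
    add_port(node, target)

label = record_label(0, node)
```

Writing to an address that already has a port does not add a second field, so the number of fields tracks the number of distinct elements touched.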

Create Link

If both the source and destination values are valid addresses that belong to traced allocations, we link the corresponding ports of the two nodes. Whenever a link is created, the edge counts of both the source and destination nodes are incremented.

Similarly, when a memory location holding a link is overwritten, the edge is removed and the edge counts are decremented.

Example,

    0x804882a@malloc[0x20]
    ret[0x8fcf008]
    …...
    0x80488c5@malloc[0x20]
    ret[0x8fcf030]
    …...
    0x80489bb@mov dword ptr [eax+0x14], edx    : WRREG MEM[0x8fcf044] VAL[0x8fcf058]
    0x8048a40@mov dword ptr [eax+0x18], edx    : WRREG MEM[0x8fcf048] VAL[0x8fcf008]
    0x8048a4e@mov dword ptr [eax+0x1c], edx    : WRREG MEM[0x8fcf04c] VAL[0x8fcf008]

Above is a series of pointer writes into the memory allocated at 0x8fcf030. The values written to 0x8fcf048 and 0x8fcf04c point to another allocation at 0x8fcf008, so we link those ports to that node.
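Link creation and removal can be sketched like this. The helper names and the `links` mapping (port address to destination node id) are assumptions made for illustration; only the two allocations shown in the example are modeled, so the write of 0x8fcf058 finds no destination node here.

```python
def find_node(nodes, addr):
    """Return the id of the node whose allocation contains addr, or None."""
    for node_id, node in nodes.items():
        if node["address"] <= addr < node["address"] + node["size"]:
            return node_id
    return None

def write_pointer(nodes, links, mem, val):
    """Apply one traced write: drop any old link at mem, then relink if possible."""
    src = find_node(nodes, mem)
    if src is None:
        return
    old = links.pop(mem, None)
    if old is not None:                 # overwrite: remove the previous edge
        nodes[src]["edges"] -= 1
        nodes[old]["edges"] -= 1
    dst = find_node(nodes, val)
    if dst is not None:                 # value points into a tracked allocation
        links[mem] = dst
        nodes[src]["edges"] += 1
        nodes[dst]["edges"] += 1

nodes = {
    0: {"address": 0x8fcf008, "size": 0x20, "edges": 0},
    1: {"address": 0x8fcf030, "size": 0x20, "edges": 0},
}
links = {}
for mem, val in ((0x8fcf044, 0x8fcf058), (0x8fcf048, 0x8fcf008), (0x8fcf04c, 0x8fcf008)):
    write_pointer(nodes, links, mem, val)
```

After these three writes, both nodes carry two edges; a later write of 0 to 0x8fcf048 would remove one link and decrement both counts.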

Prune Node

Finally, after parsing all instructions, remove the nodes that don't have any edges. To do this, check whether the edge count of a node is 0; if it is, remove the node.
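As a minimal sketch, pruning reduces to a filter on the edge count. The `prune` helper and the third node's address are hypothetical and only serve the illustration.

```python
def prune(nodes):
    """Keep only the nodes that take part in at least one link."""
    return {node_id: node for node_id, node in nodes.items() if node["edges"] > 0}

nodes = {
    0: {"address": 0x8fcf008, "edges": 2},
    1: {"address": 0x8fcf030, "edges": 2},
    2: {"address": 0x8fcf080, "edges": 0},   # hypothetical unlinked allocation
}
pruned = prune(nodes)
```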

Other Options

By default, we consider only the first non-NULL write to an address for node update and link creation. This is often good enough to reveal some data structures. Any later writes to an address that has already received a non-NULL write are skipped. The relink option makes the tool consider more than a single write per address when graphing. This is useful when links are reassigned, for example in a circular linked list.

NULL writes can also be enabled as an option. This might be useful along with relink.

The tool itself doesn't have the intelligence to say what data structure is used, but it can graph the allocations and links to help someone understand a data structure from the revealed shape.
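One possible reading of these options is sketched below as a filter over (address, value) writes. The function `select_writes` and its flag names are hypothetical, and the exact interaction between the two options in the real script may differ.

```python
def select_writes(writes, relink=False, null_writes=False):
    """Pick the writes that would be used for node update and link creation."""
    seen = set()
    selected = []
    for mem, val in writes:
        if val == 0 and not null_writes:
            continue                    # NULL writes are ignored by default
        if mem in seen and not relink:
            continue                    # only the first accepted write counts
        seen.add(mem)
        selected.append((mem, val))
    return selected

writes = [
    (0x98de008, 0),
    (0x98de004, 0),
    (0x98de000, 0x10),
    (0x98de004, 0x98de010),
    (0x98de004, 0x98de020),             # hypothetical later re-assignment
]
default = select_writes(writes)
relinked = select_writes(writes, relink=True)
```

Under this reading, enabling NULL writes without relink lets an initial NULL store mask a later pointer store to the same address, which may be why the two options are suggested together.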
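To give an idea of what the graph input can look like, the sketch below emits Graphviz DOT text with record-shaped nodes and port-to-node edges. It builds the text by hand rather than through pygraphviz so that it has no dependencies; the `to_dot` helper, the port naming scheme and the choice to point every edge at the destination's first field are assumptions, not the original script's behaviour.

```python
def to_dot(nodes, links):
    """Render nodes as record nodes and links as edges between ports."""
    lines = ["digraph heap {", "  node [shape=record];"]
    for node_id, node in nodes.items():
        fields = [f"<base> [{node_id}] {node['address']:#x}"]
        fields += [f"<p{port:x}> {port:#x}" for port in node["ports"]]
        label = " | ".join(fields)
        lines.append(f'  n{node_id} [label="{label}"];')
    for src_id, port, dst_id in links:
        lines.append(f"  n{src_id}:p{port:x} -> n{dst_id}:base;")
    lines.append("}")
    return "\n".join(lines)

nodes = {
    0: {"address": 0x8fcf008, "ports": []},
    1: {"address": 0x8fcf030, "ports": [0x8fcf048, 0x8fcf04c]},
}
links = [(1, 0x8fcf048, 0), (1, 0x8fcf04c, 0)]   # (source node, port, target node)
dot_text = to_dot(nodes, links)
```

The resulting text can be saved to a file and laid out with Graphviz's `dot` command.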

Example - Singly Linked List

Example - Binary Tree

Example - HackIM Mixme Circular Doubly Linked List

The POC code which I use for CTFs is available here. To reiterate, this works on small binaries; as things get more complex, the graph might make less sense. There is a lot of scope for improvement, though.

References

AES Whitebox Unboxing: No Such Problem. This served as an excellent reference for using a PIN tool and pygraphviz to visualize memory accesses.

Thanks to Danny K for help with the Intel PIN framework.
