ref:https://www.sigpwn.io/blog/2018/5/13/adding-afl-bloom-filter-to-domato-for-fun

Adding AFL Bloom Filter to Domato for Fun

So I have been playing with Domato for few weeks now and wrote a blog about it's internals along the way. Unfortunately there is no feedback mechanism in the open-sourced version of Domato, although according to the blog post by Ivan Fratic (@ifsecure) of google project zero they are experimenting with coverage guided fuzzing using modified version of WinAFL's DynamoRIO client. I had a chat with Richard Johnson (@richinseattle) of Cisco Talos Vulndev Team while staying in Dubai for OPCDE 2018 and he gave me an idea of adding some kind of feedback mechanism in the syntax level instead of binary level feedback. First thing that came up to my mind was AFL's bloom filter. Since it is known to be so effective, why not try adding this to Domato as well?

Adding unique IDs to grammar rules

If you read my previous post, you will have understanding of how grammar rules are parsed and stored in data structures. In AFL, unique ids are assigned to each basic block(对于基本块制定一个唯一的id). In our case, assigning unique ids to each grammar rules seems appropriate since order of selecting certain grammar rules for expanding symbols is similar to visiting basic blocks in the control flow graph. (以选择不同的语法组合成为代码覆盖率计算的替代。)

So in the grammar.py file of Domato, I made following changes:

def _parse_code_line(self, line, helper_lines=False):
"""Parses a rule for generating code."""
rule = {
'type': 'code',
'parts': [],
'creates': [],
'uid' : self._get_new_uid() # add this field
} def _parse_grammar_line(self, line):
"""Parses a grammar rule."""
# Check if the line matches grammar rule pattern (<tagname> = ...).
match = re.match(r'^<([^>]*)>\s*=\s*(.*)$', line)
if not match:
raise GrammarError('Error parsing rule ' + line) # Parse the line to create a grammar rule.
rule = {
'type': 'grammar',
'creates': self._parse_tag_and_attributes(match.group(1)),
'parts': [],
'uid': self._get_new_uid() # add this field
}

Now each rule contains a field named 'uid' that stores unique integer value.

Selecting less chosen rule for expanding symbols

To record which rules were chosen in the previous generated case, I added a Python dictionary for this purpose.

# coverage map
self._cov = {}

Now when choosing the rule for expanding current symbol, we need to reference the coverage map. Note that algorithm used in AFL is like following.

cur_location = <COMPILE_TIME_RANDOM>;//选择的rule
shared_mem[cur_location ^ prev_location]++;
prev_location = cur_location >> 1;//prev_location代表之前选择的rule,以rule覆盖率来替代目标运行代码覆盖率

So in our case,  cur_location is the unique id of a rule we are considering to select. prev_location is previously chosen rule. Following code checks whether if there is an never selected order of grammar rules (thus bitmap_idx is not in the coverage map,代码选择顺序是否出现过) and selects first encountered rule of this case.

If every combination of previously selected grammar rule and current rule candidates have been selected before (so every possible bitmap_idx is already present as a key in coverage map), code saves how many times certain combination has been chosen.

def _select_creator(self, symbol, recursion_depth, force_nonrecursive):
...
# select creator based on coverage map
# select creator that has been selected less
hits = []
for c in creators:
bitmap_idx = c['uid'] ^ self._prev_id
if bitmap_idx not in self._cov:
self._cov[bitmap_idx] = 1
self._prev_id = c['uid'] >> 1
return c
else:
hits.append(self._cov[bitmap_idx])

Among the grammar rules chosen, code saves a list (less_visited_creators[]) of relatively least selected rules. If cdf is not available for rules for some symbol, it randomly chooses from less_visited_creators[] and records coverage info accordingly.

idx = random.randint(0, len(less_visited_creators) - 1)
curr_id = less_visited_creators[idx]['uid']
self._cov[(self._prev_id ^ curr_id) % MAP_SIZE] = self._cov[self._prev_id ^ curr_id] + 1
self._prev_id = curr_id >> 1
return less_visited_creators[idx]

Similarly, if cdf is available for some symbol, it uses cdf instead randomly choosing a grammar rule and records coverage information.

idx = bisect.bisect_left(less_visited_creators_cdf, random.random(), 0, len(less_visited_creators_cdf)-1)
curr_id = less_visited_creators[idx]['uid']
self._cov[(self._prev_id ^ curr_id) % MAP_SIZE] = self._cov[(self._prev_id ^ curr_id) % MAP_SIZE] + 1
self._prev_id = curr_id >> 1
return less_visited_creators[idx]

Results

Because I lack resources to be used for fuzzing, testing was somewhat limited. Also I believe Google fuzzed modern browsers with Domato (or customized internal versions) long enough so there is a very low chance of myself finding exploitable crashes with my limited resources. However, my version of Domato was able to find more unique crashes than the original open-sourced version after fuzzing IE11 on Windows 10.(发现更多不同的崩溃)

I generated 10,000 html files with both original Domato and my customized version and used them to fuzz IE11 to see which gets more unique crashes. The result if the following.

  • Original : 16 unique crashes
  • Customized : 20 unique crashes

I will soon release the code for those who are interested (it there is someone :P), although I guess by reading this blog you have enough information to go try out by yourself.

Any feedbacks or reporting of errors, please reach out !

ref:Adding AFL Bloom Filter to Domato for Fun的更多相关文章

  1. Skip List & Bloom Filter

    Skip List | Set 1 (Introduction)   Can we search in a sorted linked list in better than O(n) time?Th ...

  2. Bloom Filter:海量数据的HashSet

    Bloom Filter一般用于数据的去重计算,近似于HashSet的功能:但是不同于Bitmap(用于精确计算),其为一种估算的数据结构,存在误判(false positive)的情况. 1. 基本 ...

  3. 探索C#之布隆过滤器(Bloom filter)

    阅读目录: 背景介绍 算法原理 误判率 BF改进 总结 背景介绍 Bloom filter(后面简称BF)是Bloom在1970年提出的二进制向量数据结构.通俗来说就是在大数据集合下高效判断某个成员是 ...

  4. Bloom Filter 布隆过滤器

    Bloom Filter 是由伯顿.布隆(Burton Bloom)在1970年提出的一种多hash函数映射的快速查找算法.它实际上是一个很长的二进制向量和一些列随机映射函数.应用在数据量很大的情况下 ...

  5. Bloom Filter学习

    参考文献: Bloom Filters - the math    http://pages.cs.wisc.edu/~cao/papers/summary-cache/node8.html    B ...

  6. 【转】探索C#之布隆过滤器(Bloom filter)

    原文:蘑菇先生,http://www.cnblogs.com/mushroom/p/4556801.html 背景介绍 Bloom filter(后面简称BF)是Bloom在1970年提出的二进制向量 ...

  7. bloom filter

    Bloom filter 是由 Howard Bloom 在 1970 年提出的二进制向量数据结构,它具有很好的空间和时间效率,被用来检测一个元素是不是集合中的一个成员. 结    构 二进制 召回率 ...

  8. Bloom Filter 概念和原理

    Bloom filter 是由 Howard Bloom 在 1970 年提出的二进制向量数据结构,它具有很好的空间和时间效率,被用来检测一个元素是不是集合中的一个成员.如果检测结果为是,该元素不一定 ...

  9. 【转】Bloom Filter布隆过滤器的概念和原理

    转自:http://blog.csdn.net/jiaomeng/article/details/1495500 之前看数学之美丽,里面有提到布隆过滤器的过滤垃圾邮件,感觉到何其的牛,竟然有这么高效的 ...

随机推荐

  1. Bootstrap 按钮组

    连在一起的按钮:.btn-group <div class="btn-group" role="group"> <button class=& ...

  2. CF839 D 容斥

    求$gcd>1$的所有$gcd(a_i,a_{i+1}…a_{n})*(n-i+1)$的和 首先先标记所有出现的数.从高到低枚举一个数k,记录它的倍数出现次数cnt,那么当前所有组合的答案就是$ ...

  3. 【CodeForces】713 C. Sonya and Problem Wihtout a Legend

    [题目]C. Sonya and Problem Wihtout a Legend [题意]给定n个数字,每次操作可以对一个数字±1,求最少操作次数使数列递增.n<=10^5. [算法]动态规划 ...

  4. HDU 2044 Coins

    有一只经过训练的蜜蜂只能爬向右侧相邻的蜂房,不能反向爬行.请编程计算蜜蜂从蜂房a爬到蜂房b的可能路线数. 其中,蜂房的结构如下所示.   Input输入数据的第一行是一个整数N,表示测试实例的个数,然 ...

  5. 【洛谷 P4542】 [ZJOI2011]营救皮卡丘(费用流)

    题目链接 用最多经过\(k\)条经过\(0\)的路径覆盖所有点. 定义\(ds[i][j]\)表示从\(i\)到\(j\)不经过大于\(max(i,j)\)的点的最短路,显然可以用弗洛伊德求. 然后每 ...

  6. 阿里iconfont引入方法

    原文:iconfont的引入方法   第一步:使用font-face声明字体@font-face {font-family: 'iconfont';src: url('iconfont.eot'); ...

  7. css3同心圆闪烁扩散效果

    <!DOCTYPE html><html lang="en"><head> <meta charset="UTF-8" ...

  8. 21、利用selenium进行Web测试

    一.案例实施步骤思路分析 1.寻包 2.指定浏览器(实例化浏览器对象) 3.打开项目 4.找到元素(定位元素) 5.操作元素 6.暂停 7.关闭二.元素定位[重点] 1.id 说明:通过元素的id属性 ...

  9. python3学习笔记.2.基础

    1.编码 默认编码是 utf-8 # -*- coding: utf-8 -*- 2.注释 单行注释  # 多行注释,用三个单引号或双引号 3.关键字 可在交互窗口查询. >>> i ...

  10. Linux进程调度原理【转】

    转自:http://www.cnblogs.com/zhaoyl/archive/2012/09/04/2671156.html Linux进程调度的目标 1.高效性:高效意味着在相同的时间下要完成更 ...