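Course review

This is the compilers course that Swarthmore College offered in 2016, with 10 major assignments in total. This post collects the related lecture notes and the 9th assignment.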

  • Side effects of assignment: cyclic tuples

    The code below shows how Python 3 handles cyclic lists (print and ==):
    >>> x = [1, 2]
    >>> x[1] = x
    >>> print(x)
    [1, [...]]
    >>> y = [1, 2]
    >>> y[1] = y
    >>> x is y
    False
    >>> x is x
    True
    >>> y is y
    True
    >>> x == y
    Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    RecursionError: maximum recursion depth exceeded in comparison
  • Memory management: maintain a free list data structure, or store metadata in the heap, to mark which space is free.

  • Automatic memory management

    Maintaining reference counts

    Maintaining mark bits


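  • Mark/compact garbage collection algorithm

    [Figure: the stack and heap while f((7, 8)) executes. Memory is full at this point, so garbage collection is required.]

    [Figure: the stack and heap while f((7, 8))(12) executes.]

    [Figure: implementation details of the collection algorithm.]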

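  • Generational GC

    In the eden region, semispace swapping copies the live data into the lower half of the heap; once a value has been moved more than a set number of times, it is copied to the G1 region, and so on for later generations;

    In the tenured region, garbage is collected with the mark/compact algorithm; data in this region is not destroyed or moved frequently.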
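
The notes above list reference counting and marking as the two bookkeeping strategies, and the first bullet shows that assignment can create cycles. The sketch below is an illustration of why that combination matters, not code from the course: the `Obj` struct and every function name are hypothetical. Two objects that point at each other keep a non-zero reference count after the last stack reference is dropped, while tracing from the (now empty) root set marks neither of them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical heap object, for illustration only: one outgoing reference,
   a reference count, and a mark bit. */
typedef struct Obj {
    int refcount;
    int marked;
    struct Obj* field;
} Obj;

/* Point owner->field at target, keeping both reference counts up to date. */
static void set_field(Obj* owner, Obj* target) {
    assert(owner != NULL);
    if (target != NULL) target->refcount++;
    if (owner->field != NULL) owner->field->refcount--;
    owner->field = target;
}

/* Mark everything reachable from root (each object has at most one field). */
static void mark_from(Obj* root) {
    while (root != NULL && !root->marked) {
        root->marked = 1;
        root = root->field;
    }
}

/* Build the cycle x <-> y, drop both stack references, then trace from an
   empty root set. Returns 1 if reference counting still keeps the cycle alive. */
static int cycle_leaks_under_refcount(Obj* x, Obj* y) {
    x->refcount = 1; x->marked = 0; x->field = NULL;  /* 1 = held by a stack variable */
    y->refcount = 1; y->marked = 0; y->field = NULL;
    set_field(x, y);
    set_field(y, x);
    x->refcount--;    /* the stack variables go out of scope */
    y->refcount--;
    mark_from(NULL);  /* no roots are left, so nothing gets marked */
    return x->refcount > 0 && y->refcount > 0;
}
```

Calling `cycle_leaks_under_refcount(&x, &y)` on two `Obj` values leaves both counts at 1 and both mark bits at 0: a pure reference-counting collector would never free the pair, whereas a mark-based collector would treat both as garbage.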

Programming assignment

Mark

In the first phase, we take the initial heap and stack, and set all the GC words of live data to 1. The live data is all the data reachable from references on the stack, excluding return pointers and base pointers (which don't represent data on the heap). We can do this by looping over the words on the stack, and doing a depth-first traversal of the heap from any reference values (pairs or closures) that we find.

The stack_top and first_frame arguments to mark point to the top of the stack, which contains a previous base pointer value, followed by a return pointer. If f had local variables, then they would cause stack_top to point higher. stack_bottom points at the highest (in terms of number) word on the stack. Thus, we want to loop from stack_top to stack_bottom, traversing the heap from each reference we find. We also want to skip the words corresponding to first_frame and the word after it, and each pair of base pointer and return pointer from there down (if there are multiple function calls active).

Along the way, we also keep track of the highest start address of a live value to use later, which in this case is 64, the address of the closure.

// Word index (4-byte words) of the tagged reference stored at *current_addr,
// shifted by a byte offset. Assumes 32-bit pointers that fit in an int.
int get_heap_index(int* current_addr, int offset, int* heap_start) {
    return (*current_addr + offset - (int)heap_start) / 4;
}

void mark_dfs(int* current_addr, int* heap_start, int* max_index) {
    // pair: [ tag ][ GC word ][ value ][ value ]
    if ((*current_addr & 0x00000007) == 1) {
        DEBUG_PRINT("pair: %#010x\n", *current_addr - 1);
        int index = get_heap_index(current_addr, -1, heap_start);
        if (!heap_start[index + 1]) {
            heap_start[index + 1] = 0x1;
            if (index > *max_index) *max_index = index;
            // byte offsets 7 and 11 strip the tag (-1) and select words 2 and 3
            mark_dfs(&heap_start[get_heap_index(current_addr, 7, heap_start)], heap_start, max_index);
            mark_dfs(&heap_start[get_heap_index(current_addr, 11, heap_start)], heap_start, max_index);
        }
    }
    // closure: [ tag ][ GC word ][ varcount = N ][ arity ][ code ptr ][[ N vars' data ]][ maybe padding ]
    else if ((*current_addr & 0x00000007) == 5) {
        DEBUG_PRINT("closure: %#010x\n", *current_addr - 5);
        int index = get_heap_index(current_addr, -5, heap_start);
        if (!heap_start[index + 1]) {
            heap_start[index + 1] = 0x1;
            if (index > *max_index) *max_index = index;
            // Word 3 is read here as the number of captured variables. The layout
            // comment above lists that count (varcount) as word 2, so adjust the
            // index if your closures follow that layout.
            int arity = heap_start[index + 3];
            for (int i = 0; i < arity; i++) {
                // byte offset 15 strips the tag (-5) and skips the 5 header words
                mark_dfs(&heap_start[get_heap_index(current_addr, i * 4 + 15, heap_start)], heap_start, max_index);
            }
        }
    }
}

int* mark(int* stack_top, int* first_frame, int* stack_bottom, int* heap_start) {
    int max_index = 0;
    int* frame = first_frame;  // next saved base pointer slot to skip
    for (int* stack_current = stack_top; stack_current < stack_bottom; stack_current++) {
        if (stack_current == frame) {
            // Skip the saved base pointer and the return pointer after it, then
            // follow the saved base pointer to the caller's frame, so that every
            // active call is skipped rather than only the first one.
            frame = (int*)*frame;
            stack_current++;
            continue;
        }
        DEBUG_PRINT("stack_current: %#010x\n", *stack_current);
        mark_dfs(stack_current, heap_start, &max_index);
    }
    return heap_start + max_index;
}
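
The frame-skipping rule described above is easy to get wrong, so here is a small portable sketch of just that part. It is a simplified model, not the assignment's interface: `collect_data_slots` is a hypothetical helper, the stack is a plain `int` array, and a saved base pointer is stored as the array index of the caller's saved base pointer slot (with -1 for the outermost frame) instead of a raw address.

```c
#include <assert.h>

/* Collect the indices of stack words that hold data, skipping each
   (saved base pointer, return pointer) pair.
   Assumptions for this sketch: the stack is scanned from index `top` up to
   (not including) `bottom`, and a saved base pointer is stored as the array
   index of the caller's saved base pointer slot, or -1 for the outermost frame. */
static int collect_data_slots(const int* stack, int top, int first_frame,
                              int bottom, int* out) {
    assert(top <= bottom);
    int count = 0;
    int frame = first_frame;
    for (int i = top; i < bottom; i++) {
        if (i == frame) {
            frame = stack[i];  /* follow the link to the caller's frame */
            i++;               /* also skip the return pointer */
            continue;
        }
        out[count++] = i;      /* a local variable or temporary: a GC root candidate */
    }
    return count;
}
```

With a seven-word stack holding two frames (saved base pointers at indices 1 and 4), the helper reports indices 0, 3 and 6 as data slots. Those are the only words a mark phase should inspect for pair or closure references.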

Forward

To set up the forwarding of values, we traverse the heap starting from the beginning (heap_start). We keep track of two pointers, one to the next space to use for the eventual location of compacted data, and one to the currently-inspected value.

For each value, we check if it's live, and if it is, set its forwarding address to the current compacted data pointer and increase the compacted pointer by the size of the value. If it is not live, we simply continue on to the next value; we can use the tag and other metadata to compute the size of each value to determine which address to inspect next. The traversal stops when we reach the max_address stored above (so we don't accidentally treat the undefined data in spaces 72-80 as real data).

Then we traverse all of the stack and heap values again to update any internal pointers to use the new addresses.

In this case, the closure is scheduled to move from its current location of 64 to a new location of 16. So its forwarding pointer is set, and both references to it on the stack are also updated. The first tuple is already in its final position (starting at 0), so while its forwarding pointer is set, references to it do not change.

// Size in words of the heap value whose header starts at q.
int value_size(int* q) {
    if ((*q & 0x00000007) == 5) {
        // closure: 5 header words plus the captured variables, padded to an even
        // count (word 3 is read as the captured-variable count, as in mark_dfs)
        int frees = *(q + 3);
        int needed_space = 5 + frees;
        return needed_space + needed_space % 2;
    }
    return 4;  // pair
}

// If *slot holds a pair or closure reference, rewrite it to the forwarding
// address recorded in the GC word of the value it points to.
void forward_ref(int* slot, int* heap_start) {
    int tag = *slot & 0x00000007;
    if (tag != 1 && tag != 5) return;  // not a heap reference
    int index = get_heap_index(slot, -tag, heap_start);
    int gc_word = heap_start[index + 1];
    if (gc_word != 0x0 && gc_word != 0x1) {
        DEBUG_PRINT("gc_word: %p\n", (int*)gc_word);
        // The GC word is (new address | mark bit); restore the reference's own tag.
        *slot = (gc_word & ~0x00000007) | tag;
    }
}

void forward(int* stack_top, int* first_frame, int* stack_bottom, int* heap_start, int* max_address) {
    int* p = heap_start;  // current compacted data pointer
    int* q = heap_start;  // current inspected value pointer

    // Pass 1: store each live value's forwarding address in its GC word.
    while (q <= max_address) {
        int size = value_size(q);
        DEBUG_PRINT("tag: %d\n", *q);
        DEBUG_PRINT("gc_word: %d\n", *(q + 1));
        if (*(q + 1)) {
            *(q + 1) |= (int)p;
            p += size;
        }
        q += size;
    }

    // Pass 2: update the references stored inside live heap values. Each field
    // is visited exactly once, so shared or cyclic values are not rewritten twice.
    for (q = heap_start; q <= max_address; q += value_size(q)) {
        if (!*(q + 1)) continue;
        int is_pair = (*q & 0x00000007) == 1;
        int first = is_pair ? 2 : 5;         // index of the first field
        int count = is_pair ? 2 : *(q + 3);  // number of fields
        for (int i = 0; i < count; i++) {
            forward_ref(q + first + i, heap_start);
        }
    }

    // Pass 3: update the references on the stack, skipping every
    // (saved base pointer, return pointer) pair.
    int* frame = first_frame;
    for (int* stack_current = stack_top; stack_current < stack_bottom; stack_current++) {
        if (stack_current == frame) {
            frame = (int*)*frame;
            stack_current++;
            continue;
        }
        forward_ref(stack_current, heap_start);
    }
}
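
To make the forwarding arithmetic concrete, the sketch below computes forwarding positions for a simplified heap described only by value sizes and liveness flags. `closure_words` and `plan_forwarding` are hypothetical names, indices are in 4-byte words rather than byte addresses, and the sizes of the dead values in the usage example are made up so that the numbers line up with the example discussed above.

```c
#include <assert.h>

/* Padded size in words of a closure with `frees` captured variables:
   5 header words plus the variables, rounded up to an even count. */
static int closure_words(int frees) {
    assert(frees >= 0);
    int needed = 5 + frees;
    return needed + needed % 2;
}

/* Given per-value sizes (in words) and liveness flags, in heap order, write each
   live value's forwarding index into `forward` (-1 for dead values) and return
   the first free index after compaction. */
static int plan_forwarding(const int* sizes, const int* live, int n, int* forward) {
    int next = 0;  /* next free word in the compacted heap */
    for (int i = 0; i < n; i++) {
        if (live[i]) {
            forward[i] = next;
            next += sizes[i];
        } else {
            forward[i] = -1;
        }
    }
    return next;
}
```

For sizes `{4, 4, 8, closure_words(1)}` with only the first and last values live, the closure that starts at word 16 (byte address 64) is forwarded to word 4 (byte address 16), and the first free word is 10 (byte address 40).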

Compact

Finally, we traverse the heap, starting from the beginning, and copy the values into their forwarding positions. Since all the internal pointers and stack pointers have been updated already, once the values are copied, the heap becomes consistent again. We track the last compacted address so that we can return the first free address (40 in this case), which is used as the new start of allocation. While doing so, we also zero out all of the GC words, so that the next time we mark the heap we have a fresh start.

I also highly recommend that you walk the rest of the heap and set the words to some special value. The given tests suggest overwriting each word with the value 0x0cab005e – the “caboose” of the heap. This will make it much easier when debugging to tell where the heap ends, and also stop a runaway algorithm from interpreting leftover heap data as live data accidentally.

int* compact(int* heap_start, int* max_address, int* heap_end) {
    int next_free = 0;  // index of the first word after the last compacted value
    for (int i = 0; (heap_start + i) <= max_address; ) {
        int tag = *(heap_start + i);
        int* gc_tag = heap_start + i + 1;
        int size = 4;  // pair
        if ((tag & 0x00000007) == 5) {
            // closure: 5 header words plus the captured variables, padded to an
            // even count (word 3 is read as the captured-variable count)
            int frees = *(heap_start + i + 3);
            int needed_space = 5 + frees;
            size = needed_space + needed_space % 2;
        }
        if (*gc_tag != 0x0) {
            // The GC word holds (forwarding address | mark bit); strip the low bits.
            int start_index = ((*gc_tag & ~0x00000007) - (int)heap_start) / 4;
            *gc_tag = 0x0;  // reset so the next mark phase starts fresh
            DEBUG_PRINT("start_index: %d, i: %d\n", start_index, i);
            // start_index <= i, so copying upward never overwrites unread words
            for (int j = 0; j < size; j++) {
                DEBUG_PRINT("value: %#010x\n", *(heap_start + i + j));
                *(heap_start + start_index + j) = *(heap_start + i + j);
            }
            next_free = start_index + size;
        }
        i += size;
    }
    // Overwrite the rest of the heap with the "caboose" value.
    for (int i = next_free; (heap_start + i) < heap_end; i++) {
        *(heap_start + i) = 0x0cab005e;
    }
    return heap_start + next_free;  // first free address, the new start of allocation
}
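
The two mechanical steps of compaction, sliding a value down and filling the freed tail, can be sketched in isolation. This is an illustration under simplifying assumptions rather than the assignment's API: `slide_down` and `fill_caboose` are hypothetical helpers that work on word indices into a plain `int` array, and `memmove` is swapped in for the word-by-word loop because the source and destination ranges may overlap.

```c
#include <assert.h>
#include <string.h>

#define CABOOSE 0x0cab005e

/* Slide `size` words from index `from` down to index `to` (to <= from).
   memmove is used because the two ranges may overlap. */
static void slide_down(int* heap, int to, int from, int size) {
    assert(to <= from);
    memmove(heap + to, heap + from, (size_t)size * sizeof(int));
}

/* Overwrite every word from first_free up to (not including) heap_words. */
static void fill_caboose(int* heap, int first_free, int heap_words) {
    for (int i = first_free; i < heap_words; i++) {
        heap[i] = CABOOSE;
    }
}
```

For an eight-word heap with a dead pair in words 0-3 and a live pair in words 4-7, `slide_down(heap, 0, 4, 4)` followed by `fill_caboose(heap, 4, 8)` leaves the live pair at the start and the caboose value in the last four words, which is the shape of the dump in the test output below.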

Test

➜  starter-garbage git:(master) ✗ ./gctest
heap: 0x7a615160, max address: 0x7a615190
0/0x7a615160: 0x1 (1)
1/0x7a615164: 0x0 (0)
2/0x7a615168: 0x7a615171 (2053198193)
3/0x7a61516c: 0x8 (8)
4/0x7a615170: 0x1 (1)
5/0x7a615174: 0x0 (0)
6/0x7a615178: 0xa (10)
7/0x7a61517c: 0xc (12)
8/0x7a615180: 0xcab005e (212533342)
9/0x7a615184: 0xcab005e (212533342)
10/0x7a615188: 0xcab005e (212533342)
11/0x7a61518c: 0xcab005e (212533342)
12/0x7a615190: 0xcab005e (212533342)
13/0x7a615194: 0xcab005e (212533342)
14/0x7a615198: 0xcab005e (212533342)
15/0x7a61519c: 0xcab005e (212533342)
. OK (1 test)

References

starter-garbage

garbageSnake
