[swarthmore cs75] Compiler 6 – Garbage Snake
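Course Review
This compilers course was offered at Swarthmore College in 2016 and has 10 assignments in total. This post records the related lecture notes and the 9th assignment.
- Side effects of assignment: cyclic tuples
The following code shows how Python 3 handles cyclic lists (print and ==):
>>> x = [1, 2]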
>>> x[1] = x
>>> print(x)
[1, [...]]
>>> y = [1, 2]
>>> y[1] = y
>>> x is y
False
>>> x is x
True
>>> y is y
True
>>> x == y
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RecursionError: maximum recursion depth exceeded in comparison
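- Memory Management: maintain a free list data structure, or store metadata in the heap, to mark which space is free.
- Automatic Memory Management
Maintain reference counts
Maintain mark bits
- Mark/compact garbage collection algorithm
[Figure: the stack and heap while f((7, 8)) executes; memory is full at this point, so a garbage collection is needed.]
[Figure: the stack and heap while f((7, 8))(12) executes.]
[Figure: implementation details of the collection algorithm.]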
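- Generational GC
In the eden region, semispace swapping copies live data into the lower half of the heap region; once a value has been moved more than a set number of times, it is copied to the G1 region, and so on for later generations.
In the tenured region, mark/compact is used for garbage collection; data in this region is not destroyed or moved frequently.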
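The reference-counting option listed under automatic memory management interacts badly with the cyclic structures from the first note. The following is a minimal sketch of why, not the course runtime: the `Pair` struct, `pair_new`, `pair_release`, `pair_set_snd`, and `leaked_after_drop` are all hypothetical names invented for this illustration.

```c
#include <assert.h>
#include <stdlib.h>

// Hypothetical reference-counted pair; not part of the course runtime.
typedef struct Pair {
    int refcount;
    struct Pair* snd;  // second element, NULL when it holds no reference
} Pair;

int live_pairs = 0;    // number of pairs currently allocated

Pair* pair_new(void) {
    Pair* p = malloc(sizeof(Pair));
    assert(p != NULL);
    p->refcount = 1;   // the creating variable holds one reference
    p->snd = NULL;
    live_pairs++;
    return p;
}

// Drop one reference; free the pair when nothing refers to it any more.
void pair_release(Pair* p) {
    if (p == NULL) return;
    if (--p->refcount == 0) {
        pair_release(p->snd);  // drop the reference this pair held
        free(p);
        live_pairs--;
    }
}

// Store a reference to target in p's second slot.
void pair_set_snd(Pair* p, Pair* target) {
    if (target != NULL) target->refcount++;
    pair_release(p->snd);
    p->snd = target;
}

// Returns how many pairs are still allocated after the only variable is dropped.
int leaked_after_drop(int make_cycle) {
    live_pairs = 0;
    Pair* x = pair_new();
    if (make_cycle) pair_set_snd(x, x);  // like x[1] = x in the Python session
    pair_release(x);                     // the variable x goes out of scope
    return live_pairs;
}
```

Calling `leaked_after_drop(0)` leaves nothing allocated, while `leaked_after_drop(1)` leaves one pair that only refers to itself, so its count never reaches zero. A tracing collector such as mark/compact does not have this problem because it starts from the stack rather than from counts.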
Programming Assignment
Mark
In the first phase, we take the initial heap and stack, and set all the GC words of live data to 1. The live data is all the data reachable from references on the stack, excluding return pointers and base pointers (which don't represent data on the heap). We can do this by looping over the words on the stack, and doing a depth-first traversal of the heap from any reference values (pairs or closures) that we find.
The stack_top and first_frame arguments to mark point to the top of the stack, which contains a previous base pointer value, followed by a return pointer. If f had local variables, then they would cause stack_top to point higher. stack_bottom points at the highest (in terms of number) word on the stack. Thus, we want to loop from stack_top to stack_bottom, traversing the heap from each reference we find. We also want to skip the words corresponding to first_frame and the word after it, and each pair of base pointer and return pointer from there down (if there are multiple function calls active).
Along the way, we also keep track of the highest start address of a live value to use later, which in this case is 64, the address of the closure.
// Convert the tagged reference stored at current_addr, adjusted by offset bytes,
// into a word index into the heap. Assumes 4-byte words and pointers.
int get_heap_index(int* current_addr, int offset, int* heap_start) {
    return (*current_addr + offset - (int)heap_start) / 4;
}

// Depth-first traversal from the word at current_addr: set the GC word of every
// reachable pair or closure to 1 and record the highest live start index.
void mark_dfs(int* current_addr, int* heap_start, int* max_index) {
    // pair: [ tag ][ GC word ][ value ][ value ]
    if ((*current_addr & 0x00000007) == 1) {
        DEBUG_PRINT("pair: %#010x\n", *current_addr - 1);
        int index = get_heap_index(current_addr, -1, heap_start);
        if (!heap_start[index + 1]) {  // unmarked; this check also ends cycles
            heap_start[index + 1] = 0x1;
            if (index > *max_index) *max_index = index;
            mark_dfs(&heap_start[index + 2], heap_start, max_index);
            mark_dfs(&heap_start[index + 3], heap_start, max_index);
        }
    }
    // closure: [ tag ][ GC word ][ varcount = N ][ arity ][ code ptr ][[ N vars' data ]][ maybe padding ]
    else if ((*current_addr & 0x00000007) == 5) {
        DEBUG_PRINT("closure: %#010x\n", *current_addr - 5);
        int index = get_heap_index(current_addr, -5, heap_start);
        if (!heap_start[index + 1]) {
            heap_start[index + 1] = 0x1;
            if (index > *max_index) *max_index = index;
            int varcount = heap_start[index + 2];  // N captured variables, per the layout above
            for (int i = 0; i < varcount; i++) {
                mark_dfs(&heap_start[index + 5 + i], heap_start, max_index);
            }
        }
    }
}

int* mark(int* stack_top, int* first_frame, int* stack_bottom, int* heap_start) {
    int max_index = 0;
    int* frame = first_frame;  // address of the next saved base pointer to skip
    for (int* stack_current = stack_top; stack_current < stack_bottom; stack_current++) {
        if (stack_current == frame) {
            // A saved base pointer followed by a return pointer: neither is heap data.
            // Assumes the saved base pointer holds the address of the caller's frame slot.
            frame = (int*)*frame;
            stack_current++;
            continue;
        }
        DEBUG_PRINT("stack_current: %#010x\n", *stack_current);
        mark_dfs(stack_current, heap_start, &max_index);
    }
    return heap_start + max_index;
}
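To watch the mark phase in isolation, here is a small sketch that runs on any machine. It makes one loud assumption that differs from the assignment: references are stored as (byte offset from the heap start) + tag instead of absolute addresses, so the heap can be a plain array. The names `mark_value`, `mark_roots`, `mark_demo`, and `mark_demo_heap` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TAG_MASK    0x7
#define PAIR_TAG    1
#define CLOSURE_TAG 5

// Mark everything reachable from the tagged value v.
// Assumption of this sketch: a reference is (byte offset from heap start) + tag.
static void mark_value(int32_t v, int32_t* heap, int* max_index) {
    int tag = v & TAG_MASK;
    if (tag != PAIR_TAG && tag != CLOSURE_TAG) return;  // number or boolean
    int index = (v - tag) / 4;        // word index of the value's tag word
    if (heap[index + 1]) return;      // already marked; also stops on cycles
    heap[index + 1] = 1;
    if (index > *max_index) *max_index = index;
    if (tag == PAIR_TAG) {
        mark_value(heap[index + 2], heap, max_index);
        mark_value(heap[index + 3], heap, max_index);
    } else {
        int varcount = heap[index + 2];
        for (int i = 0; i < varcount; i++) {
            mark_value(heap[index + 5 + i], heap, max_index);
        }
    }
}

// Mark from an array of root words; returns the highest live start index.
int mark_roots(const int32_t* roots, int nroots, int32_t* heap) {
    assert(heap != NULL);
    int max_index = 0;
    for (int i = 0; i < nroots; i++) {
        mark_value(roots[i], heap, &max_index);
    }
    return max_index;
}

// Three pairs: A refers to B, B refers back to A, C is unreachable.
int32_t mark_demo_heap[12] = {
    1, 0, 16 + PAIR_TAG, 8,   // word 0: pair A = (B, 8)
    1, 0, 10, 0 + PAIR_TAG,   // word 4: pair B = (10, A), closing a cycle
    1, 0, 2, 4                // word 8: pair C, not referenced by anything
};

int mark_demo(void) {
    int32_t roots[1] = { 0 + PAIR_TAG };  // one stack word referring to A
    return mark_roots(roots, 1, mark_demo_heap);
}
```

After `mark_demo()` the GC words of A and B (words 1 and 5) are 1, the GC word of C (word 9) is still 0, and the returned index is 4, the start of B. The already-marked check is what keeps the A/B cycle from recursing forever.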
Forward
To set up the forwarding of values, we traverse the heap starting from the beginning (heap_start). We keep track of two pointers, one to the next space to use for the eventual location of compacted data, and one to the currently-inspected value.
For each value, we check if it's live, and if it is, set its forwarding address to the current compacted data pointer and increase the compacted pointer by the size of the value. If it is not live, we simply continue onto the next value – we can use the tag and other metadata to compute the size of each value to determine which address to inspect next. The traversal stops when we reach the max_address stored above (so we don't accidentally treat the undefined data in spaces 72-80 as real data).
Then we traverse all of the stack and heap values again to update any internal pointers to use the new addresses.
In this case, the closure is scheduled to move from its current location of 64 to a new location of 16. So its forwarding pointer is set, and both references to it on the stack are also updated. The first tuple is already in its final position (starting at 0), so while its forwarding pointer is set, references to it do not change.
// Rewrite the word at current_addr if it refers to a value that has been given a
// forwarding address. A forwarded GC word holds (new address | 1).
void forward_ref(int* current_addr, int* heap_start) {
    int tag = *current_addr & 0x00000007;
    if (tag != 1 && tag != 5) return;  // not a pair or closure reference
    int index = get_heap_index(current_addr, -tag, heap_start);
    int gc_word = heap_start[index + 1];
    if (gc_word != 0x0 && gc_word != 0x1) {
        DEBUG_PRINT("gc_word: %p\n", (int *)gc_word);
        *current_addr = (gc_word & ~0x1) + tag;  // keep the reference's own tag
    }
}

void forward(int* stack_top, int* first_frame, int* stack_bottom, int* heap_start, int* max_address) {
    int* p = heap_start; // current compacted data pointer
    int* q = heap_start; // current inspected value pointer
    // Pass 1: give every live value a forwarding address.
    while (q <= max_address) {
        int tag = *q & 0x00000007;
        int size;
        if (tag == 1) {
            size = 4;                         // pair
        } else if (tag == 5) {
            int needed_space = 5 + *(q + 2);  // five header words plus N captured variables
            size = needed_space + needed_space % 2;
        } else {
            break;                            // not a heap value: stop instead of looping forever
        }
        if (*(q + 1)) {
            *(q + 1) |= (int)p;
            p += size;
        }
        q += size;
    }
    // Pass 2: update references inside live heap values, which are still at their
    // old locations. Each field is visited exactly once.
    for (q = heap_start; q <= max_address; ) {
        int tag = *q & 0x00000007;
        int live = *(q + 1) != 0x0;
        if (tag == 1) {
            if (live) {
                forward_ref(q + 2, heap_start);
                forward_ref(q + 3, heap_start);
            }
            q += 4;
        } else if (tag == 5) {
            int frees = *(q + 2);
            for (int i = 0; live && i < frees; i++) {
                forward_ref(q + 5 + i, heap_start);
            }
            int needed_space = 5 + frees;
            q += needed_space + needed_space % 2;
        } else {
            break;
        }
    }
    // Pass 3: update references on the stack, skipping each saved base pointer and
    // return pointer (assumes the saved base pointer is the caller's frame address).
    int* frame = first_frame;
    for (int* stack_current = stack_top; stack_current < stack_bottom; stack_current++) {
        if (stack_current == frame) {
            frame = (int*)*frame;
            stack_current++;
            continue;
        }
        forward_ref(stack_current, heap_start);
    }
}
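The address assignment can be checked by hand against the example in the text, where the tuple stays at 0 and the closure moves from 64 to 16. The sketch below is a simplified model, not the assignment's interface: it assumes references are (byte offset from the heap start) + tag so it runs on a plain array, and `value_size`, `assign_forwarding`, `forwarded`, `forward_demo`, and `fwd_heap` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Size in words of the heap value whose tag word is heap[i]; 0 if unrecognised.
int value_size(const int32_t* heap, int i) {
    int tag = heap[i] & 0x7;
    if (tag == 1) return 4;                   // pair
    if (tag == 5) {                           // closure, padded to an even size
        int needed = 5 + heap[i + 2];
        return needed + needed % 2;
    }
    return 0;
}

// Store (new byte offset | 1) in the GC word of each live value.
void assign_forwarding(int32_t* heap, int max_index) {
    assert(heap != NULL);
    int next = 0;  // next free word index in the compacted heap
    for (int i = 0; i <= max_index; ) {
        int size = value_size(heap, i);
        if (size == 0) break;
        if (heap[i + 1]) {
            heap[i + 1] = (next * 4) | 1;
            next += size;
        }
        i += size;
    }
}

// Return the value v rewritten to point at its target's forwarding offset.
int32_t forwarded(int32_t v, const int32_t* heap) {
    int tag = v & 0x7;
    if (tag != 1 && tag != 5) return v;       // numbers and booleans are unchanged
    int32_t gc_word = heap[(v - tag) / 4 + 1];
    if (gc_word == 0) return v;               // target is not live
    return (gc_word & ~1) + tag;
}

// Live pair at byte 0, three dead pairs, live closure at byte 64 (word 16).
int32_t fwd_heap[24] = {
    1, 1, 6, 8,
    1, 0, 2, 4,
    1, 0, 2, 4,
    1, 0, 2, 4,
    5, 1, 1, 1, 0, 0 + 1,   // tag, GC word, varcount, arity, code ptr placeholder, captured pair
    0, 0
};

// Returns the forwarded form of a reference to the closure at byte 64.
int forward_demo(void) {
    assign_forwarding(fwd_heap, 16);
    return forwarded(64 + 5, fwd_heap);
}
```

`forward_demo()` returns 21, that is, byte offset 16 with the closure tag 5, matching the move from 64 to 16 described above. A reference to the first pair (`1`) comes back unchanged because that pair is already in its final position.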
Compact
Finally, we traverse the heap, starting from the beginning, and copy the values into their forwarding positions. Since all the internal pointers and stack pointers have been updated already, once the values are copied, the heap becomes consistent again. We track the last compacted address so that we can return the first free address (40 in this case), which is used as the new start of allocation. While doing so, we also zero out all of the GC words, so that the next time we mark the heap we have a fresh start.
I also highly recommend that you walk the rest of the heap and set the words to some special value. The given tests suggest overwriting each word with the value 0x0cab005e – the “caboose” of the heap. This will make it much easier when debugging to tell where the heap ends, and also stop a runaway algorithm from interpreting leftover heap data as live data accidentally.
int* compact(int* heap_start, int* max_address, int* heap_end) {
    int* next_free = heap_start;  // first word after the last compacted value
    for (int* q = heap_start; q <= max_address; ) {
        int tag = *q & 0x00000007;
        int size;
        if (tag == 1) {
            size = 4;                         // pair
        } else if (tag == 5) {
            int needed_space = 5 + *(q + 2);  // five header words plus N captured variables
            size = needed_space + needed_space % 2;
        } else {
            break;                            // not a heap value: stop instead of looping forever
        }
        int gc_word = *(q + 1);
        if (gc_word != 0x0) {
            // The GC word holds (forwarding address | 1); strip the mark bit.
            int* dest = (int*)(gc_word & ~0x1);
            *(q + 1) = 0x0;                   // fresh start for the next mark phase
            DEBUG_PRINT("moving %d words from %p to %p\n", size, (void*)q, (void*)dest);
            // dest <= q, so an ascending copy is safe even when the ranges overlap.
            for (int k = 0; k < size; k++) {
                dest[k] = q[k];
            }
            next_free = dest + size;
        }
        q += size;
    }
    // Overwrite the unused tail of the heap with the caboose marker.
    for (int* w = next_free; w < heap_end; w++) {
        *w = 0x0cab005e;
    }
    return next_free;
}
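The compaction step and the caboose fill can be tried on the same example, where the first free address ends up at byte 40. This is a sketch under the assumption that forwarding words hold (new byte offset | 1) relative to the heap start, so that a plain array can stand in for the heap; `compact_words` and `cmp_heap` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CABOOSE 0x0cab005e

// Slide each forwarded value down to its new place, clear its GC word, fill the
// rest of the heap with the caboose marker, and return the first free word index.
int compact_words(int32_t* heap, int max_index, int heap_words) {
    assert(heap != NULL);
    int next_free = 0;
    for (int i = 0; i <= max_index; ) {
        int tag = heap[i] & 0x7;
        int size;
        if (tag == 1) {
            size = 4;                          // pair
        } else if (tag == 5) {
            int needed = 5 + heap[i + 2];      // closure, padded to an even size
            size = needed + needed % 2;
        } else {
            break;
        }
        int32_t gc_word = heap[i + 1];
        if (gc_word != 0) {
            int dest = (gc_word & ~1) / 4;     // forwarding offset as a word index
            heap[i + 1] = 0;
            for (int k = 0; k < size; k++) {   // dest <= i, so ascending copy is safe
                heap[dest + k] = heap[i + k];
            }
            next_free = dest + size;
        }
        i += size;
    }
    for (int i = next_free; i < heap_words; i++) {
        heap[i] = CABOOSE;
    }
    return next_free;
}

// Heap after forwarding: the pair stays at byte 0, the closure at byte 64
// (word 16) is scheduled to move to byte 16 (word 4).
int32_t cmp_heap[24] = {
    1, 1, 6, 8,
    1, 0, 2, 4,
    1, 0, 2, 4,
    1, 0, 2, 4,
    5, 17, 1, 1, 0, 1,
    0, 0
};
```

`compact_words(cmp_heap, 16, 24)` returns 10, which is byte 40; the closure now occupies words 4 to 9 with a cleared GC word, and every word from 10 onward holds the caboose value.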
Test
➜ starter-garbage git:(master) ✗ ./gctest
heap: 0x7a615160, max address: 0x7a615190
0/0x7a615160: 0x1 (1)
1/0x7a615164: 0x0 (0)
2/0x7a615168: 0x7a615171 (2053198193)
3/0x7a61516c: 0x8 (8)
4/0x7a615170: 0x1 (1)
5/0x7a615174: 0x0 (0)
6/0x7a615178: 0xa (10)
7/0x7a61517c: 0xc (12)
8/0x7a615180: 0xcab005e (212533342)
9/0x7a615184: 0xcab005e (212533342)
10/0x7a615188: 0xcab005e (212533342)
11/0x7a61518c: 0xcab005e (212533342)
12/0x7a615190: 0xcab005e (212533342)
13/0x7a615194: 0xcab005e (212533342)
14/0x7a615198: 0xcab005e (212533342)
15/0x7a61519c: 0xcab005e (212533342)
.
OK (1 test)