MIT 6.824: Spring 2015 Lab 1 Practice Notes

The source code is on my GitHub: https://github.com/YaoZengzeng/MIT-6.824

Part I: Word count

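A MapReduce job splits an input file into M pieces and hands them to M Map tasks. Each Map task produces R files of intermediate key/value pairs. R Reduce tasks then run the Reduce operation, where the i-th Reduce task processes the i-th output file of every Map task. This yields R result files, which a Merge step combines into a single output file.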
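For the word count job of Part I, the Map and Reduce functions could look roughly like the sketch below. It is only an illustration under the function signatures that RunSingle expects, not the code from the repository; the helper countWord and the rule that a word is a maximal run of letters are assumptions made for this example.

```go
package main

import (
	"container/list"
	"fmt"
	"strconv"
	"strings"
	"unicode"
)

// KeyValue mirrors the intermediate pair type used by the lab.
type KeyValue struct {
	Key   string
	Value string
}

// Map splits the input into words and emits one KeyValue{word, "1"} per occurrence.
func Map(value string) *list.List {
	res := list.New()
	// Assumption: a word is a maximal run of letters.
	words := strings.FieldsFunc(value, func(r rune) bool { return !unicode.IsLetter(r) })
	for _, w := range words {
		res.PushBack(KeyValue{Key: w, Value: "1"})
	}
	return res
}

// Reduce sums the counts collected for one word and returns the total as a string.
func Reduce(key string, values *list.List) string {
	total := 0
	for e := values.Front(); e != nil; e = e.Next() {
		n, err := strconv.Atoi(e.Value.(string))
		if err != nil {
			continue // skip malformed counts
		}
		total += n
	}
	return strconv.Itoa(total)
}

// countWord (hypothetical helper) runs Map over text and feeds the values
// emitted for word into Reduce, imitating what the framework does for one key.
func countWord(text, word string) string {
	values := list.New()
	for e := Map(text).Front(); e != nil; e = e.Next() {
		if kv := e.Value.(KeyValue); kv.Key == word {
			values.PushBack(kv.Value)
		}
	}
	return Reduce(word, values)
}

func main() {
	fmt.Println(countWord("the cat and the hat", "the")) // prints 2
}
```

In the real framework the grouping done here by countWord is performed by DoMap and DoReduce, described below.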

1. Execution flow of mapreduce.RunSingle:

(1) src/mapreduce/mapreduce.go

func RunSingle(nMap int, nReduce int, file string, Map func(string) *list.List, Reduce func(string, *list.List) string)

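This function first calls mr := InitMapReduce(nMap, nReduce, file, "") to initialize a MapReduce struct, then calls mr.Split(mr.file) to split the input file,

and finally runs two for loops: nMap calls to DoMap(i, mr.file, mr.nReduce, Map) followed by nReduce calls to DoReduce(i, mr.file, mr.nMap, Reduce).

The MapReduce struct is defined as follows: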

type MapReduce struct {
    nMap            int    // Number of Map jobs
    nReduce         int    // Number of Reduce jobs
    file            string // Name of input file
    MasterAddress   string
    registerChannel chan string
    DoneChannel     chan bool
    alive           bool
    l               net.Listener
    stats           *list.List

    // Map of registered workers that you need to keep up to date
    Worker          map[string]*WorkerInfo

    // add any additional state here
}

(2) src/mapreduce/mapreduce.go

func InitMapReduce(nmap int, nreduce int, file string, master string) *MapReduce

This function only initializes a MapReduce struct, setting mr.alive = true, mr.registerChannel = make(chan string), and mr.DoneChannel = make(chan bool).

// Split bytes of input file into nMap splits, but split only on white space

(3) src/mapreduce/mapreduce.go

func (mr *MapReduce) Split(fileName string)

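This function splits fileName into nMap chunks. It first computes the chunk size nchunk, then reads the input file and writes its content to an output file; each time the amount of content read exceeds a multiple of nchunk, it creates a new output file and continues writing there. The input file is thus split across mr.nMap output files, named "mrtmp." + fileName + "-" + strconv.Itoa(MapJob), where MapJob is the chunk number.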
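The chunking rule can be tried out in memory. The sketch below is a simplified stand-in for Split, not the lab code: the function name splitChunks is hypothetical, it returns the chunks as strings instead of writing mrtmp files, and it assumes the input is consumed line by line and that nchunk is the input size divided by nMap, plus one.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// splitChunks (hypothetical) cuts input into at most nMap chunks, breaking
// only between lines, using the "exceeds a multiple of nchunk" rule.
func splitChunks(input string, nMap int) []string {
	nchunk := len(input)/nMap + 1 // assumed target chunk size in bytes
	chunks := []string{""}
	written := 0
	scanner := bufio.NewScanner(strings.NewReader(input))
	for scanner.Scan() {
		// Start a new chunk once the bytes written so far exceed
		// nchunk times the number of chunks opened so far.
		if written > nchunk*len(chunks) {
			chunks = append(chunks, "")
		}
		line := scanner.Text() + "\n"
		chunks[len(chunks)-1] += line
		written += len(line)
	}
	return chunks
}

func main() {
	for i, c := range splitChunks("a\nb\nc\nd\ne\nf\n", 3) {
		fmt.Printf("chunk %d: %q\n", i, c)
	}
}
```

Because a chunk is only closed after the threshold has been crossed, chunks can be slightly larger than nchunk, and a small input may yield fewer than nMap chunks.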

// Read split for job, call Map for that split, and create nreduce partitions

(4) src/mapreduce/mapreduce.go

func DoMap(JobNumber int, fileName string, nreduce int, Map func(string) *list.List)

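This function first obtains the input file name with name := MapName(fileName, JobNumber), reads the file's content into the []byte slice b, and then calls res := Map(string(b)).

It then stores the intermediate key/value pairs in nreduce temporary files, named MapName(fileName, MapJob) + "-" + strconv.Itoa(ReduceJob).

Which file an intermediate key/value pair goes into is determined by the hash of its key: the pair is written to the file of the reduce job whose number equals the hash of the key modulo nreduce. The benefit is that pairs with the same key, whichever Map task produced them, always land in temporary output files with the same index and are therefore picked up by the same Reduce task. Each Reduce task's result for a given word is thus already final, so the results of the R Reduce tasks never need to be combined per key.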
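The partitioning rule can be sketched as follows. This is an illustration only: the notes do not show which hash function the lab uses, so the 32-bit FNV-1a hash and the function name partition are assumptions.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// partition (hypothetical name) maps a key to a reduce job number in [0, nreduce).
// Assumption: the key is hashed with 32-bit FNV-1a.
func partition(key string, nreduce int) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32() % uint32(nreduce))
}

func main() {
	// The same key always maps to the same reduce job, whichever Map task emits it.
	for _, k := range []string{"apple", "banana", "apple"} {
		fmt.Printf("%s -> reduce job %d\n", k, partition(k, 3))
	}
}
```

Any deterministic hash would do; what matters is that every Map task uses the same function and the same nreduce.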

The KeyValue struct is defined as follows:

type KeyValue struct {
    Key   string
    Value string
}

// Read map outputs for partition job, sort them by key, call reduce by each key

(5) src/mapreduce/mapreduce.go

func DoReduce(job int, fileName string, nmap int, Reduce func(string, *list.List) string)

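It first defines kvs := make(map[string]*list.List) to hold the intermediate key/value pairs collected from the nmap files, with all values for the same intermediate key stored in one list.

It then sorts the intermediate keys, calls res := Reduce(k, kvs[k]) for each key, and writes the final result as KeyValue{k, res} to a merge file named "mrtmp." + fileName + "-res-" + strconv.Itoa(ReduceJob).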
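The group, sort, and reduce steps can be sketched in memory as below. This is not the lab's DoReduce: the names reduceAll and countValues are hypothetical, and the pairs are passed in as a slice instead of being decoded from the nmap intermediate files.

```go
package main

import (
	"container/list"
	"fmt"
	"sort"
	"strconv"
)

type KeyValue struct {
	Key   string
	Value string
}

// reduceAll (hypothetical) groups values by key, sorts the keys,
// and calls reduce once per key, returning the results in key order.
func reduceAll(pairs []KeyValue, reduce func(string, *list.List) string) []KeyValue {
	kvs := make(map[string]*list.List)
	for _, kv := range pairs {
		if _, ok := kvs[kv.Key]; !ok {
			kvs[kv.Key] = list.New()
		}
		kvs[kv.Key].PushBack(kv.Value)
	}
	keys := make([]string, 0, len(kvs))
	for k := range kvs {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	out := make([]KeyValue, 0, len(keys))
	for _, k := range keys {
		out = append(out, KeyValue{Key: k, Value: reduce(k, kvs[k])})
	}
	return out
}

// countValues is a sample reduce function: it returns how many values a key has.
func countValues(key string, values *list.List) string {
	return strconv.Itoa(values.Len())
}

func main() {
	pairs := []KeyValue{{"b", "1"}, {"a", "1"}, {"b", "1"}}
	for _, kv := range reduceAll(pairs, countValues) {
		fmt.Println(kv.Key, kv.Value)
	}
}
```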

// Merge the results of the reduce jobs XXX use merge sort

(6) src/mapreduce/mapreduce.go

func (mr *MapReduce) Merge()

It first defines kvs := make(map[string]string), then reads the results from the nReduce files into kvs. Finally it sorts all the keys and writes each key with its final result in sorted order.
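A minimal in-memory sketch of this merge step follows. The name mergeResults is hypothetical, each reduce output is represented as a map instead of a file, and the "key: value" line format is an assumption, since the notes do not state the output format.

```go
package main

import (
	"fmt"
	"sort"
)

// mergeResults (hypothetical) collects the per-reduce results into one map,
// sorts the keys, and returns one output line per key.
func mergeResults(parts []map[string]string) []string {
	kvs := make(map[string]string)
	for _, part := range parts {
		for k, v := range part {
			kvs[k] = v // each key appears in exactly one reduce output
		}
	}
	keys := make([]string, 0, len(kvs))
	for k := range kvs {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	lines := make([]string, 0, len(keys))
	for _, k := range keys {
		lines = append(lines, k+": "+kvs[k]) // assumed output format
	}
	return lines
}

func main() {
	parts := []map[string]string{{"b": "2"}, {"a": "1", "c": "3"}}
	for _, line := range mergeResults(parts) {
		fmt.Println(line)
	}
}
```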

Part II: Distributing MapReduce jobs

Master startup flow

1. src/mapreduce/mapreduce.go

func MakeMapReduce(nmap int, nreduce int, file string, master string) *MapReduce

(1) First calls mr := InitMapReduce(nmap, nreduce, file, master) to initialize the MapReduce struct

(2) Calls mr.StartRegistrationServer()

(3) Calls go mr.Run() and returns mr

2. src/mapreduce/mapreduce.go

func (mr *MapReduce) StartRegistrationServer()

(1) Calls rpcs := rpc.NewServer() and rpcs.Register(mr) to create an RPC server

(2) Calls l, e := net.Listen("unix", mr.MasterAddress) and assigns l to mr.l

// now that we are listening on the master address, can fork off accepting connections to another thread

(3) Starts a goroutine that calls conn, err := mr.l.Accept() to accept connections; for each connection it starts another goroutine,

which calls rpcs.ServeConn(conn) to serve the connection and then conn.Close() to close it
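The three steps above can be sketched as a small self-contained program using net/rpc over a UNIX-domain socket. It is a simplified illustration, not the lab code: Master stands in for the MapReduce struct, the channel is buffered so the handler never blocks, the socket lives in a temporary directory, and roundTrip is a hypothetical helper that plays the role of a registering worker.

```go
package main

import (
	"fmt"
	"net"
	"net/rpc"
	"os"
	"path/filepath"
)

// Master is a hypothetical stand-in for the MapReduce struct.
type Master struct {
	l          net.Listener
	registered chan string
}

type RegisterArgs struct{ Worker string }
type RegisterReply struct{ OK bool }

// Register is the RPC handler a worker calls to announce itself.
func (m *Master) Register(args *RegisterArgs, reply *RegisterReply) error {
	m.registered <- args.Worker
	reply.OK = true
	return nil
}

// startServer listens on a UNIX-domain socket and serves every accepted
// connection in its own goroutine, closing it when the client is done.
func startServer(m *Master, addr string) error {
	rpcs := rpc.NewServer()
	if err := rpcs.Register(m); err != nil {
		return err
	}
	l, err := net.Listen("unix", addr)
	if err != nil {
		return err
	}
	m.l = l
	go func() {
		for {
			conn, err := m.l.Accept()
			if err != nil {
				return // listener was closed
			}
			go func() {
				rpcs.ServeConn(conn)
				conn.Close()
			}()
		}
	}()
	return nil
}

// roundTrip (hypothetical) starts a server, registers one worker, and
// returns the name the master received plus the reply flag.
func roundTrip(name string) (string, bool) {
	dir, err := os.MkdirTemp("", "mr")
	if err != nil {
		return "", false
	}
	defer os.RemoveAll(dir)
	m := &Master{registered: make(chan string, 1)}
	if err := startServer(m, filepath.Join(dir, "master")); err != nil {
		return "", false
	}
	defer m.l.Close()
	c, err := rpc.Dial("unix", filepath.Join(dir, "master"))
	if err != nil {
		return "", false
	}
	defer c.Close()
	var reply RegisterReply
	if err := c.Call("Master.Register", &RegisterArgs{Worker: name}, &reply); err != nil {
		return "", false
	}
	return <-m.registered, reply.OK
}

func main() {
	name, ok := roundTrip("worker0")
	fmt.Println(name, ok)
}
```

Closing the listener makes Accept return an error, which is how the accept loop in this sketch terminates.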

// Run jobs in parallel, assuming a shared file system

3. src/mapreduce/mapreduce.go

func (mr *MapReduce) Run()

This function first calls mr.Split(mr.file) to split the input file into mr.nMap files, then calls mr.stats = mr.RunMaster(), then calls mr.Merge() to merge the mr.nReduce output files, and finally calls mr.CleanupRegistration(), which sends a Shutdown RPC to the master's own registration server.

4. src/mapreduce/master.go

func (mr *MapReduce) RunMaster() *list.List

5. src/mapreduce/mapreduce.go

func (mr *MapReduce) CleanupRegistration()

It first creates args := &ShutdownArgs{} and var reply ShutdownReply, then calls ok := call(mr.MasterAddress, "MapReduce.Shutdown", args, &reply)

// call() returns true if the server responded, and false if call() was not able to contact the server. In particular, reply's

// contents are valid if and only if call() returned true

// you should assume that call() will time out and return an error after a while if it doesn't get a reply from the server

// please use call() to send all RPCs, in master.go, mapreduce.go, and worker.go. don't change this function

6. src/mapreduce/common.go

func call(srv string, rpcname string, args interface{}, reply interface{}) bool

It first calls c, errx := rpc.Dial("unix", srv) to establish a connection, then calls err := c.Call(rpcname, args, reply) to send the RPC.
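A helper with this shape could be written as below. It is a sketch following the two calls named above, not a copy of common.go; closing the connection with defer and the error handling details are assumptions.

```go
package main

import (
	"fmt"
	"net/rpc"
)

// call dials the server's UNIX-domain socket, sends one RPC, and reports
// whether a reply was received. reply is only meaningful when it returns true.
func call(srv string, rpcname string, args interface{}, reply interface{}) bool {
	c, err := rpc.Dial("unix", srv)
	if err != nil {
		return false // could not contact the server
	}
	defer c.Close()

	if err := c.Call(rpcname, args, reply); err != nil {
		return false // the RPC itself failed
	}
	return true
}

func main() {
	// No server listens on this made-up path, so the call reports failure.
	var reply struct{}
	ok := call("/nonexistent/824-sketch.sock", "MapReduce.Register", &struct{}{}, &reply)
	fmt.Println(ok) // prints false
}
```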

7. src/mapreduce/mapreduce.go

func (mr *MapReduce) Register(args *RegisterArgs, res *RegisterReply) error

It sends mr.registerChannel <- args.Worker, sets res.Ok = true, and returns nil

Worker startup flow

The Worker struct is defined as follows:

// Worker is a server waiting for DoJob or Shutdown RPCs

type Worker struct {
    name   string
    Reduce func(string, *list.List) string
    Map    func(string) *list.List
    nRPC   int
    nJobs  int
    l      net.Listener
}

// Set up a connection with the master, register with the master, and wait for jobs from the master

1. src/mapreduce/worker.go

func RunWorker(MasterAddress string, me string, MapFunc func(string) *list.List, ReduceFunc func(string, *list.List) string, nRPC int)

Note: when nRPC is -1 the worker never fails; otherwise it fails after accepting nRPC jobs

(1) First initializes a Worker struct wk

(2) Calls rpcs := rpc.NewServer() and rpcs.Register(wk) to create an RPC server

(3) Calls l, e := net.Listen("unix", me) and sets wk.l = l

(4) Calls Register(MasterAddress, me)

(5) While wk.nRPC is not 0, keeps looping on conn, err := wk.l.Accept(); when err is nil, it executes wk.nRPC -= 1, go rpcs.ServeConn(conn), and wk.nJobs += 1
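Why -1 means "never fail" follows from the loop condition in step (5): the counter is decremented on every accepted connection and the loop only stops at exactly 0. The toy model below, with the hypothetical function acceptLoop, replaces the listener with a plain count of incoming connections to show this.

```go
package main

import "fmt"

// acceptLoop (hypothetical) models the worker's accept loop. nRPC is the
// worker's budget and incoming is how many connections arrive in total.
// It returns how many connections were served before the loop stopped.
func acceptLoop(nRPC int, incoming int) int {
	served := 0
	for nRPC != 0 && incoming > 0 {
		incoming--
		nRPC-- // starting from -1 this goes to -2, -3, ... and never hits 0
		served++
	}
	return served
}

func main() {
	fmt.Println(acceptLoop(10, 25)) // prints 10: the worker stops after 10 RPCs
	fmt.Println(acceptLoop(-1, 25)) // prints 25: the worker serves everything
}
```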

// Tell the master we exist and ready to work

2. src/mapreduce/worker.go

func Register(master string, me string)

It creates args := &RegisterArgs{}, sets args.Worker = me, declares var reply RegisterReply, and finally calls ok := call(master, "MapReduce.Register", args, &reply)

Part III: Handling worker failures

tips:

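(1) The master detects a failed worker through RPC timeouts.

(2) An RPC failure does not necessarily mean the worker has crashed; the worker may just be unreachable while still computing. Two workers may therefore receive the same job and both compute it. Because jobs are idempotent, it does not matter if a job is computed twice: both runs produce the same result. Moreover, our tests never make a worker fail in the middle of a job, so we need not worry about several workers writing the same output file.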
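One way a master could act on these tips is to hand a job to an idle worker and, if the RPC fails, give the same job to another worker. The sketch below illustrates that retry idea only; it is not the RunMaster implementation from the repository. The function runJobs and the doJob callback (a stand-in for sending a DoJob RPC through call()) are hypothetical, and the set of workers is fixed up front instead of arriving over registerChannel.

```go
package main

import (
	"fmt"
	"sync"
)

// runJobs (hypothetical) assigns each job to an idle worker and retries on a
// different worker whenever doJob reports failure. It returns, per job, the
// worker that completed it. It blocks forever if every worker fails.
func runJobs(nJobs int, workers []string, doJob func(worker string, job int) bool) []string {
	idle := make(chan string, len(workers))
	for _, w := range workers {
		idle <- w
	}
	doneBy := make([]string, nJobs)
	var wg sync.WaitGroup
	for job := 0; job < nJobs; job++ {
		wg.Add(1)
		go func(job int) {
			defer wg.Done()
			for {
				w := <-idle
				if doJob(w, job) {
					doneBy[job] = w
					idle <- w // healthy worker becomes available again
					return
				}
				// RPC failed: drop this worker and retry the job elsewhere.
				// Rerunning is safe because jobs are idempotent.
			}
		}(job)
	}
	wg.Wait()
	return doneBy
}

func main() {
	// w0 always fails, so every job ends up on w1.
	flaky := func(worker string, job int) bool { return worker != "w0" }
	fmt.Println(runJobs(5, []string{"w0", "w1"}, flaky)) // prints [w1 w1 w1 w1 w1]
}
```

In the lab, new workers keep registering while the job runs, so a real master would also feed names received from registerChannel into the idle pool.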

------------------------------------------------------------------------------------- Test framework analysis -----------------------------------------------------------------------------------------

Test Basic:

1. src/mapreduce/test_test.go

func TestBasic(t *testing.T)

(1) Calls mr := setup()

(2) In a for loop, calls go RunWorker(mr.MasterAddress, port("worker"+strconv.Itoa(i)), MapFunc, ReduceFunc, -1)

(3) Waits on <-mr.DoneChannel for the MapReduce job to finish

(4) Finally calls check(t, mr.file), checkWorker(t, mr.stats), and cleanup(mr) in turn to verify the result and clean up

2. src/mapreduce/test_test.go

func setup() *MapReduce

It calls file := makeInput() to create the input file, then master := port("master") to build a UNIX-domain socket name of the form /var/tmp/824-$(uid)/mr$(pid)-master,

and finally calls mr := MakeMapReduce(nMap, nReduce, file, master)

// Checks input file against output file: each input number should show up in the output file in string sorted order

3. src/mapreduce/test_test.go

func check(t *testing.T, file string)

This function opens file, reads all of its lines into var lines []string, sorts them with sort.Strings(lines), and then reads the output file line by line, comparing the two.

// Workers report back how many RPCs they have processed in the Shutdown reply.

// Check that they processed at least 1 RPC.

4. src/mapreduce/test_test.go

func checkWorker(t *testing.T, l *list.List)

It iterates over l and reports an error if any e.Value == 0

5. src/mapreduce/test_test.go

func cleanup(mr *MapReduce)

It calls mr.CleanupFiles() to delete all temporary files, then RemoveFile(mr.file) to delete the input file

Test One Failure:

1. src/mapreduce/test_test.go

func TestOneFailure(t *testing.T)

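It first calls mr := setup() to set up the MapReduce system, then starts two workers.

One worker is started with go RunWorker(mr.MasterAddress, port("worker"+strconv.Itoa(0)), MapFunc, ReduceFunc, 10)

The other is started with go RunWorker(mr.MasterAddress, port("worker"+strconv.Itoa(1)), MapFunc, ReduceFunc, -1). The results are then checked and cleaned up; the flow is essentially the same as in Test Basic.

Test Many Failures:

Again, it first calls mr := setup() to set up the MapReduce system. Until the job completes, it keeps looping, starting a new worker every second; each worker fails after completing 10 jobs.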
