MapReduce -- Shortest Path

Example:

Given the distance from each node to its adjacent nodes, compute the shortest path from the start node to every other node.

Data:

A    (B,)    (D,)

B    (C,)    (D,)

C    (E,)

D    (B,)    (C,)    (E,)

E    (A,)    (C,)

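Node A is the start node. The distance from A to B is 10, and the distance from A to D is 5.

From B, the distance to C is 1 and the distance to D is 2. The remaining lines follow the same pattern.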
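The line format above can be read into a small in-memory structure. The following is only a sketch: `AdjacencyLine` is a hypothetical helper that does not appear in the original source, and it assumes the fields are separated by tabs, which is what the job's KeyValueTextInputFormat setting further down expects by default.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of the original source): parses one input line
// such as "A\t(B,10)\t(D,5)" into the node name and its outgoing edges.
class AdjacencyLine {
    final String node;
    final Map<String, Integer> edges = new LinkedHashMap<>();

    AdjacencyLine(String line) {
        String[] fields = line.split("\t");
        node = fields[0];
        for (int i = 1; i < fields.length; i++) {
            String field = fields[i];                 // e.g. "(B,10)"
            int comma = field.indexOf(',');
            String neighbor = field.substring(1, comma);
            int weight = Integer.parseInt(field.substring(comma + 1, field.indexOf(')')));
            edges.put(neighbor, weight);
        }
    }
}
```

For instance, `new AdjacencyLine("A\t(B,10)\t(D,5)")` yields node `A` with edges to `B` (10) and `D` (5).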

Computing the shortest path with MapReduce

Map phase

For example, the input record

A    (B,10)    (D,5)

A    0    (B,10)    (D,5)    # the shortest distance from A to A is 0

B    10    # a path from A to B with distance 10 exists

D    5    # a path from A to D with distance 5 exists

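Starting from the start node, the map step lists the distance to every directly connected node and passes these candidates to reduce, which picks the shortest one.

Always begin at the start node A: first reach B and D, then the neighbors of B and D, and so on. This is a breadth-first search.

Any node that has not yet been reached from A gets the default distance inf, meaning infinity.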
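The map step described above can be sketched in plain Java without any Hadoop dependency. `MapStepSketch` and its `map` method are illustrative names, not part of the original job; the value is assumed to have the form `distance<TAB>(N,w)<TAB>(N,w)...`.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch of the map step (hypothetical class, no Hadoop required).
class MapStepSketch {
    // Returns the emitted (key, value) pairs for one node record.
    static List<String[]> map(String key, String value) {
        List<String[]> out = new ArrayList<>();
        // Always pass the node's own record through so the graph structure survives.
        out.add(new String[] {key, value});
        String[] fields = value.split("\t");
        // A node that has not been reached yet has nothing to propagate.
        if (fields[0].equals("inf")) {
            return out;
        }
        int distance = Integer.parseInt(fields[0]);
        for (int i = 1; i < fields.length; i++) {
            String field = fields[i];                 // e.g. "(B,10)"
            int comma = field.indexOf(',');
            String neighbor = field.substring(1, comma);
            int weight = Integer.parseInt(field.substring(comma + 1, field.indexOf(')')));
            // Candidate distance to the neighbor via this node.
            out.add(new String[] {neighbor, String.valueOf(distance + weight)});
        }
        return out;
    }
}
```

Calling `MapStepSketch.map("A", "0\t(B,10)\t(D,5)")` produces the three records from the example: A's own record, then `B 10` and `D 5`.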

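Reduce phase

Among all candidate distances received for a node, find the shortest one and write it into the node's record.

For example, for key B:

B    inf    (C,1)    (D,2)    # inf is the largest possible distance

B    10

B    10    (C,1)    (D,2)    # the shortest distance from A to B is 10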
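A minimal plain-Java sketch of this reduce step follows. `ReduceStepSketch` is a hypothetical name, and the code assumes that at most one incoming value per key carries the adjacency list while all other values are bare distances.

```java
import java.util.List;

// Plain-Java sketch of the reduce step (hypothetical class, no Hadoop required).
class ReduceStepSketch {
    static final String INF = "inf";

    // Merges all values received for one key into a single output record.
    static String reduce(List<String> values) {
        String adjacency = "";   // e.g. "\t(C,1)\t(D,2)", taken from the node record
        String min = INF;
        for (String value : values) {
            int tab = value.indexOf('\t');
            String distance = tab < 0 ? value : value.substring(0, tab);
            if (tab >= 0) {
                adjacency = value.substring(tab);     // remember the adjacency list
            }
            if (distance.equals(INF)) {
                continue;                             // inf never beats a real distance
            }
            if (min.equals(INF) || Integer.parseInt(distance) < Integer.parseInt(min)) {
                min = distance;
            }
        }
        return min + adjacency;
    }
}
```

With the values `inf<TAB>(C,1)<TAB>(D,2)` and `10` for key B, the sketch returns `10<TAB>(C,1)<TAB>(D,2)`, matching the example above.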

How the data changes across the MapReduce rounds:

Original data:
A    (B,)    (D,)

B    (C,)    (D,)

C    (E,)

D    (B,)    (C,)    (E,)

E    (A,)    (C,)

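Result of the first MR round:

A        (B,)    (D,)                # starting from node A, find the distances from A to B and to D

B        (C,)    (D,)                # B is reached and updated: the current shortest distance from A to B

C    inf    (E,)

D        (B,)    (C,)    (E,)        # D is reached and updated: the current shortest distance from A to D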

E    inf    (A,)    (C,)

Result of the second MR round:

A        (B,)    (D,)

B        (C,)    (D,)

C        (E,)

D        (B,)    (C,)    (E,)

E        (A,)    (C,)

Result of the third MR round:

A        (B,)    (D,)

B        (C,)    (D,)

C        (E,)

D        (B,)    (C,)    (E,)

E        (A,)    (C,)

Result of the fourth MR round:

A        (B,)    (D,)

B        (C,)    (D,)

C        (E,)

D        (B,)    (C,)    (E,)

E        (A,)    (C,)

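One more question remains: how do we know when the shortest distance of every node has been computed?

My approach: if a round does not update the distance of any node, all shortest distances have been found and the computation is finished.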
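The stopping rule can be tried out in memory before running it on a cluster. The sketch below is not the Hadoop job itself: `ConvergenceSketch` and its methods are hypothetical names, and only the edge weights for A and B come from the text. The weights for C, D and E are assumed values chosen for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// In-memory sketch of "repeat until no distance changes" (hypothetical class).
class ConvergenceSketch {
    static final int INF = Integer.MAX_VALUE;

    // One call stands in for one MapReduce round.
    // Returns how many nodes received a shorter distance.
    static int relaxOnce(Map<String, Map<String, Integer>> graph, Map<String, Integer> dist) {
        Map<String, Integer> next = new LinkedHashMap<>(dist);
        for (Map.Entry<String, Map<String, Integer>> node : graph.entrySet()) {
            int d = dist.get(node.getKey());
            if (d == INF) {
                continue;                             // not reached yet
            }
            for (Map.Entry<String, Integer> edge : node.getValue().entrySet()) {
                next.merge(edge.getKey(), d + edge.getValue(), Integer::min);
            }
        }
        int updated = 0;
        for (String key : dist.keySet()) {
            if (next.get(key) < dist.get(key)) {
                updated++;
            }
        }
        dist.putAll(next);
        return updated;
    }

    // Runs rounds until one of them updates nothing; returns the number of rounds.
    static int run(Map<String, Map<String, Integer>> graph, Map<String, Integer> dist) {
        int rounds = 0;
        int updated;
        do {
            rounds++;
            updated = relaxOnce(graph, dist);
        } while (updated > 0);
        return rounds;
    }

    static Map<String, Integer> initialDistances(Map<String, Map<String, Integer>> graph, String start) {
        Map<String, Integer> dist = new LinkedHashMap<>();
        for (String node : graph.keySet()) {
            dist.put(node, node.equals(start) ? 0 : INF);
        }
        return dist;
    }

    static Map<String, Integer> edges(Object... pairs) {
        Map<String, Integer> m = new LinkedHashMap<>();
        for (int i = 0; i < pairs.length; i += 2) {
            m.put((String) pairs[i], (Integer) pairs[i + 1]);
        }
        return m;
    }

    // A and B weights are from the text; C, D and E weights are ASSUMED.
    static Map<String, Map<String, Integer>> sampleGraph() {
        Map<String, Map<String, Integer>> g = new LinkedHashMap<>();
        g.put("A", edges("B", 10, "D", 5));
        g.put("B", edges("C", 1, "D", 2));
        g.put("C", edges("E", 4));
        g.put("D", edges("B", 3, "C", 9, "E", 2));
        g.put("E", edges("A", 7, "C", 6));
        return g;
    }
}
```

With these assumed weights, `run(sampleGraph(), initialDistances(sampleGraph(), "A"))` stops after the fourth round, which is the first round that updates nothing, and leaves the distances A=0, B=8, C=9, D=5, E=7.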

Source code:

RunJob.java

 import java.io.IOException;

 import org.apache.hadoop.conf.Configuration;

 import org.apache.hadoop.fs.FileSystem;

 import org.apache.hadoop.fs.Path;

 import org.apache.hadoop.io.Text;

 import org.apache.hadoop.mapreduce.Job;

 import org.apache.hadoop.mapreduce.Mapper;

 import org.apache.hadoop.mapreduce.Reducer;

 import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

 import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;

 import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

 import org.apache.hadoop.util.StringUtils;

 /**

  * Created by Edward on 2016/7/15.

  */

 public class RunJob {

     static enum eInf {

         COUNTER

     }

     public static void main(String[] args) {

         Configuration conf = new Configuration();

         conf.set("fs.defaultFS", "hdfs://node1:8020");

         try {

             FileSystem fs = FileSystem.get(conf);

             int i = 0;

             long num = 1;

             long tmp = 0;

             while (num > 0) {

                 i++;

                 conf.setInt("run.counter", i);

                 Job job = Job.getInstance(conf);

                 job.setJarByClass(RunJob.class);

                 job.setMapperClass(ShortestPathMapper.class);

                 job.setReducerClass(ShortestPathReducer.class);

                 job.setMapOutputKeyClass(Text.class);

                 job.setMapOutputValueClass(Text.class);

                 // KeyValueTextInputFormat: the first tab-separated item is the key, the rest of the line is the value

                 job.setInputFormatClass(KeyValueTextInputFormat.class);

                 if (i == 1)

                     FileInputFormat.addInputPath(job, new Path("/test/shortestpath/input/"));

                 else

                     FileInputFormat.addInputPath(job, new Path("/test/shortestpath/output/sp" + (i - 1)));

                 Path outPath = new Path("/test/shortestpath/output/sp" + i);

                 if (fs.exists(outPath)) {

                     fs.delete(outPath, true);

                 }

                 FileOutputFormat.setOutputPath(job, outPath);

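                 boolean b = job.waitForCompletion(true);

                 if (!b) {

                     // Stop here instead of starting another round after a failed job.

                     System.err.println("Job " + i + " failed; stopping.");

                     break;

                 }

                 // Number of nodes whose distance improved in this round.

                 num = job.getCounters().findCounter(eInf.COUNTER).getValue();

                 if (num == 0) {

                     System.out.println("Finished the shortest-path computation after " + i + " rounds");

                 }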

             }

         } catch (Exception e) {

             e.printStackTrace();

         }

     }

     /**

      * @author Edward

      *

      *         @1 A (B,10) (D,5) =>

      *            A 0 (B,10) (D,5)

      *            B 10

      *            D 5

      *         @2 B 10 (C,1) (D,2) =>

      *         B 10 (C,1) (D,2)

      *         C 11

      *         D 12

      */

     public static class ShortestPathMapper extends Mapper<Text, Text, Text, Text> {

         protected void map(Text key, Text value, Context context) throws IOException, InterruptedException {

             int counter = context.getConfiguration().getInt("run.counter", 1);

             Node node = new Node();

             String distance = null;

             String str = null;

             // First round: fill in the default distance (start node: 0, all others: inf)

             if (counter == 1) {

                 if (key.toString().equals("A") || key.toString().equals("1")) {

                     distance = "0";

                 } else {

                     distance = "inf";

                 }

                 str = distance + "\t" + value.toString();

             } else {

                 str = value.toString();

             }

             context.write(key, new Text(str));

             node.FormatNode(str);

             // This node has not been reached yet, so there is nothing to propagate

             if (node.getDistance().equals("inf"))

                 return;

             // Recompute the distance from source A to each neighbor via this node

             for (int i = 0; i < node.getNodeNum(); i++) {

                 String k = node.getNodeKey(i);

                 String v = String.valueOf(

                         Integer.parseInt(node.getNodeValue(i)) + Integer.parseInt(node.getDistance()));

                 context.write(new Text(k), new Text(v));

             }

         }

     }

     /**

      * @author Edward

      *

      *         B 10 (C,1) (D,2)

      *         B 8              =>

      *         B 8 (C,1) (D,2)

      *

      */

     public static class ShortestPathReducer extends Reducer<Text, Text, Text, Text> {

         protected void reduce(Text arg0, Iterable<Text> arg1, Context arg2) throws IOException, InterruptedException {

             String min = null;

             int i = 0;

             String dis = "inf";

             Node node = new Node();

             for (Text t : arg1) {

                 i++;

                 dis = StringUtils.split(t.toString(), '\t')[0];

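                 // An inf value means a node whose distance has not been computed yet.

                 // if(dis.equals("inf"))

                 // arg2.getCounter(eInf.COUNTER).increment(1L);

                 // If the value carries adjacency entries, keep them for the output record; the minimum distance is tracked below.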

                 String[] strs = StringUtils.split(t.toString(), '\t');

                 if (strs.length > 1) {

                     node.FormatNode(t.toString());

                 }

                 // The first value is taken as the initial minimum

                 if (i == 1) {

                     min = dis;

                 } else {

                     if (dis.equals("inf"))

                         ;

                     else if (min.equals("inf"))

                         min = dis;

                     else if (Integer.parseInt(min) > Integer.parseInt(dis)) {

                         min = dis;

                     }

                 }

             }

             // A new minimum means distances are still improving, so another round is needed

             if (!min.equals("inf")) {

                 if (node.getDistance().equals("inf"))

                     arg2.getCounter(eInf.COUNTER).increment(1L);

                 else {

                     if (Integer.parseInt(node.getDistance()) > Integer.parseInt(min))

                         arg2.getCounter(eInf.COUNTER).increment(1L);

                 }

             }

             node.setDistance(min);

             arg2.write(arg0, new Text(node.toString()));

         }

     }

 }

Node.java

 import org.apache.hadoop.util.StringUtils;

 /**

  * Created by Edward on 2016/7/15.

  */

 public class Node {

     private String distance;

     private String[] adjs;

     public String getDistance() {

         return distance;

     }

     public void setDistance(String distance) {

         this.distance = distance;

     }

     public String getKey(String str)

     {

         return str.substring(1, str.indexOf(","));

     }

     public String getValue(String str)

     {

         return str.substring(str.indexOf(",")+1, str.indexOf(")"));

     }

     public String getNodeKey(int num)

     {

         return getKey(adjs[num]);

     }

     public String getNodeValue(int num)

     {

         return getValue(adjs[num]);

     }

     public int getNodeNum()

     {

         return adjs.length;

     }

     public void FormatNode(String str)

     {

         if(str.length() == 0)

             return ;

         String[] strs =  StringUtils.split(str, '\t');

         adjs = new String[strs.length-1];

         for(int i=0; i<strs.length; i++)

         {

             if(i == 0)

             {

                 setDistance(strs[i]);

                 continue;

             }

             this.adjs[i-1]=strs[i];

         }

     }

     public String toString()

     {

         String str = this.distance+"" ;

         if(this.adjs == null)

             return str;

         for(String s:this.adjs)

         {

             str = str+"\t"+s;

         }

         return str;

     }

     public static void main(String[] args)

     {

         Node node  = new Node();

         node.FormatNode("1\t(A,20)\t(B,30)");

         System.out.println(node.distance+"|"+node.getNodeNum()+"|"+node.toString());

     }

 }
