MapReduce Testing on a YARN Cluster (Part 3)

Join the user, group, and order tables (similar to a multi-table join query).

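Test preparation:

First synchronize the clocks, then start the HDFS cluster and the YARN cluster. In the local directory "/home/hadoop/test/", create the files for the user, group, and order tables.

user file:

group file:

order file:

Test goal:

Produce the result of joining the three tables.

Test code:

Make sure the output key and value types are declared correctly; otherwise the job may create the output directory but leave it without any file content.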

package com.mmzs.bigdata.yarn.mapreduce;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;

import org.apache.hadoop.io.Text;

import org.apache.hadoop.mapreduce.Mapper;

import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class UserGroupMapper01 extends Mapper<LongWritable, Text, Text, Text> {

    private Text outKey;

    private Text outValue;

    @Override

    protected void setup(Mapper<LongWritable, Text, Text, Text>.Context context)

            throws IOException, InterruptedException {

        outKey = new Text();

        outValue = new Text();

    }

    @Override

    protected void map(LongWritable key, Text value, Mapper<LongWritable, Text, Text, Text>.Context context)

            throws IOException, InterruptedException {

        FileSplit fp = (FileSplit) context.getInputSplit();

        String fileName = fp.getPath().getName();

        String line = value.toString();

        String[] fields = line.split("\\s+");

        String keyStr = null;

        String valueStr = null;

        if ("group".equalsIgnoreCase(fileName)) {

            keyStr = fields[0];

            valueStr = new StringBuilder(fields[1]).append("-->").append(fileName).toString();

        } else {

            keyStr = fields[2];

            //Append "-->" plus the file name; the reducer later splits on this marker to tell which file a record came from

            valueStr = new StringBuilder(fields[0]).append("\t").append(fields[1]).append("-->").append(fileName).toString();

        }

        outKey.set(keyStr);

        outValue.set(valueStr);

        context.write(outKey, outValue);

    }

    @Override

    protected void cleanup(Mapper<LongWritable, Text, Text, Text>.Context context)

            throws IOException, InterruptedException {

        outKey = null;

        outValue = null;

    }

}

UserGroupMapper01
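
The tagging step in UserGroupMapper01 can be tried without a cluster. The following is a minimal plain-Java sketch of the same string handling, not the Hadoop job itself. The class name `TaggingSketch` and the sample rows are hypothetical, since the post does not show the file contents; only the column positions are taken from the mapper above.

```java
import java.util.Arrays;

// Plain-Java sketch of the tagging done in UserGroupMapper01.map (no Hadoop needed).
class TaggingSketch {

    // Returns {joinKey, taggedValue} for one input line of the given file.
    static String[] tag(String fileName, String line) {
        String[] fields = line.split("\\s+");
        if ("group".equalsIgnoreCase(fileName)) {
            // group row: column 0 is the join key, column 1 is the payload
            return new String[] {fields[0], fields[1] + "-->" + fileName};
        }
        // any other file (here: user): column 2 is the join key
        return new String[] {fields[2], fields[0] + "\t" + fields[1] + "-->" + fileName};
    }

    public static void main(String[] args) {
        // Hypothetical sample rows; the real file contents are not shown in the post.
        System.out.println(Arrays.toString(tag("group", "1 admin")));
        System.out.println(Arrays.toString(tag("user", "101 alice 1")));
    }
}
```

Both rows end up under the same key, which is what lets the reducer see a group record and its user records together.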

 package com.mmzs.bigdata.yarn.mapreduce;

 import java.io.IOException;

 import java.util.ArrayList;

 import java.util.Iterator;

 import java.util.List;

 import org.apache.hadoop.io.Text;

 import org.apache.hadoop.mapreduce.Reducer;

 public class UserGroupReducer01 extends Reducer<Text, Text, Text, Text> {

     private Text outValue;

     @Override

     protected void setup(Reducer<Text, Text, Text, Text>.Context context) throws IOException, InterruptedException {

         outValue = new Text();

     }

     @Override

     protected void reduce(Text key, Iterable<Text> values, Reducer<Text, Text, Text, Text>.Context context)

             throws IOException, InterruptedException {

         Iterator<Text> its = values.iterator();

         String masterRecord = null;

         List<String> slaveRecords = new ArrayList<String>();

         //Separate the master-table record from the slave-table records

         while (its.hasNext()) {

             String[] rowAndFileName = its.next().toString().split("-->");

             if (rowAndFileName[1].equalsIgnoreCase("group")) {

                 masterRecord = rowAndFileName[0];

                 continue;

             }

             slaveRecords.add(rowAndFileName[0]);

         }

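         //Skip keys that have no master record; otherwise new StringBuilder(masterRecord) throws a NullPointerException

         if (masterRecord == null) {

             return;

         }

         for (String slaveRecord : slaveRecords) {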

             String valueStr = new StringBuilder(masterRecord).append("\t").append(slaveRecord).toString();

             outValue.set(valueStr);

             context.write(key, outValue);

         }

     }

     @Override

     protected void cleanup(Reducer<Text, Text, Text, Text>.Context context) throws IOException, InterruptedException {

         outValue = null;

     }

 }

UserGroupReducer01
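
The reduce-side join in UserGroupReducer01 can be sketched in plain Java as follows. This is an illustration under assumptions, not the reducer itself: the class `ReduceJoinSketch`, the `masterTag` parameter, and the sample values are hypothetical, and dropping keys that have no master row is a choice made here. The returned lines mimic the text output format, where key and value are separated by a tab.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch of the join performed in UserGroupReducer01.reduce (no Hadoop needed).
class ReduceJoinSketch {

    // Joins all tagged values that share one key; masterTag names the master file.
    static List<String> join(String key, List<String> taggedValues, String masterTag) {
        String master = null;
        List<String> slaves = new ArrayList<>();
        for (String tagged : taggedValues) {
            String[] rowAndFile = tagged.split("-->");
            if (rowAndFile[1].equalsIgnoreCase(masterTag)) {
                master = rowAndFile[0];
            } else {
                slaves.add(rowAndFile[0]);
            }
        }
        List<String> out = new ArrayList<>();
        if (master == null) {
            return out; // assumption: keys without a master row are dropped
        }
        for (String slave : slaves) {
            out.add(key + "\t" + master + "\t" + slave);
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical values for key "1"; the master row arrives in the middle on purpose
        List<String> values = Arrays.asList("101\talice-->user", "admin-->group", "102\tbob-->user");
        join("1", values, "group").forEach(System.out::println);
    }
}
```

The slave rows are buffered because the master row can arrive at any position among the values, so the join can only be emitted after the whole group has been read.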

 package com.mmzs.bigdata.yarn.mapreduce;

 import java.io.IOException;

 import java.net.URI;

 import java.net.URISyntaxException;

 import org.apache.hadoop.conf.Configuration;

 import org.apache.hadoop.fs.FileSystem;

 import org.apache.hadoop.fs.Path;

 import org.apache.hadoop.io.LongWritable;

 import org.apache.hadoop.io.Text;

 import org.apache.hadoop.mapreduce.Job;

 import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

 import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

 /**

  * @author hadoop

  *

  */

 public class UserGroupDriver01 {

     private static FileSystem fs;

     private static Configuration conf;

     static {

         String uri = "hdfs://master01:9000/";

         conf = new Configuration();

         try {

             fs = FileSystem.get(new URI(uri), conf, "hadoop");

         } catch (IOException e) {

             e.printStackTrace();

         } catch (InterruptedException e) {

             e.printStackTrace();

         } catch (URISyntaxException e) {

             e.printStackTrace();

         }

     }

     public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {

         Job ugJob01 = getJob(args);

         if (null == ugJob01) {

             return;

         }

         //Submit the Job to the cluster and wait for it to finish; the argument true means progress information is reported back to the client

         boolean flag = false;

         flag = ugJob01.waitForCompletion(true);

         System.exit(flag?0:1);

     }

     /**

      * Builds and configures the Job instance.

      * @param args

      * @return

      * @throws IOException

      */

     public static Job getJob(String[] args) throws IOException {

         if (null==args || args.length<2) return null;

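         //HDFS path holding the data to be processed

         Path inputPath = new Path(args[0]);

         //Output path for the results once the Job has finished

         Path outputPath = new Path(args[1]);

         //Paths of the source files on the local host

         Path userPath = new Path("/home/hadoop/test/user");

         Path groupPath = new Path("/home/hadoop/test/group");

         //If the input path already exists on the cluster, delete it

         if (fs.exists(inputPath)) {

             fs.delete(inputPath, true);//true means delete recursively

         }

         //Likewise remove a stale output path, since the Job fails if it already exists

         if (fs.exists(outputPath)) {

             fs.delete(outputPath, true);//true means delete recursively

         }

         //Create the input path on the cluster and copy the data files into it

         fs.mkdirs(inputPath);

         fs.copyFromLocalFile(false, false, new Path[]{userPath, groupPath}, inputPath);

         //Get the Job instance

         Job ugJob01 = Job.getInstance(conf, "UserGroupJob01");

         //Set the class used to locate the jar to run

         //For ugJob01 this is the UserGroupDriver01 class

         ugJob01.setJarByClass(UserGroupDriver01.class);

         //Set the Mapper class used by the Job

         ugJob01.setMapperClass(UserGroupMapper01.class);

         //Set the Reducer class used by the Job (this call can be omitted if the Job has no Reducer)

         ugJob01.setReducerClass(UserGroupReducer01.class);

         //Set the output key type of the MapTask

         ugJob01.setMapOutputKeyClass(Text.class);

         //Set the output value type of the MapTask

         ugJob01.setMapOutputValueClass(Text.class);

         //Set the output key type of the whole Job (this call can be omitted if the Job has no Reducer)

         ugJob01.setOutputKeyClass(Text.class);

         //Set the output value type of the whole Job (this call can be omitted if the Job has no Reducer)

         ugJob01.setOutputValueClass(Text.class);

         //Set the input path of the data the Job processes

         FileInputFormat.setInputPaths(ugJob01, inputPath);

         //Set the output path for the Job's results

         FileOutputFormat.setOutputPath(ugJob01, outputPath);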

         return ugJob01;

     }

 }

UserGroupDriver01

 package com.mmzs.bigdata.yarn.mapreduce;

 import java.io.IOException;

 import org.apache.hadoop.io.LongWritable;

 import org.apache.hadoop.io.Text;

 import org.apache.hadoop.mapreduce.Mapper;

 import org.apache.hadoop.mapreduce.lib.input.FileSplit;

 public class UserGroupMapper02 extends Mapper<LongWritable, Text, Text, Text> {

     private Text outKey;

     private Text outValue;

     @Override

     protected void setup(Mapper<LongWritable, Text, Text, Text>.Context context)

             throws IOException, InterruptedException {

         outKey = new Text();

         outValue = new Text();

     }

     @Override

     protected void map(LongWritable key, Text value, Mapper<LongWritable, Text, Text, Text>.Context context)

             throws IOException, InterruptedException {

         FileSplit fp = (FileSplit) context.getInputSplit();

         String fileName = fp.getPath().getName();

         String line = value.toString();

         String[] fields = line.split("\\s+");

         String keyStr = fields[2];

         String valueStr = null;

         valueStr = new StringBuilder(fields[0]).append("\t").append(fields[1]).append("\t").append(fields[3]).append("-->").append(fileName).toString();

         outKey.set(keyStr);

         outValue.set(valueStr);

         context.write(outKey, outValue);

     }

     @Override

     protected void cleanup(Mapper<LongWritable, Text, Text, Text>.Context context)

             throws IOException, InterruptedException {

         outKey = null;

         outValue = null;

     }

 }

UserGroupMapper02
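
In the second job the input directory holds two kinds of files: the output of the first job and the order file. UserGroupMapper02 treats both the same way, so every line needs at least four columns with the join key in the third. The sketch below replays that logic in plain Java. The class `SecondStageSketch` and all sample values are hypothetical, and the order file layout in particular is an assumption, since the post does not show it.

```java
// Plain-Java sketch of UserGroupMapper02.map applied to both kinds of input line.
class SecondStageSketch {

    // Returns {joinKey, taggedValue} for one input line of the given file.
    static String[] tag(String fileName, String line) {
        String[] f = line.split("\\s+");
        // column 2 is the join key; columns 0, 1 and 3 are carried along as payload
        return new String[] {f[2], f[0] + "\t" + f[1] + "\t" + f[3] + "-->" + fileName};
    }

    public static void main(String[] args) {
        // A job-1 output line: key, group name, user id, user name (hypothetical values)
        String[] joined = tag("part-r-00000", "1\tadmin\t101\talice");
        // A hypothetical order line whose third column is the user id
        String[] order = tag("order", "9001 book 101 2");
        System.out.println(joined[0] + " => " + joined[1]);
        System.out.println(order[0] + " => " + order[1]);
    }
}
```

Because the first job's output file is not named "order", UserGroupReducer02 treats its rows as the master side of the second join.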

 package com.mmzs.bigdata.yarn.mapreduce;

 import java.io.IOException;

 import java.util.ArrayList;

 import java.util.Iterator;

 import java.util.List;

 import org.apache.hadoop.io.Text;

 import org.apache.hadoop.mapreduce.Reducer;

 public class UserGroupReducer02 extends Reducer<Text, Text, Text, Text> {

     private Text outValue;

     @Override

     protected void setup(Reducer<Text, Text, Text, Text>.Context context) throws IOException, InterruptedException {

         outValue = new Text();

     }

     @Override

     protected void reduce(Text key, Iterable<Text> values, Reducer<Text, Text, Text, Text>.Context context)

             throws IOException, InterruptedException {

         String masterRecord = null;

         List<String> slaveRecords = new ArrayList<String>();

         //Separate the master-table record from the slave-table records

         Iterator<Text> its = values.iterator();

         while (its.hasNext()) {

             String[] rowAndFileName = its.next().toString().split("-->");

             if (!rowAndFileName[1].equalsIgnoreCase("order")) {

                 masterRecord = rowAndFileName[0];

                 continue;

             }

             slaveRecords.add(rowAndFileName[0]);

         }

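         //Skip keys that have no master record; otherwise new StringBuilder(masterRecord) throws a NullPointerException

         if (masterRecord == null) {

             return;

         }

         for (String slaveRecord : slaveRecords) {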

             String valueStr = new StringBuilder(masterRecord).append("\t").append(slaveRecord).toString();

             outValue.set(valueStr);

             context.write(key, outValue);

         }

     }

     @Override

     protected void cleanup(Reducer<Text, Text, Text, Text>.Context context) throws IOException, InterruptedException {

         outValue = null;

     }

 }

UserGroupReducer02

 package com.mmzs.bigdata.yarn.mapreduce;

 import java.io.IOException;

 import java.net.URI;

 import java.net.URISyntaxException;

 import org.apache.hadoop.conf.Configuration;

 import org.apache.hadoop.fs.FileSystem;

 import org.apache.hadoop.fs.Path;

 import org.apache.hadoop.io.Text;

 import org.apache.hadoop.mapreduce.Job;

 import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

 import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

 /**

  * @author hadoop

  *

  */

 public class UserGroupDriver02 {

     private static FileSystem fs;

     private static Configuration conf;

     static {

         String uri = "hdfs://master01:9000/";

         conf = new Configuration();

         try {

             fs = FileSystem.get(new URI(uri), conf, "hadoop");

         } catch (IOException e) {

             e.printStackTrace();

         } catch (InterruptedException e) {

             e.printStackTrace();

         } catch (URISyntaxException e) {

             e.printStackTrace();

         }

     }

     public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {

         //getJob reads args[1] (input path) and args[2] (output path), so pass the full argument array

         Job ugJob02 = getJob(args);

         if (null == ugJob02) {

             return;

         }

         //Submit the Job to the cluster and wait for it to finish; the argument true means progress information is reported back to the client

         boolean flag = false;

         flag = ugJob02.waitForCompletion(true);

         System.exit(flag?0:1);

     }

     /**

      * Builds and configures the Job instance.

      * @param args

      * @return

      * @throws IOException

      */

     public static Job getJob(String[] args) throws IOException {

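         //args[1] and args[2] are read below, so at least three elements are required

         if (null==args || args.length<3) return null;

         //HDFS path holding the data to be processed

         Path inputPath = new Path(args[1]);

         //Output path for the results once the Job has finished

         Path outputPath = new Path(args[2]);

         //Path of the source file on the local host

         Path orderPath = new Path("/home/hadoop/test/order");

         //The input path must already exist on the cluster; it was created by the first Job

         if (!fs.exists(inputPath)) return null;

         if (fs.exists(outputPath)) {

             fs.delete(outputPath, true);//true means delete recursively

         }

         //Copy the data file into the existing cluster path

         fs.copyFromLocalFile(false, false, orderPath, inputPath);

         //Get the Job instance

         Job ugJob02 = Job.getInstance(conf, "UserGroupJob02");

         //Set the class used to locate the jar to run

         //For ugJob02 this is the UserGroupDriver02 class

         ugJob02.setJarByClass(UserGroupDriver02.class);

         //Set the Mapper class used by the Job

         ugJob02.setMapperClass(UserGroupMapper02.class);

         //Set the Reducer class used by the Job (this call can be omitted if the Job has no Reducer)

         ugJob02.setReducerClass(UserGroupReducer02.class);

         //Set the output key type of the MapTask

         ugJob02.setMapOutputKeyClass(Text.class);

         //Set the output value type of the MapTask

         ugJob02.setMapOutputValueClass(Text.class);

         //Set the output key type of the whole Job (this call can be omitted if the Job has no Reducer)

         ugJob02.setOutputKeyClass(Text.class);

         //Set the output value type of the whole Job (this call can be omitted if the Job has no Reducer)

         ugJob02.setOutputValueClass(Text.class);

         //Set the input path of the data the Job processes

         FileInputFormat.setInputPaths(ugJob02, inputPath);

         //Set the output path for the Job's results

         FileOutputFormat.setOutputPath(ugJob02, outputPath);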

         return ugJob02;

     }

 }

UserGroupDriver02

 package com.mmzs.bigdata.yarn.mapreduce;

 import java.io.IOException;

 import java.net.URI;

 import java.net.URISyntaxException;

 import org.apache.hadoop.conf.Configuration;

 import org.apache.hadoop.fs.FileSystem;

 import org.apache.hadoop.fs.Path;

 import org.apache.hadoop.mapreduce.Job;

 public class UserGroupDriver {

     private static FileSystem fs;

     private static Configuration conf;

     private static final String TEMP= "hdfs://master01:9000/data/usergrouporder/tmp";

     static {

         String uri = "hdfs://master01:9000/";

         conf = new Configuration();

         try {

             fs = FileSystem.get(new URI(uri), conf, "hadoop");

         } catch (IOException e) {

             e.printStackTrace();

         } catch (InterruptedException e) {

             e.printStackTrace();

         } catch (URISyntaxException e) {

             e.printStackTrace();

         }

     }

     public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {

         String[] params = {args[0], TEMP, args[1]};

         //Run the first Job

         Job ugJob01 = UserGroupDriver01.getJob(params);

         //Submit the Job to the cluster and wait for it to finish; the argument true means progress information is reported back to the client

         boolean flag01 = ugJob01.waitForCompletion(true);

         if (!flag01) {

             System.out.println("job1 running failure......");

             System.exit(1);

         }

         //Run the second Job

         Job ugJob02 = UserGroupDriver02.getJob(params);

         //Submit the Job to the cluster and wait for it to finish; the argument true means progress information is reported back to the client

         boolean flag02 = ugJob02.waitForCompletion(true);

         if (flag02) {//Once Job02 has finished, delete the intermediate directory and exit

 //            fs.delete(new Path(TEMP), true);

             System.out.println("job2 running success......");

             System.exit(0);

         }

         System.out.println("job2 running failure......");

         System.exit(1);

     }

 }

UserGroupDriver

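To make testing easier, you can temporarily comment out the statement that deletes the intermediate output.

//Overall driver

String[] params = {args[0], TEMP, args[1]};

//Run the first Job

Job ugJob01 = UserGroupDriver01.getJob(params);

//Run the second Job

Job ugJob02 = UserGroupDriver02.getJob(params);

//Sub-driver 01

//HDFS path holding the data to be processed

Path inputPath = new Path(args[0]);//args[0] in params

//Output path for the results once the Job has finished

Path outputPath = new Path(args[1]);//TEMP in params

//Sub-driver 02: inside getJob, args[1] is TEMP and args[2] is the last element of params (the overall driver's args[1])

//HDFS path holding the data to be processed

Path inputPath = new Path(args[1]);

//Output path for the results once the Job has finished

Path outputPath = new Path(args[2]);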
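
The routing of the arguments can be checked with a few lines of plain Java. This is only a sketch of the index bookkeeping: the class `ParamRoutingSketch` and its helper methods are hypothetical, while the TEMP value and the indices are copied from the drivers above.

```java
// Sketch of how the overall driver's arguments reach the two sub-drivers.
class ParamRoutingSketch {

    static final String TEMP = "hdfs://master01:9000/data/usergrouporder/tmp";

    // Mirrors: String[] params = {args[0], TEMP, args[1]};
    static String[] buildParams(String[] args) {
        return new String[] {args[0], TEMP, args[1]};
    }

    // UserGroupDriver01.getJob reads params[0] (input) and params[1] (output)
    static String[] job1Paths(String[] params) {
        return new String[] {params[0], params[1]};
    }

    // UserGroupDriver02.getJob reads params[1] (input) and params[2] (output)
    static String[] job2Paths(String[] params) {
        return new String[] {params[1], params[2]};
    }

    public static void main(String[] args) {
        String[] params = buildParams(
                new String[] {"/data/usergrouporder/src", "/data/usergrouporder/dst"});
        System.out.println("job1: " + String.join(" -> ", job1Paths(params)));
        System.out.println("job2: " + String.join(" -> ", job2Paths(params)));
    }
}
```

The intermediate directory TEMP is the output of the first job and the input of the second, which is why both sub-drivers can share one params array.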

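Test results:

The arguments passed at run time are:

When running from Eclipse, each path argument must be prefixed with the URI of the cluster master, i.e. hdfs://master01:9000

Input path argument: /data/usergrouporder/src

Output path argument: /data/usergrouporder/dst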
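
As a small illustration of the Eclipse case, the fully qualified arguments can be built with the JDK's `java.net.URI`. The class `QualifiedPathSketch` and its `qualify` helper are hypothetical names used only for this sketch; the master URI and the two paths are the ones given above.

```java
import java.net.URI;

// Sketch: prefix a cluster path with the master URI to obtain the Eclipse run argument.
class QualifiedPathSketch {

    // Resolves an absolute cluster path against the master URI.
    static String qualify(String masterUri, String path) {
        return URI.create(masterUri).resolve(path).toString();
    }

    public static void main(String[] args) {
        String master = "hdfs://master01:9000";
        System.out.println(qualify(master, "/data/usergrouporder/src"));
        System.out.println(qualify(master, "/data/usergrouporder/dst"));
    }
}
```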
