[DB] MapReduce 例题

词频统计（word count）

一篇文章用哈希表统计即可
对互联网所有网页的词频进行统计（Google搜索引擎的需求），无法将所有网页读入内存
map：将单词提取出来，对每个单词输入一个<word,1>这样的<k,v>对，进而将相同的数据放在一起，形成<word,<1,1,1,...>>这样的<k,v集合>
reduce：将集合里的1求和，再将单词和这个和组成<word,sum>输出
一个map函数仅对一个HDFS数据块上的数据进行计算，从而实现大数据的分布式计算
在分布式集群中调度执行MapReduce程序的计算框架也叫MapReduce

WordCountMain.java

 1 import org.apache.hadoop.conf.Configuration;

 2 import org.apache.hadoop.fs.Path;

 3 import org.apache.hadoop.io.IntWritable;

 4 import org.apache.hadoop.mapreduce.Job;

 5 import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

 6 import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

 7 import com.sun.jersey.core.impl.provider.entity.XMLJAXBElementProvider.Text;

 8

 9 public class WordCountMain {

10     public static void main(String[] args) throws Exception{

11

12         Job job = Job.getInstance(new Configuration());

13         job.setJarByClass(WordCountMain.class);

14

15         job.setMapperClass(WordCountMapper.class);

16         job.setOutputKeyClass(Text.class);

17         job.setMapOutputValueClass(IntWritable.class);

18

19         job.setReducerClass(WordCountReducer.class);

20         job.setOutputKeyClass(Text.class);

21         job.setMapOutputValueClass(IntWritable.class);

22

23         FileInputFormat.setInputPaths(job, new Path(args[0]));

24         FileOutputFormat.setOutputPath(job, new Path(args[1]));

25     }

26 }

WordCountMapper.java

 1 import java.io.IOException;

 2 import org.apache.hadoop.io.IntWritable;

 3 import org.apache.hadoop.io.LongWritable;

 4 import org.apache.hadoop.io.Text;

 5 import org.apache.hadoop.mapreduce.Mapper;

 6

 7 public class WordCountMapper extends Mapper<LongWritable,Text,Text,IntWritable>{

 8

 9     @Override

10     protected void map(LongWritable key1, Text value1, Context context)

11             throws IOException, InterruptedException {

12         String data = value1.toString();

13         String[] words = data.split(" ");

14         for(String w:words) {

15             context.write(new Text(w),new IntWritable(1));

16         }

17     }

18 }

WordCountReducer.java

 1 import java.io.IOException;

 2 import org.apache.hadoop.io.IntWritable;

 3 import org.apache.hadoop.io.Text;

 4 import org.apache.hadoop.mapreduce.Reducer;

 5

 6 //                                              k3       v3       k4       v4

 7 public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

 8

 9     @Override

10     protected void reduce(Text k3, Iterable<IntWritable> v3,Context context) throws IOException, InterruptedException {

11         int total = 0;

12         for(IntWritable v:v3) {

13             total = total + v.get();

14         }

15         context.write(k3, new IntWritable(total));

16     }

17 }

求部门工资总额

SQL：select deptno,sum(sal) from emp gruop by deptno;
分析数据类型，套用模板重写map()、reduce()
导出jar包，指定main class
把数据保存在hdfs中
hadoop jar s1.jar /input/emp.csv /output/0910/s1

SalaryTotalMain.java

 1 import java.io.IOException;

 2

 3 import org.apache.hadoop.conf.Configuration;

 4 import org.apache.hadoop.fs.Path;

 5 import org.apache.hadoop.io.IntWritable;

 6 import org.apache.hadoop.mapreduce.Job;

 7 import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

 8 import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

 9

10 public class SalaryTotalMain {

11

12     public static void main(String[] args) throws Exception {

13         //1、创建任务、指定任务的入口

14         Job job = Job.getInstance(new Configuration());

15         job.setJarByClass(SalaryTotalMain.class);

16

17         //2、指定任务的map和map输出的数据类型

18         job.setMapperClass(SalaryTotalMapper.class);

19         job.setMapOutputKeyClass(IntWritable.class);

20         job.setMapOutputValueClass(IntWritable.class);

21

22         //3、指定任务的reducer和reducer输出的类型

23         job.setReducerClass(SalaryTotalReducer.class);

24         job.setOutputKeyClass(IntWritable.class);

25         job.setOutputValueClass(IntWritable.class);

26

27         //4、指定任务输入路径和输出路径

28         FileInputFormat.setInputPaths(job, new Path(args[0]));

29         FileOutputFormat.setOutputPath(job, new Path(args[1]));

30

31         //5、执行任务

32         job.waitForCompletion(true);

33     }

34 }

SalaryTotalMapper.java

 1 import java.io.IOException;

 2

 3 import org.apache.hadoop.io.IntWritable;

 4 import org.apache.hadoop.io.LongWritable;

 5 import org.apache.hadoop.io.Text;

 6 import org.apache.hadoop.mapreduce.Mapper;

 7

 8 public class SalaryTotalMapper extends Mapper<LongWritable, Text, IntWritable, IntWritable> {

 9

10     @Override

11     protected void map(LongWritable key1, Text value1,Context context)

12             throws IOException, InterruptedException {

13         // 数据：7654,MARTIN,SALESMAN,7698,1981/9/28,1250,1400,30

14         String data = value1.toString();

15

16         //分词

17         String[] words = data.split(",");

18

19         //输出：k2 部门号   v2员工的工资

20         context.write(new IntWritable(Integer.parseInt(words[7])),

21                       new IntWritable(Integer.parseInt(words[5])));

22     }

23 }

SalaryTotalReducer.java

 1 import java.io.IOException;

 2

 3 import org.apache.hadoop.io.IntWritable;

 4 import org.apache.hadoop.mapreduce.Reducer;

 5

 6 public class SalaryTotalReducer extends Reducer<IntWritable, IntWritable, IntWritable, IntWritable>{

 7

 8     @Override

 9     protected void reduce(IntWritable k3, Iterable<IntWritable> v3,Context context)

10             throws IOException, InterruptedException {

11         // 求v3求和

12         int total = 0;

13         for(IntWritable v:v3){

14             total = total + v.get();

15         }

16

17         //输出  k4  部门号    v4是部门的工资总额

18         context.write(k3, new IntWritable(total));

19     }

20

21 }

数据去重

select distinct job from emp;
用MR实现
只有Mapper没有Reducer（排序）

DistinctMain.java

 1 import java.io.IOException;

 2

 3 import org.apache.hadoop.conf.Configuration;

 4 import org.apache.hadoop.fs.Path;

 5 import org.apache.hadoop.io.IntWritable;

 6 import org.apache.hadoop.io.NullWritable;

 7 import org.apache.hadoop.io.Text;

 8 import org.apache.hadoop.mapreduce.Job;

 9 import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

10 import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

11

12 public class DistinctMain {

13

14     public static void main(String[] args) throws Exception {

15         //1、创建一个任务

16         Job job = Job.getInstance(new Configuration());

17         job.setJarByClass(DistinctMain.class); //任务的入口

18

19         //2、指定任务的map和map输出的数据类型

20         job.setMapperClass(DistinctMapper.class);

21         job.setMapOutputKeyClass(Text.class);  //k2的数据类型

22         job.setMapOutputValueClass(NullWritable.class);  //v2的类型

23

24         //3、指定任务的reduce和reduce的输出数据的类型

25         job.setReducerClass(DistinctReducer.class);

26         job.setOutputKeyClass(Text.class); //k4的类型

27         job.setOutputValueClass(NullWritable.class); //v4的类型

28

29         //4、指定任务的输入路径、任务的输出路径

30         FileInputFormat.setInputPaths(job, new Path(args[0]));

31         FileOutputFormat.setOutputPath(job, new Path(args[1]));

32

33         //5、执行任务

34         job.waitForCompletion(true);

35     }

36 }

DistinctMapper.java

 1 import java.io.IOException;

 2

 3 import org.apache.hadoop.io.LongWritable;

 4 import org.apache.hadoop.io.NullWritable;

 5 import org.apache.hadoop.io.Text;

 6 import org.apache.hadoop.mapreduce.Mapper;

 7

 8 //                                                             k2 职位job

 9 public class DistinctMapper extends Mapper<LongWritable, Text, Text, NullWritable> {

10

11     @Override

12     protected void map(LongWritable key1, Text value1, Context context)

13             throws IOException, InterruptedException {

14         //数据：7499,ALLEN,SALESMAN,7698,1981/2/20,1600,300,30

15         String data = value1.toString();

16

17         //分词

18         String[] words = data.split(",");

19

20         //输出：把职位job作为key2

21         context.write(new Text(words[2]), NullWritable.get());

22     }

23 }

DistinctReducer.java

 1 import java.io.IOException;

 2

 3 import org.apache.hadoop.io.NullWritable;

 4 import org.apache.hadoop.io.Text;

 5 import org.apache.hadoop.mapreduce.Reducer;

 6

 7 public class DistinctReducer extends Reducer<Text, NullWritable, Text, NullWritable> {

 8

 9     @Override

10     protected void reduce(Text k3, Iterable<NullWritable> v3,Context context) throws IOException, InterruptedException {

11         // 直接把k3输出即可

12         context.write(k3, NullWritable.get());

13     }

14

15 }

多表查询

select dname,ename from dept, emp where emp.deptno=dept.deptno;
关系型数据库的子查询会转换成多表查询（通过执行计划看SQL语句的执行过程和效率）
笛卡尔积：列数相加，行数相乘，得到全集
用连接条件（如 emp.deptno=dept.deptno）去掉全集中的错误数据
连接条件至少N-1个（N为表的个数）
根据连接条件不同，分为：
- 等值连接 / 不等值连接
- 外连接 / 自连接
分析输入输出-->确定数据类型-->写mapper和reducer程序

通过MR实现连接

EqualJoinMain.java

 1 package day0917.mr.equaljoin;

 2

 3 import java.io.IOException;

 4

 5 import org.apache.hadoop.conf.Configuration;

 6 import org.apache.hadoop.fs.Path;

 7 import org.apache.hadoop.io.IntWritable;

 8 import org.apache.hadoop.io.NullWritable;

 9 import org.apache.hadoop.io.Text;

10 import org.apache.hadoop.mapreduce.Job;

11 import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

12 import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

13

14 public class EqualJoinMain {

15

16     public static void main(String[] args) throws Exception {

17         //1、创建一个任务

18         Job job = Job.getInstance(new Configuration());

19         job.setJarByClass(EqualJoinMain.class); //任务的入口

20

21         //2、指定任务的map和map输出的数据类型

22         job.setMapperClass(EqualJoinMapper.class);

23         job.setMapOutputKeyClass(IntWritable.class);  //k2的数据类型

24         job.setMapOutputValueClass(Text.class);  //v2的类型

25

26         //3、指定任务的reduce和reduce的输出数据的类型

27         job.setReducerClass(EqualJoinReducer.class);

28         job.setOutputKeyClass(Text.class); //k4的类型

29         job.setOutputValueClass(Text.class); //v4的类型

30

31         //4、指定任务的输入路径、任务的输出路径

32         FileInputFormat.setInputPaths(job, new Path(args[0]));

33         FileOutputFormat.setOutputPath(job, new Path(args[1]));

34

35         //5、执行任务

36         job.waitForCompletion(true);

37

38     }

39

40 }

EqualJoinMapper.java

 1 import java.io.IOException;

 2

 3 import org.apache.hadoop.io.IntWritable;

 4 import org.apache.hadoop.io.LongWritable;

 5 import org.apache.hadoop.io.Text;

 6 import org.apache.hadoop.mapreduce.Mapper;

 7

 8 public class EqualJoinMapper extends Mapper<LongWritable, Text, IntWritable, Text> {

 9

10     @Override

11     protected void map(LongWritable key1, Text value1, Context context)

12             throws IOException, InterruptedException {

13         //数据可能是部门，也可能是员工

14         String data = value1.toString();

15

16         //分词

17         String[] words = data.split(",");

18

19         //判断数组的长度

20         if(words.length == 3){

21             //得到是部门数据：部门号 部门名称

22             context.write(new IntWritable(Integer.parseInt(words[0])), new Text("*"+words[1]));

23         }else{

24             //员工数据 : 员工的部门号 员工的姓名

25             context.write(new IntWritable(Integer.parseInt(words[7])), new Text(words[1]));

26         }

27

28     }

29 }

EqualJoinReducer.java

 1 import java.io.IOException;

 2

 3 import org.apache.hadoop.io.IntWritable;

 4 import org.apache.hadoop.io.Text;

 5 import org.apache.hadoop.mapreduce.Reducer;

 6

 7 public class EqualJoinReducer extends Reducer<IntWritable, Text, Text, Text> {

 8

 9     @Override

10     protected void reduce(IntWritable k3, Iterable<Text> v3, Context context)

11             throws IOException, InterruptedException {

12         // 处理v3：可能是部门名称、也可能是员工的姓名

13         String dname = "";

14         String empNameList = "";

15

16         for(Text value:v3){

17             String str = value.toString();

18             //判断是否存在*

19             int index = str.indexOf("*");

20             if(index >= 0){

21                 //代表是部门的名称

22                 dname = str.substring(1);

23             }else{

24                 //代表是员工的名称

25                 empNameList = str + ";" + empNameList;

26             }

27         }

28

29         //输出

30         context.write(new Text(dname), new Text(empNameList));

31     }

32

33 }

自连接

通过表的别名，将同一张表视为多张表
一个人是下级的老板，同时是上级的员工
同一条数据输出两次，一次作为老板，一次作为员工（看作两张表）
相同的k2，其value2会被同一个reducer处理
存在非法数据需进行清洗（如把大老板的老板编号置为-1）
老板和员工同时存在才输出（员工树的根节点和叶子节点不显示）

SelfJoinMain.java

 1 import java.io.IOException;

 2

 3 import org.apache.hadoop.conf.Configuration;

 4 import org.apache.hadoop.fs.Path;

 5 import org.apache.hadoop.io.IntWritable;

 6 import org.apache.hadoop.io.Text;

 7 import org.apache.hadoop.mapreduce.Job;

 8 import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

 9 import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

10

11 public class SelfJoinMain {

12

13     public static void main(String[] args) throws Exception {

14         //1、创建一个任务

15         Job job = Job.getInstance(new Configuration());

16         job.setJarByClass(SelfJoinMain.class); //任务的入口

17

18         //2、指定任务的map和map输出的数据类型

19         job.setMapperClass(SelfJoinMapper.class);

20         job.setMapOutputKeyClass(IntWritable.class);  //k2的数据类型

21         job.setMapOutputValueClass(Text.class);  //v2的类型

22

23         //3、指定任务的reduce和reduce的输出数据的类型

24         job.setReducerClass(SelfJoinReducer.class);

25         job.setOutputKeyClass(Text.class); //k4的类型

26         job.setOutputValueClass(Text.class); //v4的类型

27

28         //4、指定任务的输入路径、任务的输出路径

29         FileInputFormat.setInputPaths(job, new Path(args[0]));

30         FileOutputFormat.setOutputPath(job, new Path(args[1]));

31

32         //5、执行任务

33         job.waitForCompletion(true);

34

35     }

36 }

SelfJoinMapper.java

 1 import java.io.IOException;

 2

 3 import org.apache.hadoop.io.IntWritable;

 4 import org.apache.hadoop.io.LongWritable;

 5 import org.apache.hadoop.io.Text;

 6 import org.apache.hadoop.mapreduce.Mapper;

 7

 8 public class SelfJoinMapper extends Mapper<LongWritable, Text, IntWritable, Text> {

 9

10     @Override

11     protected void map(LongWritable key1, Text value1, Context context)

12             throws IOException, InterruptedException {

13         // 数据: 7566,JONES,MANAGER,7839,1981/4/2,2975,0,20

14         String data = value1.toString();

15

16         //分词操作

17         String[] words = data.split(",");

18

19         //输出数据

20         //1、作为老板表                                         员工号

21         context.write(new IntWritable(Integer.parseInt(words[0])), new Text("*"+words[1]));

22

23         //2、作为员工表                                         老板的员工号

24         context.write(new IntWritable(Integer.parseInt(words[3])), new Text(words[1]));

25         /*

26          * 注意一个问题：如果数据存在非法数据，一定处理一下（数据清洗）

27          * 如果产生例外，一定捕获

28          */

29     }

30 }

SelfJoinReducer.java

 1 import java.io.IOException;

 2

 3 import org.apache.hadoop.io.IntWritable;

 4 import org.apache.hadoop.io.Text;

 5 import org.apache.hadoop.mapreduce.Reducer;

 6

 7 public class SelfJoinReducer extends Reducer<IntWritable, Text, Text, Text> {

 8

 9     @Override

10     protected void reduce(IntWritable k3, Iterable<Text> v3, Context context)

11             throws IOException, InterruptedException {

12         //定义变量保存：老板的姓名、员工的姓名

13         String bossName = "";

14         String empNameList = "";

15

16         for(Text t:v3){

17             String str = t.toString();

18

19             //判断是否存在*号

20             int index = str.indexOf("*");

21             if(index >= 0 ){

22                 //老板的姓名

23                 bossName = str.substring(1);

24             }else{

25                 //员工的姓名

26                 empNameList = str + ";" + empNameList;

27             }

28         }

29

30         //输出：如果存在老板，也存在员工，才进行输出

31         if(bossName.length() > 0 && empNameList.length() > 0)

32             context.write(new Text(bossName), new Text(empNameList));

33     }

34 }

倒排索引

关系型数据库的索引

数据存储在HDFS后会建立索引，提高查找效率

MR实现倒排索引
- 记录一个单词在一个文件中出现的次数
- Combiner对同一文件中重复出现的单词进行求和
- Reducer对不同文件中出现的单词进行汇总
- 保证有无Combiner前后数据类型一样

RevertedIndexMain.java

 1 import java.io.IOException;

 2

 3 import org.apache.hadoop.conf.Configuration;

 4 import org.apache.hadoop.fs.Path;

 5 import org.apache.hadoop.io.IntWritable;

 6 import org.apache.hadoop.io.Text;

 7 import org.apache.hadoop.mapreduce.Job;

 8 import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

 9 import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

10

11

12 public class RevertedIndexMain {

13

14     public static void main(String[] args) throws Exception {

15         //1、创建一个任务

16         Job job = Job.getInstance(new Configuration());

17         job.setJarByClass(RevertedIndexMain.class); //任务的入口

18

19         //2、指定任务的map和map输出的数据类型

20         job.setMapperClass(RevertedIndexMapper.class);

21         job.setMapOutputKeyClass(Text.class);  //k2的数据类型

22         job.setMapOutputValueClass(Text.class);  //v2的类型

23

24         //指定任务的Combiner

25         job.setCombinerClass(RevertedIndexCombiner.class);

26

27         //3、指定任务的reduce和reduce的输出数据的类型

28         job.setReducerClass(RevertedIndexReducer.class);

29         job.setOutputKeyClass(Text.class); //k4的类型

30         job.setOutputValueClass(Text.class); //v4的类型

31

32         //4、指定任务的输入路径、任务的输出路径

33         FileInputFormat.setInputPaths(job, new Path(args[0]));

34         FileOutputFormat.setOutputPath(job, new Path(args[1]));

35

36         //5、执行任务

37         job.waitForCompletion(true);

38     }

39

40 }

RevertedIndexMapper.java

 1 import java.io.IOException;

 2

 3 import org.apache.hadoop.io.LongWritable;

 4 import org.apache.hadoop.io.Text;

 5 import org.apache.hadoop.mapreduce.Mapper;

 6 import org.apache.hadoop.mapreduce.lib.input.FileSplit;

 7

 8 public class RevertedIndexMapper extends Mapper<LongWritable, Text, Text, Text> {

 9

10     @Override

11     protected void map(LongWritable key1, Text value1, Context context)

12             throws IOException, InterruptedException {

13         //数据：/indexdata/data01.txt

14         //得到对应文件名

15         String path = ((FileSplit)context.getInputSplit()).getPath().toString();

16

17         //解析出文件名

18         //得到最后一个斜线的位置

19         int index = path.lastIndexOf("/");

20         String fileName = path.substring(index+1);

21

22         //数据：I love Beijing and love Shanghai

23         String data = value1.toString();

24         String[] words = data.split(" ");

25

26         //输出

27         for(String word:words){

28             context.write(new Text(word+":"+fileName), new Text("1"));

29         }

30     }

31 }

RevertedIndexCombiner.java

 1 import java.io.IOException;

 2

 3 import org.apache.hadoop.io.Text;

 4 import org.apache.hadoop.mapreduce.Reducer;

 5

 6 public class RevertedIndexCombiner extends Reducer<Text, Text, Text, Text> {

 7

 8     @Override

 9     protected void reduce(Text k21, Iterable<Text> v21, Context context)

10             throws IOException, InterruptedException {

11         // 求和：对同一个文件中的单词进行求和

12         int total = 0;

13         for(Text v:v21){

14             total = total + Integer.parseInt(v.toString());

15         }

16

17         //k21是：love:data01.txt

18         String data = k21.toString();

19         //找到：冒号的位置

20         int index = data.indexOf(":");

21

22         String word = data.substring(0, index);        //单词

23         String fileName = data.substring(index + 1);   //文件名

24

25         //输出：

26         context.write(new Text(word), new Text(fileName+":"+total));

27     }

28 }

RevertedIndexReducer.java

 1 import java.io.IOException;

 2

 3 import org.apache.hadoop.io.Text;

 4 import org.apache.hadoop.mapreduce.Reducer;

 5

 6 public class RevertedIndexReducer extends Reducer<Text, Text, Text, Text> {

 7

 8     @Override

 9     protected void reduce(Text k3, Iterable<Text> v3, Context context)

10             throws IOException, InterruptedException {

11         String str = "";

12

13         for(Text t:v3){

14             str = "("+t.toString()+")"+str;

15         }

16

17         context.write(k3, new Text(str));

18     }

19

20 }

Hadoop自带例题

start-all.sh
/root/training/hadoop-2.7.3/share/hadoop/mapreduce
hadoop jar hadoop-mapreduce-examples-2.7.3.jar
wordcount: A map/reduce program that counts the words in the input files.
hadoop jar hadoop-mapreduce-examples-2.7.3.jar wordcount
Usage: wordcount <in> [<in>...] <out>
hadoop jar hadoop-mapreduce-examples-2.7.3.jar wordcount /input/data.txt /output/0402/wc

参考

https://dpb-bobokaoya-sm.blog.csdn.net/article/details/88984195

[DB] MapReduce 例题的更多相关文章

[DB] MapReduce
概述大数据计算的核心思想:移动计算比移动数据更划算 MapReduce既是一个编程模型,又是一个计算框架包含Map和Reduce两个过程终极目标:用SQL语句分析大数据(Hive.SparkSQ ...
Hadoop 中利用 mapreduce 读写 mysql 数据
Hadoop 中利用 mapreduce 读写 mysql 数据有时候我们在项目中会遇到输入结果集很大,但是输出结果很小,比如一些 pv.uv 数据,然后为了实时查询的需求,或者一些 OLAP ...
MapReduce
2016-12-21 16:53:49 mapred-default.xml mapreduce.input.fileinputformat.split.minsize 0 The minimum ...
在MongoDB的MapReduce上踩过的坑
太久没动这里,目前人生处于一个新的开始.这次博客的内容很久前就想更新上来,但是一直没找到合适的时间点(哈哈,其实就是懒),主要内容集中在使用Mongodb时的一些隐蔽的MapReduce问题: 1.R ...
MongoDB进行MapReduce的数据类型
有很长一段时间没更新博客了,因为最近都比较忙,今天算是有点空闲吧.本文主要是介绍MapReduce在MongoDB上的使用,它与sql的分组.聚集类似,也是先map分组,再用reduce统计,最后还可 ...
MongoDB聚合运算之mapReduce函数的使用（11）
mapReduce 随着"大数据"概念而流行. 其实mapReduce的概念非常简单, 从功能上说,相当于RDBMS的 group 操作 mapReduce的真正强项在哪? 答:在 ...
mongo DB的一般操作
最近接触了一些mongoDB .将一些指令操作记录下来,便于查询和使用登录 [root@logs ~]# mongo -u loguser -p log123456 --authentication ...
Sqoop:Could not load db driver class: com.microsoft.sqlserver.jdbc.SQLServerDriver
Sqoop version:1.4.6-cdh Hadoop version:2.6.0-cdh5.8.2 场景:使用Sqoop从MSSqlserver导数据虽然1.4.6的官网说 Even if ...
mapreduce导出MSSQL的数据到HDFS
今天想通过一些数据,来测试一下我的<基于信息熵的无字典分词算法>这篇文章的正确性.就写了一下MapReduce程序从MSSQL SERVER2008数据库里取数据分析.程序发布到hadoo ...

随机推荐

力扣 - 剑指 Offer 09. 用两个栈实现队列
目录题目思路代码复杂度分析题目剑指 Offer 09. 用两个栈实现队列思路刚开始想的是用stack1作为数据存储的地方,stack2用来作为辅助栈,如果添加元素直接push入stac ...
自己挖的坑自己填--jxl进行Excel下载堆内存溢出问题
今天在进行使用 jxl 进行 Excel 下载时,由于数据量大(4万多条接近5万条数据的下载),数据结构过于负责,存在大量大对象(虽然在对象每次用完都设置为null,但还是存在内存溢出问题),加上本地 ...
.Net Core 路由处理
用户请求接口路由,应用返回处理结果.应用中如何匹配请求的数据呢?为何能如此精确的找到对应的处理方法?今天就谈谈这个路由.路由负责匹配传入的HTTP请求,将这些请求发送到可以执行的终结点.终结点在应用中 ...
Python基础（二十）：面向对象“类”第三课——类成员
知识点: 类属性与实例属性: 类方法与实例方法: 静态方法: 类属性与实例属性类属性与实例属性的区别属性的绑定不同类属性与当前类相关(绑定的是当前类),与当前类创建的任何对象无关: 实例属性与当 ...
BUAA_OO_第二单元
BUAA_OO_2020_UNIT2 一.程序结构分析第五次作业 UML & Mertrics 电梯的调度问题,实质上就是任务的请求与分配问题,笔者在第五次作业中采用简单的"生 ...
JVM（一）内存结构
今日开篇什么是JVM 定义 Java Virtual Machine,JAVA程序的运行环境(JAVA二进制字节码的运行环境) 好处一次编写,到处运行自动内存管理,垃圾回收机制数组下标越界检查 ...
《剑指offer》刷题笔记
简介此笔记为我在 leetcode 上的<剑指offer>专题刷题时的笔记整理. 在刷题时我尝试了 leetcode 上热门题解中的多种方法,这些不同方法的实现都列在了笔记中. leet ...
Java获取多线程执行结果方式的归纳与总结
在日常的项目开发中,我们会经常遇到通过多线程执行程序并需要返回执行结果的场景,下面我们就对获取多线程返回结果的几种方式进行一下归纳,并进行简要的分析与总结. 一.Thread.join 在一些简单的应 ...
Python 使用oslo.vmware管理ESXI虚拟机
oslo.vmware是OpenStack通用框架中的一部分,主要用于实现对虚拟机的管理任务,借助oslo.vmware模块我们可以管理Vmware ESXI集群环境. 读取所有节点主机 from o ...
POJ 3301 三分（最小覆盖正方形）
题意: 给你n个点,让你找一个最小的正方形去覆盖所有点.思路: 想一下,如果题目中规定正方形必须和x轴平行,那么我们是不是直接找到最大的x差和最大的y差取最大就行了,但是这个题目 ...

[DB] MapReduce 例题

[DB] MapReduce 例题的更多相关文章

随机推荐

热门专题