Hadoop Learning Path (26): MapReduce API Usage (3)
The Movie-Review Case Study
Data and Requirements
Data format
movies.dat (3,884 records)
1::Toy Story (1995)::Animation|Children's|Comedy
2::Jumanji (1995)::Adventure|Children's|Fantasy
3::Grumpier Old Men (1995)::Comedy|Romance
4::Waiting to Exhale (1995)::Comedy|Drama
5::Father of the Bride Part II (1995)::Comedy
6::Heat (1995)::Action|Crime|Thriller
7::Sabrina (1995)::Comedy|Romance
8::Tom and Huck (1995)::Adventure|Children's
9::Sudden Death (1995)::Action
10::GoldenEye (1995)::Action|Adventure|Thriller
users.dat (6,041 records)
1::F::1::10::48067
2::M::56::16::70072
3::M::25::15::55117
4::M::45::7::02460
5::M::25::20::55455
6::F::50::9::55117
7::M::35::1::06810
8::M::25::12::11413
9::M::25::17::61614
10::F::35::1::95370
ratings.dat (1,000,210 records)
1::1193::5::978300760
1::661::3::978302109
1::914::3::978301968
1::3408::4::978300275
1::2355::5::978824291
1::1197::3::978302268
1::1287::5::978302039
1::2804::5::978300719
1::594::4::978302268
1::919::4::978301368
Field descriptions
1. users.dat format: 2::M::56::16::70072
Fields: UserID BigInt, Gender String, Age Int, Occupation String, Zipcode String
Meaning: user ID, gender, age, occupation, zip code
2. movies.dat format: 2::Jumanji (1995)::Adventure|Children's|Fantasy
Fields: MovieID BigInt, Title String, Genres String
Meaning: movie ID, movie name, movie genres
3. ratings.dat format: 1::1193::5::978300760
Fields: UserID BigInt, MovieID BigInt, Rating Double, Timestamp String
Meaning: user ID, movie ID, rating, rating timestamp
Schema of the fully joined record: user ID, movie ID, rating, timestamp, gender, age, occupation, zip code, movie name, movie genres
userid, movieId, rate, ts, gender, age, occupation, zipcode, movieName, movieType
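Every job below parses these files by splitting on the `::` delimiter. A minimal plain-Java sketch (no Hadoop required; the sample line is taken from ratings.dat above):

```java
public class ParseDemo {
    public static void main(String[] args) {
        // One ratings.dat line: user ID :: movie ID :: rating :: timestamp
        String line = "1::1193::5::978300760";
        // String.split takes a regex; "::" contains no metacharacters, so it is safe as-is
        String[] f = line.split("::");
        System.out.println(f[0] + " " + f[1] + " " + f[2] + " " + f[3]);
    }
}
```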
Requirements
(1) Find the 10 most-rated movies and their rating counts (movie name, rating count)
(2) Find the 10 highest-rated movies among male users and among female users, respectively (gender, movie name, rating)
(3) For movieid = 2116, compute the average rating per age bracket; the data uses only 7 age brackets, so group by those (age bracket, rating)
(4) For the female user who reviews the most movies, compute the average ratings of the 10 movies she scored highest (user, movie name, rating)
(5) Find the 10 best movies of the year that produced the most good movies (rating >= 4.0)
(6) Among movies released in 1997, find the 10 highest-rated Comedy movies
(7) Find the 5 highest-rated movies in each genre (genre, movie name, average rating)
(8) Find the highest-rated movie genre of each year (year, genre, rating)
(9) Find the highest-rated movie in each region and store the result in HDFS (region, movie name, rating)
Code Implementation
1. Find the 10 most-rated movies and their rating counts (movie name, rating count)
Analysis: this question involves two files, ratings.dat and movies.dat, whose sizes differ drastically. A map-side join is the right tool here: preload the smaller file into memory so that no shuffle is needed for the join itself.
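As a sanity check of the map-join idea, here is a plain-Java sketch (sample records from the files above; the HashMap plays the role that setup() later fills from the distributed cache):

```java
import java.util.HashMap;
import java.util.Map;

public class MapJoinSketch {
    public static void main(String[] args) {
        // Small side: movies.dat keyed by movie ID
        Map<String, String> movieMap = new HashMap<>();
        for (String m : new String[]{
                "1::Toy Story (1995)::Animation|Children's|Comedy",
                "6::Heat (1995)::Action|Crime|Thriller"}) {
            String[] s = m.split("::");
            movieMap.put(s[0], s[1] + "\t" + s[2]);
        }
        // Large side: each ratings record is joined by a hash lookup, record by record
        String rating = "1::6::5::978300760";
        String[] r = rating.split("::");
        String movieName = movieMap.get(r[1]).split("\t")[0];
        System.out.println(movieName + " -> rating " + r[2]);
    }
}
```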
MovieMR1_1.java
public class MovieMR1_1 {
public static void main(String[] args) throws Exception {
if(args.length < 4) {
args = new String[4];
args[0] = "/movie/input/";
args[1] = "/movie/output/";
args[2] = "/movie/cache/movies.dat";
args[3] = "/movie/output_last/";
}
Configuration conf1 = new Configuration();
conf1.set("fs.defaultFS", "hdfs://hadoop1:9000/");
System.setProperty("HADOOP_USER_NAME", "hadoop");
FileSystem fs1 = FileSystem.get(conf1);
Job job1 = Job.getInstance(conf1);
job1.setJarByClass(MovieMR1_1.class);
job1.setMapperClass(MoviesMapJoinRatingsMapper1.class);
job1.setReducerClass(MovieMR1Reducer1.class);
job1.setMapOutputKeyClass(Text.class);
job1.setMapOutputValueClass(IntWritable.class);
job1.setOutputKeyClass(Text.class);
job1.setOutputValueClass(IntWritable.class);
//Cache the side file into the working directory of each task node
URI uri = new URI("hdfs://hadoop1:9000"+args[2]);
System.out.println(uri);
job1.addCacheFile(uri);
Path inputPath1 = new Path(args[0]);
Path outputPath1 = new Path(args[1]);
if(fs1.exists(outputPath1)) {
fs1.delete(outputPath1, true);
}
FileInputFormat.setInputPaths(job1, inputPath1);
FileOutputFormat.setOutputPath(job1, outputPath1);
boolean isDone = job1.waitForCompletion(true);
System.exit(isDone ? 0 : 1);
}
public static class MoviesMapJoinRatingsMapper1 extends Mapper<LongWritable, Text, Text, IntWritable>{
//Holds the movies.dat data loaded into memory
private static Map<String,String> movieMap = new HashMap<>();
//key: movie name
Text outKey = new Text();
//value: the rating (only its occurrences are counted)
IntWritable outValue = new IntWritable();
/**
* movies.dat: 1::Toy Story (1995)::Animation|Children's|Comedy
*
* Preload the small table (movies.dat) into memory.
* */
@Override
protected void setup(Context context) throws IOException, InterruptedException {
Path[] localCacheFiles = context.getLocalCacheFiles();
String strPath = localCacheFiles[0].toUri().toString();
BufferedReader br = new BufferedReader(new FileReader(strPath));
String readLine;
while((readLine = br.readLine()) != null) {
String[] split = readLine.split("::");
String movieId = split[0];
String movieName = split[1];
String movieType = split[2];
movieMap.put(movieId, movieName+"\t"+movieType);
}
br.close();
}
/**
* movies.dat: 1 :: Toy Story (1995) :: Animation|Children's|Comedy
*              movie ID  movie name     movie genres
*
* ratings.dat: 1 :: 1193 :: 5 :: 978300760
*              user ID  movie ID  rating  timestamp
*
* value: one line read from ratings.dat
* */
@Override
protected void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException {
String[] split = value.toString().split("::");
String userId = split[0];
String movieId = split[1];
String movieRate = split[2];
//Look up the movie name by movieId in the in-memory map
String movieNameAndType = movieMap.get(movieId);
if(movieNameAndType == null) {
return; //skip ratings whose movie does not appear in movies.dat
}
String movieName = movieNameAndType.split("\t")[0];
outKey.set(movieName);
outValue.set(Integer.parseInt(movieRate));
context.write(outKey, outValue);
}
}
public static class MovieMR1Reducer1 extends Reducer<Text, IntWritable, Text, IntWritable>{
//number of ratings for each movie
int count;
//rating count written as the output value
IntWritable outValue = new IntWritable();
@Override
protected void reduce(Text key, Iterable<IntWritable> values,Context context) throws IOException, InterruptedException {
count = 0;
for(IntWritable value : values) {
count++;
}
outValue.set(count);
context.write(key, outValue);
}
}
}
MovieMR1_2.java
public class MovieMR1_2 {
public static void main(String[] args) throws Exception {
if(args.length < 2) {
args = new String[2];
args[0] = "/movie/output/";
args[1] = "/movie/output_last/";
}
Configuration conf1 = new Configuration();
conf1.set("fs.defaultFS", "hdfs://hadoop1:9000/");
System.setProperty("HADOOP_USER_NAME", "hadoop");
FileSystem fs1 = FileSystem.get(conf1);
Job job = Job.getInstance(conf1);
job.setJarByClass(MovieMR1_2.class);
job.setMapperClass(MoviesMapJoinRatingsMapper2.class);
job.setReducerClass(MovieMR1Reducer2.class);
job.setMapOutputKeyClass(MovieRating.class);
job.setMapOutputValueClass(NullWritable.class);
job.setOutputKeyClass(MovieRating.class);
job.setOutputValueClass(NullWritable.class);
Path inputPath1 = new Path(args[0]);
Path outputPath1 = new Path(args[1]);
if(fs1.exists(outputPath1)) {
fs1.delete(outputPath1, true);
}
//Sort the first job's output in descending order of rating count
FileInputFormat.setInputPaths(job, inputPath1);
FileOutputFormat.setOutputPath(job, outputPath1);
boolean isDone = job.waitForCompletion(true);
System.exit(isDone ? 0 : 1);
}
//Note: the map output key is the custom MovieRating type, which orders itself by descending count
public static class MoviesMapJoinRatingsMapper2 extends Mapper<LongWritable, Text, MovieRating, NullWritable>{
MovieRating outKey = new MovieRating();
@Override
protected void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException {
//input line, e.g.: 'Night Mother (1986)	70
String[] split = value.toString().split("\t");
outKey.setCount(Integer.parseInt(split[1]));
outKey.setMovieName(split[0]);
context.write(outKey, NullWritable.get());
}
}
//The input arrives already sorted, so just emit the first 10 movies
public static class MovieMR1Reducer2 extends Reducer<MovieRating, NullWritable, MovieRating, NullWritable>{
//count is a field so that the cap of 10 spans every reduce group of this single reducer
int count = 0;
@Override
protected void reduce(MovieRating key, Iterable<NullWritable> values,Context context) throws IOException, InterruptedException {
for(NullWritable value : values) {
count++;
if(count > 10) {
return;
}
context.write(key, value);
}
}
}
}
MovieRating.java
public class MovieRating implements WritableComparable<MovieRating>{
private String movieName;
private int count;
public String getMovieName() {
return movieName;
}
public void setMovieName(String movieName) {
this.movieName = movieName;
}
public int getCount() {
return count;
}
public void setCount(int count) {
this.count = count;
}
public MovieRating() {}
public MovieRating(String movieName, int count) {
super();
this.movieName = movieName;
this.count = count;
}
@Override
public String toString() {
return movieName + "\t" + count;
}
@Override
public void readFields(DataInput in) throws IOException {
movieName = in.readUTF();
count = in.readInt();
}
@Override
public void write(DataOutput out) throws IOException {
out.writeUTF(movieName);
out.writeInt(count);
}
@Override
public int compareTo(MovieRating o) {
//descending by count; Integer.compare avoids the overflow risk of subtraction
return Integer.compare(o.count, this.count);
}
}
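The effect of MovieRating's descending comparator can be sketched with an ordinary Comparable class (the Entry class and its sample counts below are hypothetical stand-ins, not part of the job):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class DescendingSortDemo {
    // Mirrors MovieRating#compareTo: larger counts sort first
    static class Entry implements Comparable<Entry> {
        String name;
        int count;
        Entry(String n, int c) { name = n; count = c; }
        @Override
        public int compareTo(Entry o) { return Integer.compare(o.count, this.count); }
    }

    public static void main(String[] args) {
        List<Entry> list = new ArrayList<>();
        list.add(new Entry("A", 70));
        list.add(new Entry("B", 3428));
        list.add(new Entry("C", 211));
        Collections.sort(list); // natural order = descending count
        System.out.println("top=" + list.get(0).name + " count=" + list.get(0).count);
    }
}
```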
2. Find the 10 highest-rated movies among male users and among female users, respectively (gender, movie name, rating)
Analysis: this question joins three tables, so the two small tables are preloaded into memory before the lookup.
First, join the three tables.
MoviesThreeTableJoin.java
/**
* Three-way join: each ratings.dat record is extended with the matching
* movies.dat and users.dat fields, both preloaded from the distributed cache.
* */
public class MoviesThreeTableJoin {
public static void main(String[] args) throws Exception {
if(args.length < 4) {
args = new String[4];
args[0] = "/movie/input/";
args[1] = "/movie/output2/";
args[2] = "/movie/cache/movies.dat";
args[3] = "/movie/cache/users.dat";
}
Configuration conf = new Configuration();
conf.set("fs.defaultFS", "hdfs://hadoop1:9000/");
System.setProperty("HADOOP_USER_NAME", "hadoop");
FileSystem fs = FileSystem.get(conf);
Job job = Job.getInstance(conf);
job.setJarByClass(MoviesThreeTableJoin.class);
job.setMapperClass(ThreeTableMapper.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(NullWritable.class);
//cache both small files on every task node
URI uriUsers = new URI("hdfs://hadoop1:9000"+args[3]);
URI uriMovies = new URI("hdfs://hadoop1:9000"+args[2]);
job.addCacheFile(uriUsers);
job.addCacheFile(uriMovies);
Path inputPath = new Path(args[0]);
Path outputPath = new Path(args[1]);
if(fs.exists(outputPath)) {
fs.delete(outputPath,true);
}
FileInputFormat.setInputPaths(job, inputPath);
FileOutputFormat.setOutputPath(job, outputPath);
boolean isDone = job.waitForCompletion(true);
System.exit(isDone ? 0 : 1);
}
public static class ThreeTableMapper extends Mapper<LongWritable, Text, Text, NullWritable>{
//in-memory copies of movies.dat and users.dat
private Map<String,String> moviesMap = new HashMap<>();
private Map<String,String> usersMap = new HashMap<>();
//holds one parsed line of ratings.dat
String[] ratings;
Text outKey = new Text();
@Override
protected void setup(Context context) throws IOException, InterruptedException {
BufferedReader br = null;
Path[] paths = context.getLocalCacheFiles();
String usersLine = null;
String moviesLine = null;
for(Path path : paths) {
String name = path.toUri().getPath();
if(name.contains("movies.dat")) {
/**
* movies.dat line: 2::Jumanji (1995)::Adventure|Children's|Fantasy
* movie ID, movie name, movie genres
* movie ID becomes the key, the remaining fields the value
*/
br = new BufferedReader(new FileReader(name));
while((moviesLine = br.readLine()) != null) {
String[] split = moviesLine.split("::");
moviesMap.put(split[0], split[1]+"::"+split[2]);
}
}else if(name.contains("users.dat")) {
/**
* users.dat line: 2::M::56::16::70072
* user ID, gender, age, occupation, zip code
* user ID becomes the key, the remaining fields the value
* */
br = new BufferedReader(new FileReader(name));
while((usersLine = br.readLine()) != null) {
String[] split = usersLine.split("::");
usersMap.put(split[0], split[1]+"::"+split[2]+"::"+split[3]+"::"+split[4]);
}
}
}
}
@Override
protected void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException {
ratings = value.toString().split("::");
//look up the movie and user details by movie ID and user ID
String movies = moviesMap.get(ratings[1]);
String users = usersMap.get(ratings[0]);
//concatenate the fields of all three tables
String threeTables = value.toString()+"::"+movies+"::"+users;
outKey.set(threeTables);
context.write(outKey, NullWritable.get());
}
}
}
Sample records after the three-way join:
::::::::Winnie the Pooh and the Blustery Day ()::Animation|Children's::F::25::6::90027
::::::::Dumbo ()::Animation|Children's|Musical::F::25::6::90027
::::::::Die Hard ()::Action|Thriller::F::::::
::::::::Streetcar Named Desire, A ()::Drama::F::::::
::::::::Braveheart ()::Action|Drama|War::F::::::
::::::::Star Wars: Episode V - The Empire Strikes Back ()::Action|Adventure|Drama|Sci-Fi|War::F::::::
::::::::Raiders of the Lost Ark ()::Action|Adventure::F::::::
::::::::Aliens ()::Action|Sci-Fi|Thriller|War::F::::::
::::::::Good, The Bad and The Ugly, The ()::Action|Western::F::::::
::::::::Star Wars: Episode VI - Return of the Jedi ()::Action|Adventure|Romance|Sci-Fi|War::F::::::
Field layout of a joined record:
1000 :: 1036 :: 4 :: 975040964 :: Die Hard (1988) :: Action|Thriller :: F :: 25 :: 6 :: 90027
Indices 0-9: user ID, movie ID, rating, timestamp, movie name, movie genres, gender, age, occupation, zip code
To find the 10 highest-rated movies for male and for female users (gender, movie name, rating):
1. Group by movie name and gender: use movie name + gender as the key and the rating as the value, and compute the average rating;
2. Wrap gender + movie name + rating in a bean, group by gender, and emit the top 10 per group in descending order of rating.
Job logic: MoviesDemo2.java
public class MoviesDemo2 {
public static void main(String[] args) throws Exception {
Configuration conf1 = new Configuration();
Configuration conf2 = new Configuration();
FileSystem fs1 = FileSystem.get(conf1);
FileSystem fs2 = FileSystem.get(conf2);
Job job1 = Job.getInstance(conf1);
Job job2 = Job.getInstance(conf2);
job1.setJarByClass(MoviesDemo2.class);
job1.setMapperClass(MoviesDemo2Mapper1.class);
job2.setMapperClass(MoviesDemo2Mapper2.class);
job1.setReducerClass(MoviesDemo2Reducer1.class);
job2.setReducerClass(MoviesDemo2Reducer2.class);
job1.setOutputKeyClass(Text.class);
job1.setOutputValueClass(DoubleWritable.class);
job2.setOutputKeyClass(MoviesSexBean.class);
job2.setOutputValueClass(NullWritable.class);
job2.setGroupingComparatorClass(MoviesSexGC.class);
Path inputPath1 = new Path("D:\\MR\\hw\\movie\\output3he1");
Path outputPath1 = new Path("D:\\MR\\hw\\movie\\output2_1");
Path inputPath2 = new Path("D:\\MR\\hw\\movie\\output2_1");
Path outputPath2 = new Path("D:\\MR\\hw\\movie\\output2_end");
if(fs1.exists(outputPath1)) {
fs1.delete(outputPath1,true);
}
if(fs2.exists(outputPath2)) {
fs2.delete(outputPath2,true);
}
FileInputFormat.setInputPaths(job1, inputPath1);
FileOutputFormat.setOutputPath(job1, outputPath1);
FileInputFormat.setInputPaths(job2, inputPath2);
FileOutputFormat.setOutputPath(job2, outputPath2);
JobControl control = new JobControl("MoviesDemo2");
ControlledJob aJob = new ControlledJob(job1.getConfiguration());
ControlledJob bJob = new ControlledJob(job2.getConfiguration());
bJob.addDependingJob(aJob);
control.addJob(aJob);
control.addJob(bJob);
Thread thread = new Thread(control);
thread.start();
while(!control.allFinished()) {
Thread.sleep(1000);
}
System.exit(0);
}
/**
* Input: the output file of the three-way join
* Emit movie name + gender as the key and the rating as the value
*
* 1000::1036::4::975040964::Die Hard (1988)::Action|Thriller::F::25::6::90027
*
* user ID::movie ID::rating::timestamp::movie name::genres::gender::age::occupation::zip code
*
* */
public static class MoviesDemo2Mapper1 extends Mapper<LongWritable, Text, Text, DoubleWritable>{
Text outKey = new Text();
DoubleWritable outValue = new DoubleWritable();
@Override
protected void map(LongWritable key, Text value,Context context)
throws IOException, InterruptedException {
String[] split = value.toString().split("::");
String strKey = split[4]+"\t"+split[6];
String strValue = split[2];
outKey.set(strKey);
outValue.set(Double.parseDouble(strValue));
context.write(outKey, outValue);
}
}
/**
* Average the ratings for each movie name + gender key
* */
public static class MoviesDemo2Reducer1 extends Reducer<Text, DoubleWritable, Text, DoubleWritable>{
DoubleWritable outValue = new DoubleWritable();
@Override
protected void reduce(Text key, Iterable<DoubleWritable> values,Context context)
throws IOException, InterruptedException {
int count = 0;
double sum = 0;
for(DoubleWritable value : values) {
count++;
sum += Double.parseDouble(value.toString());
}
double avg = sum / count;
outValue.set(avg);
context.write(key, outValue);
}
}
/**
* Wrap movie name + gender + rating in a bean; group by gender, sort by rating descending
* */
public static class MoviesDemo2Mapper2 extends Mapper<LongWritable, Text, MoviesSexBean, NullWritable>{
MoviesSexBean outKey = new MoviesSexBean();
@Override
protected void map(LongWritable key, Text value,Context context)
throws IOException, InterruptedException {
String[] split = value.toString().split("\t");
outKey.setMovieName(split[0]);
outKey.setSex(split[1]);
outKey.setScore(Double.parseDouble(split[2]));
context.write(outKey, NullWritable.get());
}
}
/**
* Take the top 10 movies for each gender
* */
public static class MoviesDemo2Reducer2 extends Reducer<MoviesSexBean, NullWritable, MoviesSexBean, NullWritable>{
@Override
protected void reduce(MoviesSexBean key, Iterable<NullWritable> values,Context context)
throws IOException, InterruptedException {
int count = 0;
for(NullWritable nvl : values) {
count++;
context.write(key, NullWritable.get());
if(count == 10) {
return;
}
}
}
}
}
The bean: MoviesSexBean.java
public class MoviesSexBean implements WritableComparable<MoviesSexBean>{
private String movieName;
private String sex;
private double score;
public MoviesSexBean() {
super();
}
public MoviesSexBean(String movieName, String sex, double score) {
super();
this.movieName = movieName;
this.sex = sex;
this.score = score;
}
public String getMovieName() {
return movieName;
}
public void setMovieName(String movieName) {
this.movieName = movieName;
}
public String getSex() {
return sex;
}
public void setSex(String sex) {
this.sex = sex;
}
public double getScore() {
return score;
}
public void setScore(double score) {
this.score = score;
}
@Override
public String toString() {
return movieName + "\t" + sex + "\t" + score ;
}
@Override
public void readFields(DataInput in) throws IOException {
movieName = in.readUTF();
sex = in.readUTF();
score = in.readDouble();
}
@Override
public void write(DataOutput out) throws IOException {
out.writeUTF(movieName);
out.writeUTF(sex);
out.writeDouble(score);
}
@Override
public int compareTo(MoviesSexBean o) {
//gender descending, then score descending
int result = o.getSex().compareTo(this.getSex());
if(result != 0) {
return result;
}
return Double.compare(o.getScore(), this.getScore());
}
}
The grouping comparator: MoviesSexGC.java
public class MoviesSexGC extends WritableComparator{
public MoviesSexGC() {
super(MoviesSexBean.class,true);
}
@Override
public int compare(WritableComparable a, WritableComparable b) {
MoviesSexBean msb1 = (MoviesSexBean)a;
MoviesSexBean msb2 = (MoviesSexBean)b;
return msb1.getSex().compareTo(msb2.getSex());
}
}
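The interplay of full ordering and the grouping comparator can be simulated in plain Java: records are totally ordered by (gender descending, score descending), but grouped by gender alone, so the first N values seen in each reduce group are that gender's top N. A sketch with made-up rows and N = 2:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

public class GroupTopNDemo {
    public static void main(String[] args) {
        // (gender, movie, score) triples; order mirrors MoviesSexBean#compareTo
        String[][] rows = {
            {"F", "Movie1", "4.5"}, {"M", "Movie2", "3.9"},
            {"F", "Movie3", "4.8"}, {"M", "Movie4", "4.2"},
            {"F", "Movie5", "3.1"}};
        Arrays.sort(rows, (a, b) -> {
            int c = a[0].compareTo(b[0]);
            // gender descending, then score descending
            return c != 0 ? -c : Double.compare(Double.parseDouble(b[2]), Double.parseDouble(a[2]));
        });
        // The grouping comparator makes each gender one reduce group; keep the first 2 per group
        Map<String, Integer> taken = new LinkedHashMap<>();
        for (String[] r : rows) {
            int n = taken.merge(r[0], 1, Integer::sum);
            if (n <= 2) System.out.println(r[0] + " " + r[1] + " " + r[2]);
        }
    }
}
```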
3. For movieid = 2116, compute the average rating per age bracket; the data uses only 7 age brackets, so group by those (age bracket, rating)
This job reads the joined file produced in step 2.
public class MovieDemo3 {
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
Job job = Job.getInstance(conf);
job.setJarByClass(MovieDemo3.class);
job.setMapperClass(MovieDemo3Mapper.class);
job.setReducerClass(MovieDemo3Reducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(DoubleWritable.class);
Path inputPath = new Path("D:\\MR\\hw\\movie\\3he1");
Path outputPath = new Path("D:\\MR\\hw\\movie\\outpu3");
if(fs.exists(outputPath)) {
fs.delete(outputPath,true);
}
FileInputFormat.setInputPaths(job, inputPath);
FileOutputFormat.setOutputPath(job, outputPath);
boolean isDone = job.waitForCompletion(true);
System.exit(isDone ? 0 : 1);
}
/**
* 1000::1036::4::975040964::Die Hard (1988)::Action|Thriller::F::25::6::90027
*
* user ID::movie ID::rating::timestamp::movie name::genres::gender::age::occupation::zip code
* 0        1         2       3          4           5       6       7    8           9
*
* key: movie ID + movie name + age bracket
* value: rating
* Keep only the records where movieid = 2116
* */
public static class MovieDemo3Mapper extends Mapper<LongWritable, Text, Text, DoubleWritable>{
Text outKey = new Text();
DoubleWritable outValue = new DoubleWritable();
@Override
protected void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException {
String[] split = value.toString().split("::");
int movieID = Integer.parseInt(split[1]);
if(movieID == 2116) {
String strKey = split[1]+"\t"+split[4]+"\t"+split[7];
String strValue = split[2];
outKey.set(strKey);
outValue.set(Double.parseDouble(strValue));
context.write(outKey, outValue);
}
}
}
/**
* Average the ratings for each key emitted by the mapper
* */
public static class MovieDemo3Reducer extends Reducer<Text, DoubleWritable, Text, DoubleWritable>{
DoubleWritable outValue = new DoubleWritable();
@Override
protected void reduce(Text key, Iterable<DoubleWritable> values, Context context)
throws IOException, InterruptedException {
int count = 0;
double sum = 0;
for(DoubleWritable value : values) {
count++;
sum += Double.parseDouble(value.toString());
}
double avg = sum / count;
outValue.set(avg);
context.write(key, outValue);
}
}
}
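The sum-and-count averaging done in the reducer above can be sketched in plain Java (the age brackets and ratings are made-up sample pairs):

```java
import java.util.Map;
import java.util.TreeMap;

public class AvgByKeyDemo {
    public static void main(String[] args) {
        // (age bracket, rating) pairs for one movie, as the mapper would emit them
        Object[][] pairs = {{"25", 4.0}, {"25", 5.0}, {"35", 3.0}};
        Map<String, double[]> acc = new TreeMap<>(); // value holds [sum, count]
        for (Object[] p : pairs) {
            double[] a = acc.computeIfAbsent((String) p[0], k -> new double[2]);
            a[0] += (Double) p[1];
            a[1] += 1;
        }
        // One output line per key: bracket, then average = sum / count
        for (Map.Entry<String, double[]> e : acc.entrySet())
            System.out.println(e.getKey() + "\t" + e.getValue()[0] / e.getValue()[1]);
    }
}
```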
4. For the female user who reviews the most movies, compute the average ratings of the 10 movies she scored highest (user, movie name, rating)
The joined record layout is the same as in step 2: user ID, movie ID, rating, timestamp, movie name, movie genres, gender, age, occupation, zip code (indices 0-9).
(1) Find the ID of the female user with the most reviews
MoviesDemo4.java
public class MoviesDemo4 {
public static void main(String[] args) throws Exception {
Configuration conf1 = new Configuration();
FileSystem fs1 = FileSystem.get(conf1);
Job job1 = Job.getInstance(conf1);
job1.setJarByClass(MoviesDemo4.class);
job1.setMapperClass(MoviesDemo4Mapper1.class);
job1.setReducerClass(MoviesDemo4Reducer1.class);
job1.setMapOutputKeyClass(Text.class);
job1.setMapOutputValueClass(Text.class);
job1.setOutputKeyClass(Text.class);
job1.setOutputValueClass(DoubleWritable.class);
Configuration conf2 = new Configuration();
FileSystem fs2 = FileSystem.get(conf2);
Job job2 = Job.getInstance(conf2);
job2.setJarByClass(MoviesDemo4.class);
job2.setMapperClass(MoviesDemo4Mapper2.class);
job2.setReducerClass(MoviesDemo4Reducer2.class);
job2.setMapOutputKeyClass(Moviegoers.class);
job2.setMapOutputValueClass(NullWritable.class);
job2.setOutputKeyClass(Moviegoers.class);
job2.setOutputValueClass(NullWritable.class);
Path inputPath1 = new Path("D:\\MR\\hw\\movie\\3he1");
Path outputPath1 = new Path("D:\\MR\\hw\\movie\\outpu4_1");
if(fs1.exists(outputPath1)) {
fs1.delete(outputPath1,true);
}
FileInputFormat.setInputPaths(job1, inputPath1);
FileOutputFormat.setOutputPath(job1, outputPath1);
Path inputPath2 = new Path("D:\\MR\\hw\\movie\\outpu4_1");
Path outputPath2 = new Path("D:\\MR\\hw\\movie\\outpu4_2");
if(fs2.exists(outputPath2)) {
fs2.delete(outputPath2,true);
}
FileInputFormat.setInputPaths(job2, inputPath2);
FileOutputFormat.setOutputPath(job2, outputPath2);
JobControl control = new JobControl("MoviesDemo4");
ControlledJob ajob = new ControlledJob(job1.getConfiguration());
ControlledJob bjob = new ControlledJob(job2.getConfiguration());
bjob.addDependingJob(ajob);
control.addJob(ajob);
control.addJob(bjob);
Thread thread = new Thread(control);
thread.start();
while(!control.allFinished()) {
Thread.sleep(1000);
}
System.exit(0);
}
/**
* 1000::1036::4::975040964::Die Hard (1988)::Action|Thriller::F::25::6::90027
*
* user ID::movie ID::rating::timestamp::movie name::genres::gender::age::occupation::zip code
* 0        1         2       3          4           5       6       7    8           9
*
* 1. key: user ID
* 2. value: movie name + rating
*
* */
public static class MoviesDemo4Mapper1 extends Mapper<LongWritable, Text, Text, Text>{
Text outKey = new Text();
Text outValue = new Text();
@Override
protected void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException {
String[] split = value.toString().split("::");
String strKey = split[0];
String strValue = split[4]+"\t"+split[2];
if(split[6].equals("F")) {
outKey.set(strKey);
outValue.set(strValue);
context.write(outKey, outValue);
}
}
}
//Count the total number of reviews per female user
public static class MoviesDemo4Reducer1 extends Reducer<Text, Text, Text, IntWritable>{
IntWritable outValue = new IntWritable();
@Override
protected void reduce(Text key, Iterable<Text> values, Context context)
throws IOException, InterruptedException {
int count = 0;
for(Text value : values) {
count++;
}
outValue.set(count);
context.write(key, outValue);
}
}
//Sort the first job's output in descending order of review count
public static class MoviesDemo4Mapper2 extends Mapper<LongWritable, Text,Moviegoers,NullWritable>{
Moviegoers outKey = new Moviegoers();
@Override
protected void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException {
String[] split = value.toString().split("\t");
outKey.setName(split[0]);
outKey.setCount(Integer.parseInt(split[1]));
context.write(outKey, NullWritable.get());
}
}
//After sorting, take only the first record: the ID and review count of the most active female user
public static class MoviesDemo4Reducer2 extends Reducer<Moviegoers,NullWritable, Moviegoers,NullWritable>{
//count is a field so that exactly one record is emitted across all reduce groups
int count = 0;
@Override
protected void reduce(Moviegoers key, Iterable<NullWritable> values,Context context)
throws IOException, InterruptedException {
for(NullWritable nvl : values) {
count++;
if(count > 1) {
return;
}
context.write(key, nvl);
}
}
}
}
(2)