Hadoop Learning Path (26): MapReduce API Usage (3)
The Movie-Review Case Study
Data and Requirements
Data format
movies.dat (3,884 records)
1::Toy Story (1995)::Animation|Children's|Comedy
2::Jumanji (1995)::Adventure|Children's|Fantasy
3::Grumpier Old Men (1995)::Comedy|Romance
4::Waiting to Exhale (1995)::Comedy|Drama
5::Father of the Bride Part II (1995)::Comedy
6::Heat (1995)::Action|Crime|Thriller
7::Sabrina (1995)::Comedy|Romance
8::Tom and Huck (1995)::Adventure|Children's
9::Sudden Death (1995)::Action
10::GoldenEye (1995)::Action|Adventure|Thriller
users.dat (6,041 records)
1::F::1::10::48067
2::M::56::16::70072
3::M::25::15::55117
4::M::45::7::02460
5::M::25::20::55455
6::F::50::9::55117
7::M::35::1::06810
8::M::25::12::11413
9::M::25::17::61614
10::F::35::1::95370
ratings.dat (1,000,210 records)
1::1193::5::978300760
1::661::3::978302109
1::914::3::978301968
1::3408::4::978300275
1::2355::5::978824291
1::1197::3::978302268
1::1287::5::978302039
1::2804::5::978300719
1::594::4::978302268
1::919::4::978301368
Field descriptions
1. users.dat format: 2::M::56::16::70072
Fields: UserID BigInt, Gender String, Age Int, Occupation String, Zipcode String
Meaning: user ID, gender, age, occupation, zip code
2. movies.dat format: 2::Jumanji (1995)::Adventure|Children's|Fantasy
Fields: MovieID BigInt, Title String, Genres String
Meaning: movie ID, movie name, movie genres
3. ratings.dat format: 1::1193::5::978300760
Fields: UserID BigInt, MovieID BigInt, Rating Double, Timestamp String
Meaning: user ID, movie ID, rating, rating timestamp
Schema of the fully joined record: user ID, movie ID, rating, timestamp, gender, age, occupation, zip code, movie name, movie genres
userid, movieId, rate, ts, gender, age, occupation, zipcode, movieName, movieType
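Every job below parses these files by splitting on the `::` delimiter. A minimal plain-Java sketch (no Hadoop required; the sample line is taken from ratings.dat above):

```java
public class ParseDemo {
    public static void main(String[] args) {
        // One ratings.dat line: user ID :: movie ID :: rating :: timestamp
        String line = "1::1193::5::978300760";
        // String.split takes a regex; "::" contains no metacharacters, so it is safe as-is
        String[] f = line.split("::");
        System.out.println(f[0] + " " + f[1] + " " + f[2] + " " + f[3]);
    }
}
```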
Requirements
(1) Find the 10 most-rated movies and their rating counts (movie name, rating count)
(2) Find the 10 highest-rated movies among male users and among female users, respectively (gender, movie name, rating)
(3) For movieid = 2116, compute the average rating per age bracket; the data uses only 7 age brackets, so group by those (age bracket, rating)
(4) For the female user who reviews the most movies, compute the average ratings of the 10 movies she scored highest (user, movie name, rating)
(5) Find the 10 best movies of the year that produced the most good movies (rating >= 4.0)
(6) Among movies released in 1997, find the 10 highest-rated Comedy movies
(7) Find the 5 highest-rated movies in each genre (genre, movie name, average rating)
(8) Find the highest-rated movie genre of each year (year, genre, rating)
(9) Find the highest-rated movie in each region and store the result in HDFS (region, movie name, rating)
Code Implementation
1. Find the 10 most-rated movies and their rating counts (movie name, rating count)
Analysis: this question involves two files, ratings.dat and movies.dat, whose sizes differ drastically. A map-side join is the right tool here: preload the smaller file into memory so that no shuffle is needed for the join itself.
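As a sanity check of the map-join idea, here is a plain-Java sketch (sample records from the files above; the HashMap plays the role that setup() later fills from the distributed cache):

```java
import java.util.HashMap;
import java.util.Map;

public class MapJoinSketch {
    public static void main(String[] args) {
        // Small side: movies.dat keyed by movie ID
        Map<String, String> movieMap = new HashMap<>();
        for (String m : new String[]{
                "1::Toy Story (1995)::Animation|Children's|Comedy",
                "6::Heat (1995)::Action|Crime|Thriller"}) {
            String[] s = m.split("::");
            movieMap.put(s[0], s[1] + "\t" + s[2]);
        }
        // Large side: each ratings record is joined by a hash lookup, record by record
        String rating = "1::6::5::978300760";
        String[] r = rating.split("::");
        String movieName = movieMap.get(r[1]).split("\t")[0];
        System.out.println(movieName + " -> rating " + r[2]);
    }
}
```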
MovieMR1_1.java
public class MovieMR1_1 {
public static void main(String[] args) throws Exception {
if(args.length < 4) {
args = new String[4];
args[0] = "/movie/input/";
args[1] = "/movie/output/";
args[2] = "/movie/cache/movies.dat";
args[3] = "/movie/output_last/";
}
Configuration conf1 = new Configuration();
conf1.set("fs.defaultFS", "hdfs://hadoop1:9000/");
System.setProperty("HADOOP_USER_NAME", "hadoop");
FileSystem fs1 = FileSystem.get(conf1);
Job job1 = Job.getInstance(conf1);
job1.setJarByClass(MovieMR1_1.class);
job1.setMapperClass(MoviesMapJoinRatingsMapper1.class);
job1.setReducerClass(MovieMR1Reducer1.class);
job1.setMapOutputKeyClass(Text.class);
job1.setMapOutputValueClass(IntWritable.class);
job1.setOutputKeyClass(Text.class);
job1.setOutputValueClass(IntWritable.class);
//Cache the side file into the working directory of each task node
URI uri = new URI("hdfs://hadoop1:9000"+args[2]);
System.out.println(uri);
job1.addCacheFile(uri);
Path inputPath1 = new Path(args[0]);
Path outputPath1 = new Path(args[1]);
if(fs1.exists(outputPath1)) {
fs1.delete(outputPath1, true);
}
FileInputFormat.setInputPaths(job1, inputPath1);
FileOutputFormat.setOutputPath(job1, outputPath1);
boolean isDone = job1.waitForCompletion(true);
System.exit(isDone ? 0 : 1);
}
public static class MoviesMapJoinRatingsMapper1 extends Mapper<LongWritable, Text, Text, IntWritable>{
//Holds the movies.dat data loaded into memory
private static Map<String,String> movieMap = new HashMap<>();
//key: movie name
Text outKey = new Text();
//value: the rating (only its occurrences are counted)
IntWritable outValue = new IntWritable();
/**
* movies.dat: 1::Toy Story (1995)::Animation|Children's|Comedy
*
* Preload the small table (movies.dat) into memory.
* */
@Override
protected void setup(Context context) throws IOException, InterruptedException {
Path[] localCacheFiles = context.getLocalCacheFiles();
String strPath = localCacheFiles[0].toUri().toString();
BufferedReader br = new BufferedReader(new FileReader(strPath));
String readLine;
while((readLine = br.readLine()) != null) {
String[] split = readLine.split("::");
String movieId = split[0];
String movieName = split[1];
String movieType = split[2];
movieMap.put(movieId, movieName+"\t"+movieType);
}
br.close();
}
/**
* movies.dat: 1 :: Toy Story (1995) :: Animation|Children's|Comedy
*              movie ID  movie name     movie genres
*
* ratings.dat: 1 :: 1193 :: 5 :: 978300760
*              user ID  movie ID  rating  timestamp
*
* value: one line read from ratings.dat
* */
@Override
protected void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException {
String[] split = value.toString().split("::");
String userId = split[0];
String movieId = split[1];
String movieRate = split[2];
//Look up the movie name by movieId in the in-memory map
String movieNameAndType = movieMap.get(movieId);
if(movieNameAndType == null) {
return; //skip ratings whose movie does not appear in movies.dat
}
String movieName = movieNameAndType.split("\t")[0];
outKey.set(movieName);
outValue.set(Integer.parseInt(movieRate));
context.write(outKey, outValue);
}
}
public static class MovieMR1Reducer1 extends Reducer<Text, IntWritable, Text, IntWritable>{
//number of ratings for each movie
int count;
//rating count written as the output value
IntWritable outValue = new IntWritable();
@Override
protected void reduce(Text key, Iterable<IntWritable> values,Context context) throws IOException, InterruptedException {
count = 0;
for(IntWritable value : values) {
count++;
}
outValue.set(count);
context.write(key, outValue);
}
}
}
MovieMR1_2.java
public class MovieMR1_2 {
public static void main(String[] args) throws Exception {
if(args.length < 2) {
args = new String[2];
args[0] = "/movie/output/";
args[1] = "/movie/output_last/";
}
Configuration conf1 = new Configuration();
conf1.set("fs.defaultFS", "hdfs://hadoop1:9000/");
System.setProperty("HADOOP_USER_NAME", "hadoop");
FileSystem fs1 = FileSystem.get(conf1);
Job job = Job.getInstance(conf1);
job.setJarByClass(MovieMR1_2.class);
job.setMapperClass(MoviesMapJoinRatingsMapper2.class);
job.setReducerClass(MovieMR1Reducer2.class);
job.setMapOutputKeyClass(MovieRating.class);
job.setMapOutputValueClass(NullWritable.class);
job.setOutputKeyClass(MovieRating.class);
job.setOutputValueClass(NullWritable.class);
Path inputPath1 = new Path(args[0]);
Path outputPath1 = new Path(args[1]);
if(fs1.exists(outputPath1)) {
fs1.delete(outputPath1, true);
}
//Sort the first job's output in descending order of rating count
FileInputFormat.setInputPaths(job, inputPath1);
FileOutputFormat.setOutputPath(job, outputPath1);
boolean isDone = job.waitForCompletion(true);
System.exit(isDone ? 0 : 1);
}
//Note: the map output key is the custom MovieRating type, which orders itself by descending count
public static class MoviesMapJoinRatingsMapper2 extends Mapper<LongWritable, Text, MovieRating, NullWritable>{
MovieRating outKey = new MovieRating();
@Override
protected void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException {
//input line, e.g.: 'Night Mother (1986)	70
String[] split = value.toString().split("\t");
outKey.setCount(Integer.parseInt(split[1]));
outKey.setMovieName(split[0]);
context.write(outKey, NullWritable.get());
}
}
//The input arrives already sorted, so just emit the first 10 movies
public static class MovieMR1Reducer2 extends Reducer<MovieRating, NullWritable, MovieRating, NullWritable>{
//count is a field so that the cap of 10 spans every reduce group of this single reducer
int count = 0;
@Override
protected void reduce(MovieRating key, Iterable<NullWritable> values,Context context) throws IOException, InterruptedException {
for(NullWritable value : values) {
count++;
if(count > 10) {
return;
}
context.write(key, value);
}
}
}
}
MovieRating.java
public class MovieRating implements WritableComparable<MovieRating>{
private String movieName;
private int count;
public String getMovieName() {
return movieName;
}
public void setMovieName(String movieName) {
this.movieName = movieName;
}
public int getCount() {
return count;
}
public void setCount(int count) {
this.count = count;
}
public MovieRating() {}
public MovieRating(String movieName, int count) {
super();
this.movieName = movieName;
this.count = count;
}
@Override
public String toString() {
return movieName + "\t" + count;
}
@Override
public void readFields(DataInput in) throws IOException {
movieName = in.readUTF();
count = in.readInt();
}
@Override
public void write(DataOutput out) throws IOException {
out.writeUTF(movieName);
out.writeInt(count);
}
@Override
public int compareTo(MovieRating o) {
//descending by count; Integer.compare avoids the overflow risk of subtraction
return Integer.compare(o.count, this.count);
}
}
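The effect of MovieRating's descending comparator can be sketched with an ordinary Comparable class (the Entry class and its sample counts below are hypothetical stand-ins, not part of the job):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class DescendingSortDemo {
    // Mirrors MovieRating#compareTo: larger counts sort first
    static class Entry implements Comparable<Entry> {
        String name;
        int count;
        Entry(String n, int c) { name = n; count = c; }
        @Override
        public int compareTo(Entry o) { return Integer.compare(o.count, this.count); }
    }

    public static void main(String[] args) {
        List<Entry> list = new ArrayList<>();
        list.add(new Entry("A", 70));
        list.add(new Entry("B", 3428));
        list.add(new Entry("C", 211));
        Collections.sort(list); // natural order = descending count
        System.out.println("top=" + list.get(0).name + " count=" + list.get(0).count);
    }
}
```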
2. Find the 10 highest-rated movies among male users and among female users, respectively (gender, movie name, rating)
Analysis: this question joins three tables, so the two small tables are preloaded into memory before the lookup.
First, join the three tables.
MoviesThreeTableJoin.java
/**
* Three-way join: each ratings.dat record is extended with the matching
* movies.dat and users.dat fields, both preloaded from the distributed cache.
* */
public class MoviesThreeTableJoin {
public static void main(String[] args) throws Exception {
if(args.length < 4) {
args = new String[4];
args[0] = "/movie/input/";
args[1] = "/movie/output2/";
args[2] = "/movie/cache/movies.dat";
args[3] = "/movie/cache/users.dat";
}
Configuration conf = new Configuration();
conf.set("fs.defaultFS", "hdfs://hadoop1:9000/");
System.setProperty("HADOOP_USER_NAME", "hadoop");
FileSystem fs = FileSystem.get(conf);
Job job = Job.getInstance(conf);
job.setJarByClass(MoviesThreeTableJoin.class);
job.setMapperClass(ThreeTableMapper.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(NullWritable.class);
//cache both small files on every task node
URI uriUsers = new URI("hdfs://hadoop1:9000"+args[3]);
URI uriMovies = new URI("hdfs://hadoop1:9000"+args[2]);
job.addCacheFile(uriUsers);
job.addCacheFile(uriMovies);
Path inputPath = new Path(args[0]);
Path outputPath = new Path(args[1]);
if(fs.exists(outputPath)) {
fs.delete(outputPath,true);
}
FileInputFormat.setInputPaths(job, inputPath);
FileOutputFormat.setOutputPath(job, outputPath);
boolean isDone = job.waitForCompletion(true);
System.exit(isDone ? 0 : 1);
}
public static class ThreeTableMapper extends Mapper<LongWritable, Text, Text, NullWritable>{
//in-memory copies of movies.dat and users.dat
private Map<String,String> moviesMap = new HashMap<>();
private Map<String,String> usersMap = new HashMap<>();
//holds one parsed line of ratings.dat
String[] ratings;
Text outKey = new Text();
@Override
protected void setup(Context context) throws IOException, InterruptedException {
BufferedReader br = null;
Path[] paths = context.getLocalCacheFiles();
String usersLine = null;
String moviesLine = null;
for(Path path : paths) {
String name = path.toUri().getPath();
if(name.contains("movies.dat")) {
/**
* movies.dat line: 2::Jumanji (1995)::Adventure|Children's|Fantasy
* movie ID, movie name, movie genres
* movie ID becomes the key, the remaining fields the value
*/
br = new BufferedReader(new FileReader(name));
while((moviesLine = br.readLine()) != null) {
String[] split = moviesLine.split("::");
moviesMap.put(split[0], split[1]+"::"+split[2]);
}
}else if(name.contains("users.dat")) {
/**
* users.dat line: 2::M::56::16::70072
* user ID, gender, age, occupation, zip code
* user ID becomes the key, the remaining fields the value
* */
br = new BufferedReader(new FileReader(name));
while((usersLine = br.readLine()) != null) {
String[] split = usersLine.split("::");
usersMap.put(split[0], split[1]+"::"+split[2]+"::"+split[3]+"::"+split[4]);
}
}
}
}
@Override
protected void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException {
ratings = value.toString().split("::");
//look up the movie and user details by movie ID and user ID
String movies = moviesMap.get(ratings[1]);
String users = usersMap.get(ratings[0]);
//concatenate the fields of all three tables
String threeTables = value.toString()+"::"+movies+"::"+users;
outKey.set(threeTables);
context.write(outKey, NullWritable.get());
}
}
}
Sample records after the three-way join:
::::::::Winnie the Pooh and the Blustery Day ()::Animation|Children's::F::25::6::90027
::::::::Dumbo ()::Animation|Children's|Musical::F::25::6::90027
::::::::Die Hard ()::Action|Thriller::F::::::
::::::::Streetcar Named Desire, A ()::Drama::F::::::
::::::::Braveheart ()::Action|Drama|War::F::::::
::::::::Star Wars: Episode V - The Empire Strikes Back ()::Action|Adventure|Drama|Sci-Fi|War::F::::::
::::::::Raiders of the Lost Ark ()::Action|Adventure::F::::::
::::::::Aliens ()::Action|Sci-Fi|Thriller|War::F::::::
::::::::Good, The Bad and The Ugly, The ()::Action|Western::F::::::
::::::::Star Wars: Episode VI - Return of the Jedi ()::Action|Adventure|Romance|Sci-Fi|War::F::::::
Field layout of a joined record:
1000 :: 1036 :: 4 :: 975040964 :: Die Hard (1988) :: Action|Thriller :: F :: 25 :: 6 :: 90027
Indices 0-9: user ID, movie ID, rating, timestamp, movie name, movie genres, gender, age, occupation, zip code
To find the 10 highest-rated movies for male and for female users (gender, movie name, rating):
1. Group by movie name and gender: use movie name + gender as the key and the rating as the value, and compute the average rating;
2. Wrap gender + movie name + rating in a bean, group by gender, and emit the top 10 per group in descending order of rating.
Job logic: MoviesDemo2.java
public class MoviesDemo2 {
public static void main(String[] args) throws Exception {
Configuration conf1 = new Configuration();
Configuration conf2 = new Configuration();
FileSystem fs1 = FileSystem.get(conf1);
FileSystem fs2 = FileSystem.get(conf2);
Job job1 = Job.getInstance(conf1);
Job job2 = Job.getInstance(conf2);
job1.setJarByClass(MoviesDemo2.class);
job1.setMapperClass(MoviesDemo2Mapper1.class);
job2.setMapperClass(MoviesDemo2Mapper2.class);
job1.setReducerClass(MoviesDemo2Reducer1.class);
job2.setReducerClass(MoviesDemo2Reducer2.class);
job1.setOutputKeyClass(Text.class);
job1.setOutputValueClass(DoubleWritable.class);
job2.setOutputKeyClass(MoviesSexBean.class);
job2.setOutputValueClass(NullWritable.class);
job2.setGroupingComparatorClass(MoviesSexGC.class);
Path inputPath1 = new Path("D:\\MR\\hw\\movie\\output3he1");
Path outputPath1 = new Path("D:\\MR\\hw\\movie\\output2_1");
Path inputPath2 = new Path("D:\\MR\\hw\\movie\\output2_1");
Path outputPath2 = new Path("D:\\MR\\hw\\movie\\output2_end");
if(fs1.exists(outputPath1)) {
fs1.delete(outputPath1,true);
}
if(fs2.exists(outputPath2)) {
fs2.delete(outputPath2,true);
}
FileInputFormat.setInputPaths(job1, inputPath1);
FileOutputFormat.setOutputPath(job1, outputPath1);
FileInputFormat.setInputPaths(job2, inputPath2);
FileOutputFormat.setOutputPath(job2, outputPath2);
JobControl control = new JobControl("MoviesDemo2");
ControlledJob aJob = new ControlledJob(job1.getConfiguration());
ControlledJob bJob = new ControlledJob(job2.getConfiguration());
bJob.addDependingJob(aJob);
control.addJob(aJob);
control.addJob(bJob);
Thread thread = new Thread(control);
thread.start();
while(!control.allFinished()) {
Thread.sleep(1000);
}
System.exit(0);
}
/**
* Input: the output file of the three-way join
* Emit movie name + gender as the key and the rating as the value
*
* 1000::1036::4::975040964::Die Hard (1988)::Action|Thriller::F::25::6::90027
*
* user ID::movie ID::rating::timestamp::movie name::genres::gender::age::occupation::zip code
*
* */
public static class MoviesDemo2Mapper1 extends Mapper<LongWritable, Text, Text, DoubleWritable>{
Text outKey = new Text();
DoubleWritable outValue = new DoubleWritable();
@Override
protected void map(LongWritable key, Text value,Context context)
throws IOException, InterruptedException {
String[] split = value.toString().split("::");
String strKey = split[4]+"\t"+split[6];
String strValue = split[2];
outKey.set(strKey);
outValue.set(Double.parseDouble(strValue));
context.write(outKey, outValue);
}
}
/**
* Average the ratings for each movie name + gender key
* */
public static class MoviesDemo2Reducer1 extends Reducer<Text, DoubleWritable, Text, DoubleWritable>{
DoubleWritable outValue = new DoubleWritable();
@Override
protected void reduce(Text key, Iterable<DoubleWritable> values,Context context)
throws IOException, InterruptedException {
int count = 0;
double sum = 0;
for(DoubleWritable value : values) {
count++;
sum += Double.parseDouble(value.toString());
}
double avg = sum / count;
outValue.set(avg);
context.write(key, outValue);
}
}
/**
* Wrap movie name + gender + rating in a bean; group by gender, sort by rating descending
* */
public static class MoviesDemo2Mapper2 extends Mapper<LongWritable, Text, MoviesSexBean, NullWritable>{
MoviesSexBean outKey = new MoviesSexBean();
@Override
protected void map(LongWritable key, Text value,Context context)
throws IOException, InterruptedException {
String[] split = value.toString().split("\t");
outKey.setMovieName(split[0]);
outKey.setSex(split[1]);
outKey.setScore(Double.parseDouble(split[2]));
context.write(outKey, NullWritable.get());
}
}
/**
* Take the top 10 movies for each gender
* */
public static class MoviesDemo2Reducer2 extends Reducer<MoviesSexBean, NullWritable, MoviesSexBean, NullWritable>{
@Override
protected void reduce(MoviesSexBean key, Iterable<NullWritable> values,Context context)
throws IOException, InterruptedException {
int count = 0;
for(NullWritable nvl : values) {
count++;
context.write(key, NullWritable.get());
if(count == 10) {
return;
}
}
}
}
}
The bean: MoviesSexBean.java
public class MoviesSexBean implements WritableComparable<MoviesSexBean>{
private String movieName;
private String sex;
private double score;
public MoviesSexBean() {
super();
}
public MoviesSexBean(String movieName, String sex, double score) {
super();
this.movieName = movieName;
this.sex = sex;
this.score = score;
}
public String getMovieName() {
return movieName;
}
public void setMovieName(String movieName) {
this.movieName = movieName;
}
public String getSex() {
return sex;
}
public void setSex(String sex) {
this.sex = sex;
}
public double getScore() {
return score;
}
public void setScore(double score) {
this.score = score;
}
@Override
public String toString() {
return movieName + "\t" + sex + "\t" + score ;
}
@Override
public void readFields(DataInput in) throws IOException {
movieName = in.readUTF();
sex = in.readUTF();
score = in.readDouble();
}
@Override
public void write(DataOutput out) throws IOException {
out.writeUTF(movieName);
out.writeUTF(sex);
out.writeDouble(score);
}
@Override
public int compareTo(MoviesSexBean o) {
//gender descending, then score descending
int result = o.getSex().compareTo(this.getSex());
if(result != 0) {
return result;
}
return Double.compare(o.getScore(), this.getScore());
}
}
The grouping comparator: MoviesSexGC.java
public class MoviesSexGC extends WritableComparator{
public MoviesSexGC() {
super(MoviesSexBean.class,true);
}
@Override
public int compare(WritableComparable a, WritableComparable b) {
MoviesSexBean msb1 = (MoviesSexBean)a;
MoviesSexBean msb2 = (MoviesSexBean)b;
return msb1.getSex().compareTo(msb2.getSex());
}
}
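The interplay of full ordering and the grouping comparator can be simulated in plain Java: records are totally ordered by (gender descending, score descending), but grouped by gender alone, so the first N values seen in each reduce group are that gender's top N. A sketch with made-up rows and N = 2:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

public class GroupTopNDemo {
    public static void main(String[] args) {
        // (gender, movie, score) triples; order mirrors MoviesSexBean#compareTo
        String[][] rows = {
            {"F", "Movie1", "4.5"}, {"M", "Movie2", "3.9"},
            {"F", "Movie3", "4.8"}, {"M", "Movie4", "4.2"},
            {"F", "Movie5", "3.1"}};
        Arrays.sort(rows, (a, b) -> {
            int c = a[0].compareTo(b[0]);
            // gender descending, then score descending
            return c != 0 ? -c : Double.compare(Double.parseDouble(b[2]), Double.parseDouble(a[2]));
        });
        // The grouping comparator makes each gender one reduce group; keep the first 2 per group
        Map<String, Integer> taken = new LinkedHashMap<>();
        for (String[] r : rows) {
            int n = taken.merge(r[0], 1, Integer::sum);
            if (n <= 2) System.out.println(r[0] + " " + r[1] + " " + r[2]);
        }
    }
}
```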
3. For movieid = 2116, compute the average rating per age bracket; the data uses only 7 age brackets, so group by those (age bracket, rating)
This job reads the joined file produced in step 2.
public class MovieDemo3 {
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
Job job = Job.getInstance(conf);
job.setJarByClass(MovieDemo3.class);
job.setMapperClass(MovieDemo3Mapper.class);
job.setReducerClass(MovieDemo3Reducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(DoubleWritable.class);
Path inputPath = new Path("D:\\MR\\hw\\movie\\3he1");
Path outputPath = new Path("D:\\MR\\hw\\movie\\outpu3");
if(fs.exists(outputPath)) {
fs.delete(outputPath,true);
}
FileInputFormat.setInputPaths(job, inputPath);
FileOutputFormat.setOutputPath(job, outputPath);
boolean isDone = job.waitForCompletion(true);
System.exit(isDone ? 0 : 1);
}
/**
* 1000::1036::4::975040964::Die Hard (1988)::Action|Thriller::F::25::6::90027
*
* user ID::movie ID::rating::timestamp::movie name::genres::gender::age::occupation::zip code
* 0        1         2       3          4           5       6       7    8           9
*
* key: movie ID + movie name + age bracket
* value: rating
* Keep only the records where movieid = 2116
* */
public static class MovieDemo3Mapper extends Mapper<LongWritable, Text, Text, DoubleWritable>{
Text outKey = new Text();
DoubleWritable outValue = new DoubleWritable();
@Override
protected void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException {
String[] split = value.toString().split("::");
int movieID = Integer.parseInt(split[1]);
if(movieID == 2116) {
String strKey = split[1]+"\t"+split[4]+"\t"+split[7];
String strValue = split[2];
outKey.set(strKey);
outValue.set(Double.parseDouble(strValue));
context.write(outKey, outValue);
}
}
}
/**
* Average the ratings for each key emitted by the mapper
* */
public static class MovieDemo3Reducer extends Reducer<Text, DoubleWritable, Text, DoubleWritable>{
DoubleWritable outValue = new DoubleWritable();
@Override
protected void reduce(Text key, Iterable<DoubleWritable> values, Context context)
throws IOException, InterruptedException {
int count = 0;
double sum = 0;
for(DoubleWritable value : values) {
count++;
sum += Double.parseDouble(value.toString());
}
double avg = sum / count;
outValue.set(avg);
context.write(key, outValue);
}
}
}
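The sum-and-count averaging done in the reducer above can be sketched in plain Java (the age brackets and ratings are made-up sample pairs):

```java
import java.util.Map;
import java.util.TreeMap;

public class AvgByKeyDemo {
    public static void main(String[] args) {
        // (age bracket, rating) pairs for one movie, as the mapper would emit them
        Object[][] pairs = {{"25", 4.0}, {"25", 5.0}, {"35", 3.0}};
        Map<String, double[]> acc = new TreeMap<>(); // value holds [sum, count]
        for (Object[] p : pairs) {
            double[] a = acc.computeIfAbsent((String) p[0], k -> new double[2]);
            a[0] += (Double) p[1];
            a[1] += 1;
        }
        // One output line per key: bracket, then average = sum / count
        for (Map.Entry<String, double[]> e : acc.entrySet())
            System.out.println(e.getKey() + "\t" + e.getValue()[0] / e.getValue()[1]);
    }
}
```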
4. For the female user who reviews the most movies, compute the average ratings of the 10 movies she scored highest (user, movie name, rating)
The joined record layout is the same as in step 2: user ID, movie ID, rating, timestamp, movie name, movie genres, gender, age, occupation, zip code (indices 0-9).
(1) Find the ID of the female user with the most reviews
MoviesDemo4.java
public class MoviesDemo4 {
public static void main(String[] args) throws Exception {
Configuration conf1 = new Configuration();
FileSystem fs1 = FileSystem.get(conf1);
Job job1 = Job.getInstance(conf1);
job1.setJarByClass(MoviesDemo4.class);
job1.setMapperClass(MoviesDemo4Mapper1.class);
job1.setReducerClass(MoviesDemo4Reducer1.class);
job1.setMapOutputKeyClass(Text.class);
job1.setMapOutputValueClass(Text.class);
job1.setOutputKeyClass(Text.class);
job1.setOutputValueClass(DoubleWritable.class);
Configuration conf2 = new Configuration();
FileSystem fs2 = FileSystem.get(conf2);
Job job2 = Job.getInstance(conf2);
job2.setJarByClass(MoviesDemo4.class);
job2.setMapperClass(MoviesDemo4Mapper2.class);
job2.setReducerClass(MoviesDemo4Reducer2.class);
job2.setMapOutputKeyClass(Moviegoers.class);
job2.setMapOutputValueClass(NullWritable.class);
job2.setOutputKeyClass(Moviegoers.class);
job2.setOutputValueClass(NullWritable.class);
Path inputPath1 = new Path("D:\\MR\\hw\\movie\\3he1");
Path outputPath1 = new Path("D:\\MR\\hw\\movie\\outpu4_1");
if(fs1.exists(outputPath1)) {
fs1.delete(outputPath1,true);
}
FileInputFormat.setInputPaths(job1, inputPath1);
FileOutputFormat.setOutputPath(job1, outputPath1);
Path inputPath2 = new Path("D:\\MR\\hw\\movie\\outpu4_1");
Path outputPath2 = new Path("D:\\MR\\hw\\movie\\outpu4_2");
if(fs2.exists(outputPath2)) {
fs2.delete(outputPath2,true);
}
FileInputFormat.setInputPaths(job2, inputPath2);
FileOutputFormat.setOutputPath(job2, outputPath2);
JobControl control = new JobControl("MoviesDemo4");
ControlledJob ajob = new ControlledJob(job1.getConfiguration());
ControlledJob bjob = new ControlledJob(job2.getConfiguration());
bjob.addDependingJob(ajob);
control.addJob(ajob);
control.addJob(bjob);
Thread thread = new Thread(control);
thread.start();
while(!control.allFinished()) {
Thread.sleep(1000);
}
System.exit(0);
}
/**
* 1000::1036::4::975040964::Die Hard (1988)::Action|Thriller::F::25::6::90027
*
* user ID::movie ID::rating::timestamp::movie name::genres::gender::age::occupation::zip code
* 0        1         2       3          4           5       6       7    8           9
*
* 1. key: user ID
* 2. value: movie name + rating
*
* */
public static class MoviesDemo4Mapper1 extends Mapper<LongWritable, Text, Text, Text>{
Text outKey = new Text();
Text outValue = new Text();
@Override
protected void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException {
String[] split = value.toString().split("::");
String strKey = split[0];
String strValue = split[4]+"\t"+split[2];
if(split[6].equals("F")) {
outKey.set(strKey);
outValue.set(strValue);
context.write(outKey, outValue);
}
}
}
//Count the total number of reviews per female user
public static class MoviesDemo4Reducer1 extends Reducer<Text, Text, Text, IntWritable>{
IntWritable outValue = new IntWritable();
@Override
protected void reduce(Text key, Iterable<Text> values, Context context)
throws IOException, InterruptedException {
int count = 0;
for(Text value : values) {
count++;
}
outValue.set(count);
context.write(key, outValue);
}
}
//Sort the first job's output in descending order of review count
public static class MoviesDemo4Mapper2 extends Mapper<LongWritable, Text,Moviegoers,NullWritable>{
Moviegoers outKey = new Moviegoers();
@Override
protected void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException {
String[] split = value.toString().split("\t");
outKey.setName(split[0]);
outKey.setCount(Integer.parseInt(split[1]));
context.write(outKey, NullWritable.get());
}
}
//After sorting, take only the first record: the ID and review count of the most active female user
public static class MoviesDemo4Reducer2 extends Reducer<Moviegoers,NullWritable, Moviegoers,NullWritable>{
//count is a field so that exactly one record is emitted across all reduce groups
int count = 0;
@Override
protected void reduce(Moviegoers key, Iterable<NullWritable> values,Context context)
throws IOException, InterruptedException {
for(NullWritable nvl : values) {
count++;
if(count > 1) {
return;
}
context.write(key, nvl);
}
}
}
}
(2)