Counting the number of occurrences of each HTTP access status in an Apache log file with MapReduce.

1. Upload the log data to HDFS

Path: /mapreduce/data/apachelog/in

The contents are as follows (Apache access-log format; in a real log, each request line ends with the HTTP status code and response size, which the mapper below relies on — the numeric fields did not survive in this copy):

  ::::::: - - [/Feb/::: +] "GET / HTTP/1.1"
  ::::::: - - [/Feb/::: +] "GET /tomcat.css HTTP/1.1"
  ::::::: - - [/Feb/::: +] "GET /tomcat.png HTTP/1.1"
  ::::::: - - [/Feb/::: +] "GET /bg-nav.png HTTP/1.1"
  ::::::: - - [/Feb/::: +] "GET /asf-logo.png HTTP/1.1"
  ::::::: - - [/Feb/::: +] "GET /bg-upper.png HTTP/1.1"
  ::::::: - - [/Feb/::: +] "GET /bg-button.png HTTP/1.1"
  ::::::: - - [/Feb/::: +] "GET /bg-middle.png HTTP/1.1"
  127.0.0.1 - - [/Feb/::: +] "GET / HTTP/1.1"
  127.0.0.1 - - [/Feb/::: +] "GET / HTTP/1.1"
  ::::::: - - [/Feb/::: +] "GET / HTTP/1.1"
  ::::::: - - [/Feb/::: +] "GET / HTTP/1.1"
  127.0.0.1 - - [/Feb/::: +] "GET / HTTP/1.1"
  127.0.0.1 - - [/Feb/::: +] "GET / HTTP/1.1"
  ::::::: - - [/Feb/::: +] "GET / HTTP/1.1"
  ::::::: - - [/Feb/::: +] "GET / HTTP/1.1"
  127.0.0.1 - - [/Feb/::: +] "GET / HTTP/1.1"
  127.0.0.1 - - [/Feb/::: +] "GET / HTTP/1.1"
  ::::::: - - [/Feb/::: +] "GET /sentiment_ms/login HTTP/1.1"
  ::::::: - - [/Feb/::: +] "GET / HTTP/1.1"
  ::::::: - - [/Feb/::: +] "GET / HTTP/1.1"
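A complete access-log line ends with the status code right after the closing quote of the request, e.g. `"GET / HTTP/1.1" 404 -`. The mapper extracts it by splitting at that closing quote. A minimal sketch of the extraction (the sample line is hypothetical, since the numeric fields above were elided):

```java
// Minimal sketch of the status-code extraction used by the mapper below.
public class StatusParseDemo {

    static String parseStatus(String logLine) {
        // Split at the closing quote of the request ("\" "); the token
        // immediately after it is the HTTP status code.
        String[] parts = logLine.split("\" ");
        return parts[1].split(" ")[0];
    }

    public static void main(String[] args) {
        // Hypothetical sample line in Apache access-log format.
        String line = "127.0.0.1 - - [13/Feb/2017:10:00:00 +0800] \"GET /tomcat.css HTTP/1.1\" 404 -";
        System.out.println(parseStatus(line)); // prints 404
    }
}
```

Note that the opening quote is followed by `G`, not a space, so only the closing quote matches the `"\" "` separator.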

2. Code

package com.zhen.apachelog;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class ApacheLog {

    public static class apacheMapper extends Mapper<Object, Text, Text, IntWritable> {

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String valueStr = value.toString();
            // Split at the closing quote of the request; the status code is
            // the first token after it.
            String[] strings = valueStr.split("\" ");
            String status = strings[1].split(" ")[0];
            // Emit (status, 1) for each log line.
            context.write(new Text(status), new IntWritable(1));
        }
    }

    public static class apacheReduce extends Reducer<Text, IntWritable, Text, IntWritable> {

        @Override
        protected void reduce(Text key, Iterable<IntWritable> value, Context context)
                throws IOException, InterruptedException {
            // Sum the 1s emitted by the mapper for this status code.
            int count = 0;
            for (IntWritable intWritable : value) {
                count += intWritable.get();
            }
            context.write(key, new IntWritable(count));
        }
    }

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {

        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();

        Job job = Job.getInstance(conf, "ApacheLog");
        job.setJarByClass(ApacheLog.class);

        job.setMapperClass(apacheMapper.class);
        job.setReducerClass(apacheReduce.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Input and output paths come from the command line.
        FileInputFormat.addInputPaths(job, otherArgs[0]);
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
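The map and reduce logic above can be sanity-checked without a cluster by running the same two steps (extract the status, sum per status) over an in-memory array. This is a plain-Java sketch; the sample lines are hypothetical stand-ins for the elided log data:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Local simulation of the job: mapper-style extraction plus
// reducer-style summation, with no Hadoop dependency.
public class ApacheLogLocalCheck {

    public static Map<String, Integer> countStatuses(String[] lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            // Same extraction as the mapper: first token after the closing quote.
            String status = line.split("\" ")[1].split(" ")[0];
            // Same summation as the reducer.
            counts.merge(status, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] sample = {
            "127.0.0.1 - - [13/Feb/2017:10:00:00 +0800] \"GET / HTTP/1.1\" 200 11250",
            "127.0.0.1 - - [13/Feb/2017:10:00:01 +0800] \"GET /tomcat.css HTTP/1.1\" 404 -",
            "127.0.0.1 - - [13/Feb/2017:10:00:02 +0800] \"GET /tomcat.png HTTP/1.1\" 404 -"
        };
        System.out.println(countStatuses(sample)); // prints {200=1, 404=2}
    }
}
```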

3. Package the code into a jar

4. Run the job

EFdeMacBook-Pro:hadoop-2.8.0 FengZhen$ hadoop jar /Users/FengZhen/Desktop/ApacheLog.jar com.zhen.apachelog.ApacheLog /mapreduce/data/apachelog/in /mapreduce/data/apachelog/out
17/09/13 15:32:22 INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:8032
17/09/13 15:32:23 INFO input.FileInputFormat: Total input files to process : 1
17/09/13 15:32:23 INFO mapreduce.JobSubmitter: number of splits:1
17/09/13 15:32:23 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1505268150495_0017
17/09/13 15:32:23 INFO impl.YarnClientImpl: Submitted application application_1505268150495_0017
17/09/13 15:32:23 INFO mapreduce.Job: The url to track the job: http://192.168.1.64:8088/proxy/application_1505268150495_0017/
17/09/13 15:32:23 INFO mapreduce.Job: Running job: job_1505268150495_0017
17/09/13 15:32:32 INFO mapreduce.Job: Job job_1505268150495_0017 running in uber mode : false
17/09/13 15:32:32 INFO mapreduce.Job: map 0% reduce 0%
17/09/13 15:32:37 INFO mapreduce.Job: map 100% reduce 0%
17/09/13 15:32:43 INFO mapreduce.Job: map 100% reduce 100%
17/09/13 15:32:43 INFO mapreduce.Job: Job job_1505268150495_0017 completed successfully
17/09/13 15:32:43 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=216
FILE: Number of bytes written=272795
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=1776
HDFS: Number of bytes written=13
HDFS: Number of read operations=6
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=3160
Total time spent by all reduces in occupied slots (ms)=3167
Total time spent by all map tasks (ms)=3160
Total time spent by all reduce tasks (ms)=3167
Total vcore-milliseconds taken by all map tasks=3160
Total vcore-milliseconds taken by all reduce tasks=3167
Total megabyte-milliseconds taken by all map tasks=3235840
Total megabyte-milliseconds taken by all reduce tasks=3243008
Map-Reduce Framework
Map input records=21
Map output records=21
Map output bytes=168
Map output materialized bytes=216
Input split bytes=150
Combine input records=0
Combine output records=0
Reduce input groups=2
Reduce shuffle bytes=216
Reduce input records=21
Reduce output records=2
Spilled Records=42
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=54
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
Total committed heap usage (bytes)=358612992
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=1626
File Output Format Counters
Bytes Written=13

5. View the results

EFdeMacBook-Pro:lib FengZhen$ hadoop fs -ls /mapreduce/data/apachelog/out
Found 2 items
-rw-r--r-- 1 FengZhen supergroup 0 2017-09-13 15:32 /mapreduce/data/apachelog/out/_SUCCESS
-rw-r--r-- 1 FengZhen supergroup 13 2017-09-13 15:32 /mapreduce/data/apachelog/out/part-r-00000
EFdeMacBook-Pro:lib FengZhen$ hadoop fs -text /mapreduce/data/apachelog/out/part-r-00000
200 8
404 13

The two counts sum to 21, matching the 21 map input records reported in the job counters, so every log line was parsed.
