统计apachelog各访问状态个数(使用MapReduce)
统计日志文件中各访问状态的个数.
1.将日志数据上传到hdfs
路径 /mapreduce/data/apachelog/in 中
内容如下
::::::: - - [/Feb/::: +] "GET / HTTP/1.1"
::::::: - - [/Feb/::: +] "GET /tomcat.css HTTP/1.1"
::::::: - - [/Feb/::: +] "GET /tomcat.png HTTP/1.1"
::::::: - - [/Feb/::: +] "GET /bg-nav.png HTTP/1.1"
::::::: - - [/Feb/::: +] "GET /asf-logo.png HTTP/1.1"
::::::: - - [/Feb/::: +] "GET /bg-upper.png HTTP/1.1"
::::::: - - [/Feb/::: +] "GET /bg-button.png HTTP/1.1"
::::::: - - [/Feb/::: +] "GET /bg-middle.png HTTP/1.1"
127.0.0.1 - - [/Feb/::: +] "GET / HTTP/1.1"
127.0.0.1 - - [/Feb/::: +] "GET / HTTP/1.1"
::::::: - - [/Feb/::: +] "GET / HTTP/1.1"
::::::: - - [/Feb/::: +] "GET / HTTP/1.1"
127.0.0.1 - - [/Feb/::: +] "GET / HTTP/1.1"
127.0.0.1 - - [/Feb/::: +] "GET / HTTP/1.1"
::::::: - - [/Feb/::: +] "GET / HTTP/1.1"
::::::: - - [/Feb/::: +] "GET / HTTP/1.1"
127.0.0.1 - - [/Feb/::: +] "GET / HTTP/1.1"
127.0.0.1 - - [/Feb/::: +] "GET / HTTP/1.1"
::::::: - - [/Feb/::: +] "GET /sentiment_ms/login HTTP/1.1"
::::::: - - [/Feb/::: +] "GET / HTTP/1.1"
::::::: - - [/Feb/::: +] "GET / HTTP/1.1"
2.代码
package com.zhen.apachelog; import java.io.IOException; import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser; public class ApacheLog { public static class apacheMapper extends Mapper<Object, Text, Text, IntWritable>{ @Override
protected void map(Object key, Text value, Mapper<Object, Text, Text, IntWritable>.Context context)
throws IOException, InterruptedException {
String valueStr = value.toString();
String[] strings = valueStr.split("\" ");
String status = strings[].split(" ")[];
context.write(new Text(status), new IntWritable());
} } public static class apacheReduce extends Reducer<Text, IntWritable, Text, IntWritable>{ @Override
protected void reduce(Text key, Iterable<IntWritable> value,
Reducer<Text, IntWritable, Text, IntWritable>.Context context) throws IOException, InterruptedException {
int count = ;
for (IntWritable intWritable : value) {
count+=intWritable.get();
}
context.write(key, new IntWritable(count));
} } public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException { Configuration conf = new Configuration();
String[] otherArgs = new GenericOptionsParser(conf,args).getRemainingArgs(); Job job = new Job(conf,"ApacheLog");
job.setJarByClass(ApacheLog.class); job.setMapperClass(apacheMapper.class);
job.setReducerClass(apacheReduce.class); job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class); FileInputFormat.addInputPaths(job, args[]);
FileOutputFormat.setOutputPath(job, new Path(args[])); System.exit(job.waitForCompletion(true)?:);
} }
3.将代码生成jar包
4.调用
EFdeMacBook-Pro:hadoop-2.8.0 FengZhen$ hadoop jar /Users/FengZhen/Desktop/ApacheLog.jar com.zhen.apachelog.ApacheLog /mapreduce/data/apachelog/in /mapreduce/data/apachelog/out
17/09/13 15:32:22 INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:8032
17/09/13 15:32:23 INFO input.FileInputFormat: Total input files to process : 1
17/09/13 15:32:23 INFO mapreduce.JobSubmitter: number of splits:1
17/09/13 15:32:23 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1505268150495_0017
17/09/13 15:32:23 INFO impl.YarnClientImpl: Submitted application application_1505268150495_0017
17/09/13 15:32:23 INFO mapreduce.Job: The url to track the job: http://192.168.1.64:8088/proxy/application_1505268150495_0017/
17/09/13 15:32:23 INFO mapreduce.Job: Running job: job_1505268150495_0017
17/09/13 15:32:32 INFO mapreduce.Job: Job job_1505268150495_0017 running in uber mode : false
17/09/13 15:32:32 INFO mapreduce.Job: map 0% reduce 0%
17/09/13 15:32:37 INFO mapreduce.Job: map 100% reduce 0%
17/09/13 15:32:43 INFO mapreduce.Job: map 100% reduce 100%
17/09/13 15:32:43 INFO mapreduce.Job: Job job_1505268150495_0017 completed successfully
17/09/13 15:32:43 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=216
FILE: Number of bytes written=272795
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=1776
HDFS: Number of bytes written=13
HDFS: Number of read operations=6
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=3160
Total time spent by all reduces in occupied slots (ms)=3167
Total time spent by all map tasks (ms)=3160
Total time spent by all reduce tasks (ms)=3167
Total vcore-milliseconds taken by all map tasks=3160
Total vcore-milliseconds taken by all reduce tasks=3167
Total megabyte-milliseconds taken by all map tasks=3235840
Total megabyte-milliseconds taken by all reduce tasks=3243008
Map-Reduce Framework
Map input records=21
Map output records=21
Map output bytes=168
Map output materialized bytes=216
Input split bytes=150
Combine input records=0
Combine output records=0
Reduce input groups=2
Reduce shuffle bytes=216
Reduce input records=21
Reduce output records=2
Spilled Records=42
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=54
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
Total committed heap usage (bytes)=358612992
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=1626
File Output Format Counters
Bytes Written=13
5.查看结果
EFdeMacBook-Pro:lib FengZhen$ hadoop fs -ls /mapreduce/data/apachelog/out
Found 2 items
-rw-r--r-- 1 FengZhen supergroup 0 2017-09-13 15:32 /mapreduce/data/apachelog/out/_SUCCESS
-rw-r--r-- 1 FengZhen supergroup 13 2017-09-13 15:32 /mapreduce/data/apachelog/out/part-r-00000
EFdeMacBook-Pro:lib FengZhen$ hadoop fs -text /mapreduce/data/apachelog/out/part-r-00000
200 8
404 13
统计apachelog各访问状态个数(使用MapReduce)的更多相关文章
- 部署Nginx网站服务实现访问状态统计以及访问控制功能
原文:https://blog.51cto.com/11134648/2130987 Nginx专为性能优化而开发,最知名的优点是它的稳定性和低系统资源消耗,以及对HTTP并发连接的高处理能力,单个物 ...
- shell+curl监控网站页面(域名访问状态),并利用sedemail发送邮件
应领导要求,对公司几个主要站点的域名访问情况进行监控.下面分享一个监控脚本,并利用sendemail进行邮件发送. 监控脚本如下:下面是写了一个多线程的网站状态检测脚本,直接从文件中读出站点地址,然后 ...
- Javascript 统计复选框选中个数
var checked = document.getElementsByName("checked_c[]"); var checked_counts = 0; for(var i ...
- 学习笔记_过滤器应用_1(分ip统计网站的访问次数)
分ip统计网站的访问次数 ip count 192.168.1.111 2 192.168.1.112 59 统计工作需要在所有资源之前都执行,那么就可以放到Filter中了. 我们这个过滤器不打算做 ...
- Linux 统计文件夹下文件个数
查看统计当前目录下文件的个数,包括子目录里的. ls -lR| grep "^-" | wc -l Linux下查看某个目录下的文件.或文件夹个数用到3个命令:ls列目录.用gre ...
- 学c语言做练习之统计文件中字符的个数
统计文件中字符的个数(采用命令行参数) #include<stdio.h> #include<stdlib.h> int main(int argc, char *argv[] ...
- 题目--统计一行文本的单词个数(PTA预习题)
PTA预习题——统计一行文本的单词个数 7-1 统计一行文本的单词个数 (15 分) 本题目要求编写程序统计一行字符中单词的个数.所谓“单词”是指连续不含空格的字符串,各单词之间用空格分隔,空格数可以 ...
- shell+curl监控网站页面(域名访问状态),并利用sendemail发送邮件
应领导要求,对公司几个主要站点的域名访问情况进行监控.下面分享一个监控脚本,并利用sendemail进行邮件发送. 监控脚本如下:下面是写了一个多线程的网站状态检测脚本,直接从文件中读出站点地址,然后 ...
- Linux上统计文件夹下文件个数以及目录个数
对于linux终端用户而言,统计文件夹下文件的多少是经常要做的操作,于我而言,我会经常在谷歌搜索一个命令,“如何在linux统计文件夹的个数”,然后点击自己想要的答案,但是有时候不知道统计文件夹命令运 ...
随机推荐
- qtav----ffmeg在ubuntu和win10上的编译和运行
最近在windows上和ubuntu上都安装了qtav并且通过了编译测试,实测播放中英文的视频文件功能正常,有图像有声音. 大致情况是,操作系统ubuntu: wkr@sea-X550JK:~$ ca ...
- 使用Firebug进行断点调试详解
利用Firebug我们可以非常方便地对网页上的任何JavaScript代码进行断点调试. 首先,使用快捷键F12在当前页面打开Firebug,并切换到脚本选项卡. 其次,我们需要为指定的js代码添加断 ...
- bootstrap input 加了 disabled 后台竟然接受不到值
下午做了一个input 值需要不可改变.在input属性中加入了 disabled .在后台接受 var_dump($_POST); 竟然看不到值. <input type="t ...
- Office365client通过本地方式批量部署(即点即用部署)
当企业用户拥有Office 365 ProPlus的许可后,可登陆Office 365.自行下载Officeclient安装部署 以上仅仅是理想情况,实际情况是企业用户较多,IT水平參差不齐,企业的带 ...
- Codeforces Round #FF (Div. 2) A. DZY Loves Hash
DZY has a hash table with p buckets, numbered from 0 to p - 1. He wants to insert n numbers, in the ...
- MySql存储过程及MySql常用流程控制语法
/* 该代码是创建了一个名叫"p4"的存储过程并设置了s1,s2,s3两个int型一个varchar型参数,还可以是其他数据类型,内部创建了x1,x2两个变量 DELIMITER是 ...
- docker安装并配置加速
安装 旧版本的 Docker 称为 docker 或者 docker-engine,使用以下命令卸载旧版本: sudo apt-get remove docker \ docker-engine \ ...
- 九度OJ 1202:排序 (排序)
时间限制:1 秒 内存限制:32 兆 特殊判题:否 提交:19711 解决:6508 题目描述: 对输入的n个数进行排序并输出. 输入: 输入的第一行包括一个整数n(1<=n<=100). ...
- 【python】-- SQLAlchemy操作MySQL
ORM.SQLAchemy orm英文全称object relational mapping,就是对象映射关系程序,简单来说就是类似python这种面向对象的程序来说一切皆对象,但是使用的数据库却都是 ...
- Java编程中的一些常见问题汇总
转载自 http://macrochen.iteye.com/blog/1393502 每天在写Java程序,其实里面有一些细节大家可能没怎么注意,这不,有人总结了一个我们编程中常见的问题.虽然一般 ...