Log analysis: MIME statistics

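Extract the MIME types in the log that are not recorded in the standard fields. Per adx and adtype, count the MIME types and the share of them that contain JavaScript.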
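The per-group summary described above (MIME count and JavaScript share for each adx/adtype pair) can be sketched as a standalone function. This is a minimal illustration, not the author's script: the record shape (`:adx`, `:adtype`, `:mimes`) and the name `mime_stats` are hypothetical, and "contains JavaScript" is taken to mean the MIME string includes `javascript`.

```ruby
# Hypothetical record shape: one hash per request with :adx, :adtype and :mimes.
def mime_stats(records)
  # Each (adx, adtype) group starts with zero MIME entries and zero JS entries.
  stats = Hash.new { |h, k| h[k] = { mimes: 0, js: 0 } }
  records.each do |rec|
    group = stats[[rec[:adx], rec[:adtype]]]
    rec[:mimes].each do |mime|
      group[:mimes] += 1
      group[:js] += 1 if mime.to_s.include?('javascript')
    end
  end
  # JS share = fraction of MIME entries in the group that mention javascript.
  stats.each_value do |g|
    g[:js_ratio] = g[:mimes].zero? ? 0.0 : g[:js].fdiv(g[:mimes])
  end
  stats
end

records = [
  { adx: 'adx1', adtype: 'banner', mimes: ['image/jpeg', 'application/javascript'] },
  { adx: 'adx1', adtype: 'banner', mimes: ['image/png'] },
  { adx: 'adx2', adtype: 'video',  mimes: ['video/mp4'] }
]

mime_stats(records).each do |(adx, adtype), g|
  puts "#{adx}\t#{adtype}\t#{g[:mimes]}\t#{g[:js]}\t#{format('%.2f', g[:js_ratio])}"
end
```

Keying the hash by an `[adx, adtype]` array avoids the ambiguity of joining strings with `_` when a field itself may contain underscores.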

```ruby
require 'json'

# Collect the MIME types declared in imp[0].banner and imp[0].video of one
# request body and tally them per "adx_adtype_mime" key.
def getmimes(adx, adtype, bodyobj, totalmimes, statics)
  return unless bodyobj.is_a?(Hash)

  imp = bodyobj['imp']
  return unless imp.is_a?(Array) && imp[0].is_a?(Hash)

  mimes = []
  %w[banner video].each do |kind|
    slot = imp[0][kind]
    next unless slot.is_a?(Hash) && slot['mimes'].is_a?(Array)

    statics['includemime'] += 1
    mimes += slot['mimes']
  end

  mimes.each do |mime|
    statics['includejs'] += 1 if mime.to_s.include?('javascript')
    totalmimes["#{adx}_#{adtype}_#{mime}"] += 1
  end
end

filepath = "/data/mvdsp/log/request.log.2017-11-30-12"
puts filepath

totalmimes = Hash.new(0) # "adx_adtype_mime" => occurrences
statics    = Hash.new(0) # line and request counters

begin
  File.foreach(filepath) do |line|
    statics['total'] += 1

    # Lines shorter than 1000 characters are treated as having no usable body.
    if line.length < 1000
      statics['invalidbody'] += 1
      next
    end

    # Replace invalid UTF-8 byte sequences so split and JSON.parse do not raise.
    line = line.scrub('?') unless line.valid_encoding?

    fields = line.chomp.split("\t")
    next if fields.length < 10

    adx = fields[3]

    # fields[45] (ext10) is assumed to be a JSON object whose 'reqtype' is the ad type.
    adtype = ''
    begin
      ext10 = JSON.parse(fields[45].to_s)
      adtype = ext10['reqtype'].to_s if ext10.is_a?(Hash) && ext10['reqtype']
    rescue JSON::ParserError
      # No usable ext10: keep adtype empty.
    end

    # fields[6] holds the request body as JSON.
    begin
      bodyobj = JSON.parse(fields[6].to_s)
    rescue JSON::ParserError
      statics['parseerror'] += 1
      next
    end

    getmimes(adx, adtype, bodyobj, totalmimes, statics)
  end
rescue SystemCallError => e
  puts "warn:: cannot read #{filepath}: #{e.message}"
end

puts "-----------totalmimes---------------------"
# Sort by count (ascending) and print one "key<TAB>count" pair per line.
totalmimes.sort_by { |_key, value| value }.each do |key, value|
  puts "#{key}\t#{value}"
end

# Share of all collected MIME entries that mention javascript.
total = totalmimes.values.sum
puts format('js share: %.2f%%', 100.0 * statics['includejs'] / total) if total > 0

puts "-----------statics--------------"
p statics
puts "--------------------------------"
```
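Log lines with broken bytes must be cleaned before they are split and parsed. A common idiom is to transcode through UTF-16BE with `invalid: :replace`; `String#scrub` (Ruby 2.1+) does the same job directly. The sketch below compares both on one made-up line; the helper names `clean_via_transcode` and `clean_via_scrub` are illustrative only.

```ruby
# Round-trip through UTF-16BE: invalid UTF-8 bytes are replaced by "?" on the way out.
def clean_via_transcode(line)
  line.encode('UTF-16BE', invalid: :replace, replace: '?').encode('UTF-8')
end

# String#scrub replaces invalid byte sequences without transcoding.
def clean_via_scrub(line)
  line.scrub('?')
end

# A made-up log fragment containing one invalid byte (\xFF).
raw = "adx1\t\xFFbody".dup.force_encoding('UTF-8')

puts raw.valid_encoding? # false
puts clean_via_transcode(raw)
puts clean_via_scrub(raw)
```

Only the cleaning should happen here; any further substitution on the line would alter the data being counted.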

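Ruby: sort a hash by value

```ruby
hsh = { "a" => 1000, "b" => 10, "c" => 200000 }

# Sorted copy as a Hash (Hash[] works on all Ruby versions)
Hash[hsh.sort_by { |_k, v| v }]

# or, Ruby 2.1+
hsh.sort_by { |_k, v| v }.to_h

# or, returns an Array of [key, value] pairs rather than a Hash
hsh.sort_by(&:last)
```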
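For a report such as `totalmimes`, the most frequent entries usually matter most. A minimal sketch with made-up counts: negating the value in `sort_by` gives a descending order, and `Enumerable#max_by(n)` (Ruby 2.2+) returns the top n pairs without sorting the whole hash.

```ruby
counts = { "a" => 1000, "b" => 10, "c" => 200000 }

# Descending by value: negate the count in the sort key.
desc = counts.sort_by { |_k, v| -v }.to_h

# Top 2 by value, already in descending order.
top2 = counts.max_by(2) { |_k, v| v }.to_h

puts desc.keys.inspect # ["c", "a", "b"]
puts top2.keys.inspect # ["c", "a"]
```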
