Using Python to find the 10 most frequent IPs in an nginx access log, sort them, and generate a web page

Method 1:
Use the awk command on Linux

# cat access1.log | awk '{print $1"  "$7"  "$9}'|sort -n|uniq -c |sort -n -r|head -10
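
In the awk command, $1, $7 and $9 are the 1-based whitespace-separated fields: client IP, request URL and status code. As a quick check, the sketch below splits the sample log line used later in this post with Python's str.split(), where the same fields sit at the 0-based indices 0, 6 and 8. The variable names are only illustrative.

```python
# Sample line in the log format used in this post
txt = '100.116.167.9 - - [22/Oct/2017:03:55:53 +0800] "HEAD /check HTTP/1.0" 200 0 "-" "-" "-" ut = 0.001'

# str.split() with no arguments splits on runs of whitespace, like awk's default
nodes = txt.split()

# awk's $1, $7, $9 are nodes[0], nodes[6], nodes[8] in Python
ip, url, code = nodes[0], nodes[6], nodes[8]
print('ip:%s, url:%s, code:%s' % (ip, url, code))
# prints: ip:100.116.167.9, url:/check, code:200
```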

Method 2:
Process the log with Python

# encoding=utf-8
# Find the top 10 entries in the log. The log format looks like this:
# txt = '''100.116.167.9 - - [22/Oct/2017:03:55:53 +0800] "HEAD /check HTTP/1.0" 200 0 "-" "-" "-" ut = 0.001'''
# nodes = txt.split()
# print('ip:%s, url:%s, code:%s' % (nodes[0], nodes[6], nodes[8]))

# Count how often each (ip, url, code) combination occurs and build a dict
def log_analysis(log_file, dpath, topn=10):
    log_dict = {}
    # The with block closes the file handle automatically
    with open(log_file, 'r') as shandle:
        for line in shandle:
            nodes = line.split()
            # Skip blank or truncated lines that lack the fields we need
            if len(nodes) < 9:
                continue
            # Use the (ip, url, code) tuple as the dict key: {(ip, url, code): count}
            ip, url, code = nodes[0], nodes[6], nodes[8]
            # Build the dict: start at 1 if the key is new, otherwise add 1
            if (ip, url, code) not in log_dict:
                log_dict[(ip, url, code)] = 1
            else:
                log_dict[(ip, url, code)] = log_dict[(ip, url, code)] + 1

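    # Sort the dict entries by count
    # Each item looks like (('111.37.21.148', '/index', '200'), 2)
    # list() is needed on Python 3, where items() returns a view that cannot be indexed
    rst_list = list(log_dict.items())
    # Partial bubble sort on the count: each pass moves the largest remaining
    # entry towards the end of rst_list, so topn passes leave the topn most
    # frequent entries in the last topn positions
    for j in range(topn):
        for i in range(0, len(rst_list) - 1):
            if rst_list[i][1] > rst_list[i+1][1]:
                temp = rst_list[i]
                rst_list[i] = rst_list[i+1]
                rst_list[i+1] = temp
    # Take the last topn entries in reverse order (highest count first)
    need_list = rst_list[-1:-topn - 1:-1]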

    # Turn the top entries into table rows to be written into the web page
    title = 'nginx access log'
    tbody = ''
    for i in need_list:
        tbody += '<tr>\n<td>%s</td><td>%s</td><td>%s</td><td>%s</td>\n</tr>\n' % (i[1], i[0][0], i[0][1], i[0][2])

    html_tpl = '''
    <!DOCTYPE html>
    <html>
        <head>
            <meta charset="utf-8">
            <title>{title}</title>
        </head>
        <body>
            <table border="1" cellspacing="0" cellpadding="0" color='pink'>
                <thead>
                    <tr cellspacing="0" cellpadding="0">
                        <th>Request count</th>
                        <th>ip</th>
                        <th>url</th>
                        <th>http_code</th>
                    </tr>
                </thead>
                {tbody}
            </table>
        </body>
    </html>
    '''
    # Fill in the template and write the page; the with block closes the file
    with open(dpath, 'w') as html_handle:
        html_handle.write(html_tpl.format(title = title, tbody = tbody))

# Script entry point
if __name__ == '__main__':
    # nginx log file
    log_file = 'access1.log'
    dpath = 'top10.html'
    # topn is how many top entries to take;
    # if it is not passed, the default is 10
    topn = 10
    # log_analysis(log_file, dpath)
    log_analysis(log_file, dpath, topn)
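
The counting dict and hand-written bubble sort above can also be expressed with the standard library's collections.Counter, whose most_common(n) method returns the n most frequent keys in descending order of count. The following is a minimal sketch rather than the script above: the function name top_requests and the in-memory sample lines are made up for illustration, and it assumes the same whitespace-separated log format.

```python
from collections import Counter


def top_requests(lines, topn=10):
    """Return the topn most frequent (ip, url, code) tuples with their counts.

    top_requests is a hypothetical helper name used only for this sketch.
    """
    counter = Counter()
    for line in lines:
        nodes = line.split()
        # Skip blank or truncated lines that lack the status field
        if len(nodes) < 9:
            continue
        counter[(nodes[0], nodes[6], nodes[8])] += 1
    # most_common sorts by count, highest first
    return counter.most_common(topn)


# Made-up sample lines in the log format shown earlier
sample = [
    '1.1.1.1 - - [22/Oct/2017:03:55:53 +0800] "GET /index HTTP/1.0" 200 0',
    '1.1.1.1 - - [22/Oct/2017:03:55:54 +0800] "GET /index HTTP/1.0" 200 0',
    '2.2.2.2 - - [22/Oct/2017:03:55:55 +0800] "HEAD /check HTTP/1.0" 404 0',
    '',
]
result = top_requests(sample, topn=2)
print(result)
```

Because the function only iterates over lines, an open file object can be passed in place of the sample list when working with a real log.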

Method 3:
A shorter Python version using dict.get and sorted

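# Count (ip, status code) occurrences in the nginx log
def static_file(file_name):
    res_dict = {}
    with open(file_name) as f:
        for line in f:
            # Example of line.split() for one log line:
            # ['100.116.x.x', '-', '-', '[08/Feb/2018:14:37:13', '+0800]', '"HEAD',
            # '/check', 'HTTP/1.0"', '200', '0', '"-"', '"-"', '"-"', 'ut', '=', '0.002']
            tmp = line.split()
            # print(tmp)
            # Skip blank or truncated lines that lack a status field
            if len(tmp) < 9:
                continue
            # Key on (ip, status code)
            tup = (tmp[0], tmp[8])
            # Add 1 to the running count, starting from 0 for a new key
            res_dict[tup] = res_dict.get(tup, 0) + 1
    return res_dict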

def generate_html(rst_list):
    str_html = '<table border="1" cellpadding=0 cellspacing=0>'
    str_html += "<tr><th>IP address</th><th>Status code</th><th>Count</th></tr>"
    html_tmpl = '<tr><td>%s</td><td>%s</td><td>%s</td></tr>'
    # Render at most the last 20 entries of the list passed in
    for (ip, status), count in rst_list[-20:]:
        str_html += html_tmpl % (ip, status, count)
    str_html += '</table>'
    return str_html

def write_to_html(html_list):
    # html_list is the finished HTML string
    with open('res.html', 'w') as f:
        f.write(html_list)

def main():
    res_dict = static_file('voice20180208.log')
    # Sort ascending by count
    res_list = sorted(res_dict.items(), key = lambda x: x[1])
    # html_content = generate_html(res_list[-10:])
    # Take the 20 highest counts, largest first
    html_content = generate_html(res_list[-1:-21:-1])
    write_to_html(html_content)

if __name__ == "__main__":
    main()
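
Both scripts above count combinations of fields rather than the IP alone. If only the per-IP ranking from the title is wanted, a sketch along the following lines would do it. It also passes each IP through html.escape before putting it into the table, since log fields originate from client requests. The function names count_ips and render_rows are hypothetical, the sample lines are made up, and Python 3 is assumed.

```python
from collections import Counter
from html import escape


def count_ips(lines):
    """Count requests per client IP (first whitespace-separated field)."""
    counter = Counter()
    for line in lines:
        nodes = line.split()
        if nodes:  # ignore blank lines
            counter[nodes[0]] += 1
    return counter


def render_rows(pairs):
    """Build an HTML table from (ip, count) pairs, escaping each IP string."""
    rows = ['<table border="1">', '<tr><th>ip</th><th>count</th></tr>']
    for ip, count in pairs:
        rows.append('<tr><td>%s</td><td>%d</td></tr>' % (escape(ip), count))
    rows.append('</table>')
    return '\n'.join(rows)


# Made-up sample lines in the log format shown earlier
sample = [
    '1.1.1.1 - - [08/Feb/2018:14:37:13 +0800] "HEAD /check HTTP/1.0" 200 0',
    '2.2.2.2 - - [08/Feb/2018:14:37:14 +0800] "HEAD /check HTTP/1.0" 200 0',
    '1.1.1.1 - - [08/Feb/2018:14:37:15 +0800] "GET /index HTTP/1.0" 404 0',
]
top10 = count_ips(sample).most_common(10)
html_table = render_rows(top10)
print(html_table)
```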
