Python: scanning Hadoop output and extracting the records that match entries in a given list.

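import sys

import tstree

# One site prefix per line; loaded into a ternary search trie.
fname = 'high_freq_site.list'
tree = tstree.TernarySearchTrie()
tree.loadData(fname)

token = ''    # URL of the group currently being accumulated
counter = 0   # running total of counts for that URL
post = []     # post times seen for that URL

# Input is expected to arrive grouped by URL (e.g. sorted output of a Hadoop job).
# Field order is assumed from the original comment: url, count, posttime.
for line in sys.stdin:
    arr = line.strip().split()
    if len(arr) != 3:
        continue
    url = arr[0]
    num = int(arr[1])
    posttime = int(arr[2])

    if token == '':
        # First valid record: start the first group.
        token = url
        counter = num
        post.append(posttime)
    elif token == url:
        # Same URL as the previous record: keep accumulating.
        counter += num
        post.append(posttime)
    else:
        # URL changed: emit the finished group if it matches a listed site.
        ret = tree.maxMatch(token)
        if ret and post:
            print('%s\t%s\t%s\t%s' % (ret, token, counter, min(post)))
        # Start the new group with the current record, including its post time.
        token = url
        counter = num
        post = [posttime]

# Emit the last group.
ret = tree.maxMatch(token)
if ret and post:
    print('%s\t%s\t%s\t%s' % (ret, token, counter, min(post)))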
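
The grouping step above (sum the counts and keep the earliest post time for each run of identical URLs) can also be sketched with `itertools.groupby`. This is only an illustration, not the original author's code: the function name `aggregate` and the sample records are hypothetical, the field order url, count, posttime is assumed from the comment in the script, and the trie lookup is left out so that the block runs on its own.

```python
from itertools import groupby


def aggregate(lines):
    """Yield (url, total_count, earliest_posttime) for each run of equal URLs.

    Assumes three whitespace-separated fields per line in the order
    url, count, posttime, and that lines are already grouped by url.
    """
    rows = (line.split() for line in lines)
    rows = (r for r in rows if len(r) == 3)  # skip malformed records
    for url, group in groupby(rows, key=lambda r: r[0]):
        group = list(group)
        total = sum(int(r[1]) for r in group)
        earliest = min(int(r[2]) for r in group)
        yield url, total, earliest


# Hypothetical sample input; the second-to-last line is malformed on purpose.
sample = [
    "a.com/x 2 100",
    "a.com/x 3 90",
    "bad line",
    "b.com/y 1 50",
]
for record in aggregate(sample):
    print(record)
```

Like the loop above, `groupby` only merges adjacent records, so it relies on the input already being sorted or grouped by URL.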

# tstree.py: the ternary search trie imported by the script above.

class TSTNode(object):
    def __init__(self, splitchar):
        self.splitchar = splitchar
        self.data = None      # the full word, set only on a word's last node
        self.loNode = None    # subtree for characters smaller than splitchar
        self.eqNode = None    # subtree for the next character of the word
        self.hiNode = None    # subtree for characters larger than splitchar


class TernarySearchTrie(object):
    def __init__(self):
        self.rootNode = None

    def loadData(self, fname):
        """Load one word per line from fname; blank lines are skipped."""
        with open(fname) as f:
            for line in f:
                line = line.strip()
                node = self.addWord(line)
                if node:
                    node.data = line

    def addWord(self, word):
        """Insert word and return the node of its last character."""
        if not word:
            return None
        charIndex = 0
        if not self.rootNode:
            self.rootNode = TSTNode(word[0])
        currentNode = self.rootNode
        while True:
            charComp = ord(word[charIndex]) - ord(currentNode.splitchar)
            if charComp == 0:
                charIndex += 1
                if charIndex == len(word):
                    return currentNode
                if not currentNode.eqNode:
                    currentNode.eqNode = TSTNode(word[charIndex])
                currentNode = currentNode.eqNode
            elif charComp < 0:
                if not currentNode.loNode:
                    currentNode.loNode = TSTNode(word[charIndex])
                currentNode = currentNode.loNode
            else:
                if not currentNode.hiNode:
                    currentNode.hiNode = TSTNode(word[charIndex])
                currentNode = currentNode.hiNode

    def maxMatch(self, url):
        """Return the longest loaded word that is a prefix of url, or None."""
        ret = None
        currentNode = self.rootNode
        charIndex = 0
        while currentNode:
            if charIndex >= len(url):
                break
            charComp = ord(url[charIndex]) - ord(currentNode.splitchar)
            if charComp == 0:
                charIndex += 1
                # A node carrying data marks the end of a loaded word.
                if currentNode.data:
                    ret = currentNode.data
                if charIndex == len(url):
                    return ret
                currentNode = currentNode.eqNode
            elif charComp < 0:
                currentNode = currentNode.loNode
            else:
                currentNode = currentNode.hiNode
        return ret


if __name__ == '__main__':
    import sys

    fname = 'high_freq_site.list'
    tree = TernarySearchTrie()
    tree.loadData(fname)
    for url in sys.stdin:
        url = url.strip()
        ret = tree.maxMatch(url)
        print(ret)
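
To check what `maxMatch` should return, it can help to compare it with a naive longest-prefix lookup over a plain set. The sketch below is not a ternary search trie and is not part of the original post; `longest_prefix` and the example site names are hypothetical. It is slower than the trie for long URLs, but it follows the same rule: among all listed entries that are a prefix of the URL, the longest one wins.

```python
def longest_prefix(url, sites):
    """Return the longest entry of `sites` that is a prefix of `url`, or None."""
    best = None
    for end in range(1, len(url) + 1):
        candidate = url[:end]
        if candidate in sites:
            best = candidate  # a longer prefix overwrites a shorter one
    return best


# Hypothetical site list, standing in for high_freq_site.list.
sites = {"news.example.com", "news.example.com/sports"}

print(longest_prefix("news.example.com/sports/today", sites))  # news.example.com/sports
print(longest_prefix("other.example.org", sites))              # None
```

Feeding the same list and URLs to `TernarySearchTrie.maxMatch` should give the same answers, which makes this a cheap cross-check when changing the trie code.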
