Python: Lexical Diversity of Sina Weibo Elements (Word, Screen Name)

CODE:

#!/usr/bin/python
# -*- coding: utf-8 -*-
'''
Created on 2014-7-10
@author: guaguastd
@name: weiboLexicalDiversity.py
'''

if __name__ == '__main__':

    # log in and get a client for the Sina Weibo API
    from sinaWeiboLogin import sinaWeiboLogin
    sinaWeiboApi = sinaWeiboLogin()

    # helper that extracts status texts, screen names, and words
    from sinaWeibo import extractWeiboEntities

    # helper that fetches the public timeline
    from sinaWeiboStatuses import publicTimeline

    # lexical diversity and average-words helpers
    from sinaWeiboLexicalDiversity import weibo_lexical_diversity, weibo_average_words

    # fetch the 5 newest weibo
    weiboNum = 5
    statuses = publicTimeline(sinaWeiboApi, weiboNum)
    status_texts, screen_names, words = extractWeiboEntities(statuses)

    # lexical diversity of the words and of the screen names
    for token in (words, screen_names):
        print '\rLexical diversity of %s: ' % token
        print weibo_lexical_diversity(token)

    # average number of words per status text
    for status in (status_texts,):
        print '\rAverage words of %s: ' % status
        print weibo_average_words(status)

RESULT:

Lexical diversity of [u'[moc\u8f6c\u53d1]2014\u65b0\u6b3e\u590f\u88c5\u5370\u82b1\u77ed\u8896\u8fde\u8863\u88d9\u9ad8\u7aef\u5927\u7801\u4e2d\u5e74\u5973\u88c5\u4fee\u8eab\u663e\u7626\u857e\u4e1d\u8fde\u8863\u88d9', u'http://t.cn/RvCLdgN', u'[\u795e\u9a6c]\u963f\u4f9d\u83b2\u8fde\u8863\u88d9', u'ccdd\u5973\u88c52014\u590f\u88c5\u65b0\u6b3e', u'\u97e9\u7248', u'\u5c0f\u9999\u98ce\u857e\u4e1d\u516c\u4e3b\u88d9', u'\u6b63\u54c1', u'http://t.cn/RvCyo4X', u'\u590f\u65e5\u5ea6\u5047\u6e05\u51c9\u88c5~~>>>>>>\u559c\u6b22\u70b9\u8fd9\u91cc\uff1ahttp://t.cn/RvEqd5R', u'\u6211\u6b63\u5728\u6b66\u4fa0\u5361\u724c\u624b\u6e38\u201c\u5927\u638c\u95e8\u201d\u4e2d\u51b2\u51fb\u8840\u6218\u699c\u5355\uff0c\u613f\u5404\u4f4d\u5927\u4fa0\u62d4\u5200\u76f8\u52a9\uff01\u6ce8\u518c\u5927\u638c\u95e8\uff0c\u586b\u5199\u6211\u7684\u9080\u8bf7\u7801\u30102zr7\u3011\uff0c\u5171\u540c\u83b7\u53d6\u4e30\u539a\u5956\u52b1\u3002http://t.cn/8FUZSTe', u'@\u5927\u638c\u95e8\u6e38\u620f', u'\u8f7b\u8f68\u65e9\u4e0a\u7684\u7a7a\u8c03\u5f00\u5f97\u7565\u5927']:

1.0

Lexical diversity of [u'kathyisangel', u'wangbinrona', u'\u5168\u7403\u6d41\u884c\u670d\u9970\u6f6e\u7f8e\u98ce\u5c1a\u63a7', u'\u624b\u673a\u7528\u62372454403221', u'\u6b63\u76f4\u4f60\u4e00\u8138\u7684\u52c7\u6562\u541b']:

1.0

Average words of [u'[moc\u8f6c\u53d1]2014\u65b0\u6b3e\u590f\u88c5\u5370\u82b1\u77ed\u8896\u8fde\u8863\u88d9\u9ad8\u7aef\u5927\u7801\u4e2d\u5e74\u5973\u88c5\u4fee\u8eab\u663e\u7626\u857e\u4e1d\u8fde\u8863\u88d9  http://t.cn/RvCLdgN', u'[\u795e\u9a6c]\u963f\u4f9d\u83b2\u8fde\u8863\u88d9 ccdd\u5973\u88c52014\u590f\u88c5\u65b0\u6b3e \u97e9\u7248 \u5c0f\u9999\u98ce\u857e\u4e1d\u516c\u4e3b\u88d9 \u6b63\u54c1  http://t.cn/RvCyo4X', u'\u590f\u65e5\u5ea6\u5047\u6e05\u51c9\u88c5~~>>>>>>\u559c\u6b22\u70b9\u8fd9\u91cc\uff1ahttp://t.cn/RvEqd5R', u'\u6211\u6b63\u5728\u6b66\u4fa0\u5361\u724c\u624b\u6e38\u201c\u5927\u638c\u95e8\u201d\u4e2d\u51b2\u51fb\u8840\u6218\u699c\u5355\uff0c\u613f\u5404\u4f4d\u5927\u4fa0\u62d4\u5200\u76f8\u52a9\uff01\u6ce8\u518c\u5927\u638c\u95e8\uff0c\u586b\u5199\u6211\u7684\u9080\u8bf7\u7801\u30102zr7\u3011\uff0c\u5171\u540c\u83b7\u53d6\u4e30\u539a\u5956\u52b1\u3002http://t.cn/8FUZSTe @\u5927\u638c\u95e8\u6e38\u620f ', u'\u8f7b\u8f68\u65e9\u4e0a\u7684\u7a7a\u8c03\u5f00\u5f97\u7565\u5927']:

2.4
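
The sinaWeiboLexicalDiversity module is not shown in this post, so the following is only a minimal sketch of what the two helpers might compute. It assumes lexical diversity is the ratio of unique tokens to total tokens, and that average words is the number of whitespace-separated words divided by the number of statuses. The function names lexical_diversity and average_words and the sample data are hypothetical. These definitions are consistent with the results above: every word and screen name is unique (1.0), and the 5 statuses split into 12 words in total (12 / 5 = 2.4).

```python
# Hypothetical stand-ins for weibo_lexical_diversity and weibo_average_words;
# the real implementations are not shown in the post.

def lexical_diversity(tokens):
    """Ratio of unique tokens to total tokens; 1.0 means no token repeats."""
    if not tokens:
        return 0.0
    return float(len(set(tokens))) / len(tokens)


def average_words(statuses):
    """Mean number of whitespace-separated words per status text."""
    if not statuses:
        return 0.0
    total_words = sum(len(text.split()) for text in statuses)
    return float(total_words) / len(statuses)


if __name__ == '__main__':
    # Made-up sample data, not real Weibo output.
    sample_words = ['dress', 'summer', 'dress', 'sale']
    sample_statuses = ['dress summer', 'dress', 'sale']
    print(lexical_diversity(sample_words))   # 3 unique tokens out of 4
    print(average_words(sample_statuses))    # 4 words across 3 statuses
```

Note that splitting on whitespace treats an unsegmented run of Chinese characters as a single word, which is why each status above yields so few words. A word segmenter would give a more meaningful measure for Chinese text.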
