【439】Tweets processing by Python
参数说明:
- coordinates:Represents the geographic location of this Tweet as reported by the user or client application. The inner coordinates array is formatted as geoJSON (longitude first, then latitude).
1.文本文件转 json 格式
读取 txt 文件中的 tweets 文本,将其转为 json 格式,可以打印输出,也可以提取详细信息
代码:
import json
import os folderpath = r"D:\Twitter Data\Data\test"
files = os.listdir(folderpath)
os.chdir(folderpath) # get the first txt file
tweets_data_path = files[0] # store json format file in this array
tweets_data = []
tweets_file = open(tweets_data_path, "r")
for line in tweets_file:
try:
tweet = json.loads(line)
tweets_data.append(tweet)
except:
continue
# print json format file with indentation
print(json.dumps(tweets_data[0], indent=4))
输出:
{
"created_at": "Tue Jun 25 20:44:34 +0000 2019",
"id": 1143621025550049280,
"id_str": "1143621025550049280",
"text": "Australia beat the Poms overnight \ud83d\ude01\ud83c\udfcf\ud83c\udde6\ud83c\uddfa\ud83c\udff4\udb40\udc67\udb40\udc62\udb40\udc65\udb40\udc6e\udb40\udc67\udb40\udc7f #AUSvENG #CmonAussie #CWC19",
"source": "<a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a>",
"truncated": false,
"in_reply_to_status_id": null,
"in_reply_to_status_id_str": null,
"in_reply_to_user_id": null,
"in_reply_to_user_id_str": null,
"in_reply_to_screen_name": null,
"user": {
"id": 252426781,
"id_str": "252426781",
"name": "Willy Aitch",
"screen_name": "WillyAitch",
"location": "Melbourne, Victoria",
"url": null,
"description": "September 2017 to February 2018, was the greatest 5 months ever. Richmond \ud83d\udc2f\ud83d\udc2f\ud83d\udc2fwon the 2017 AFL Premiership! Philadelphia Eagles \ud83e\udd85\ud83e\udd85\ud83e\udd85 won Super Bowl LII",
"translator_type": "none",
"protected": false,
"verified": false,
"followers_count": 417,
"friends_count": 1061,
"listed_count": 15,
"favourites_count": 18852,
"statuses_count": 17796,
"created_at": "Tue Feb 15 04:55:59 +0000 2011",
"utc_offset": null,
"time_zone": null,
"geo_enabled": true,
"lang": null,
"contributors_enabled": false,
"is_translator": false,
"profile_background_color": "C0DEED",
"profile_background_image_url": "http://abs.twimg.com/images/themes/theme1/bg.png",
"profile_background_image_url_https": "https://abs.twimg.com/images/themes/theme1/bg.png",
"profile_background_tile": false,
"profile_link_color": "1DA1F2",
"profile_sidebar_border_color": "C0DEED",
"profile_sidebar_fill_color": "DDEEF6",
"profile_text_color": "333333",
"profile_use_background_image": true,
"profile_image_url": "http://pbs.twimg.com/profile_images/1112669591342211072/rnbV0dCK_normal.jpg",
"profile_image_url_https": "https://pbs.twimg.com/profile_images/1112669591342211072/rnbV0dCK_normal.jpg",
"profile_banner_url": "https://pbs.twimg.com/profile_banners/252426781/1522377977",
"default_profile": true,
"default_profile_image": false,
"following": null,
"follow_request_sent": null,
"notifications": null
},
"geo": null,
"coordinates": null,
"place": {
"id": "01864a8a64df9dc4",
"url": "https://api.twitter.com/1.1/geo/id/01864a8a64df9dc4.json",
"place_type": "city",
"name": "Melbourne",
"full_name": "Melbourne, Victoria",
"country_code": "AU",
"country": "Australia",
"bounding_box": {
"type": "Polygon",
"coordinates": [
[
[
144.593742,
-38.433859
],
[
144.593742,
-37.511274
],
[
145.512529,
-37.511274
],
[
145.512529,
-38.433859
]
]
]
},
"attributes": {}
},
"contributors": null,
"is_quote_status": false,
"quote_count": 0,
"reply_count": 0,
"retweet_count": 0,
"favorite_count": 0,
"entities": {
"hashtags": [
{
"text": "AUSvENG",
"indices": [
46,
54
]
},
{
"text": "CmonAussie",
"indices": [
55,
66
]
},
{
"text": "CWC19",
"indices": [
67,
73
]
}
],
"urls": [],
"user_mentions": [],
"symbols": []
},
"favorited": false,
"retweeted": false,
"filter_level": "low",
"lang": "en",
"timestamp_ms": "1561495474599"
}
2. 读取关键字内容
通过 .keys() 获取所有的键值
代码:
import json
import os folderpath = r"D:\Twitter Data\Data\test"
files = os.listdir(folderpath)
os.chdir(folderpath) # get the first txt file
tweets_data_path = files[0] # store json format file in this array
tweets_data = []
tweets_file = open(tweets_data_path, "r")
for line in tweets_file:
try:
tweet = json.loads(line)
tweets_data.append(tweet)
except:
continue for k in tweets_data[0].keys():
print(k)
输出:
created_at
id
id_str
text
source
truncated
in_reply_to_status_id
in_reply_to_status_id_str
in_reply_to_user_id
in_reply_to_user_id_str
in_reply_to_screen_name
user
geo
coordinates
place
contributors
is_quote_status
quote_count
reply_count
retweet_count
favorite_count
entities
favorited
retweeted
filter_level
lang
timestamp_ms
3. 输出键值信息
代码:
import json
import os folderpath = r"D:\Twitter Data\Data\test"
files = os.listdir(folderpath)
os.chdir(folderpath) # get the first txt file
tweets_data_path = files[0] # store json format file in this array
tweets_data = []
tweets_file = open(tweets_data_path, "r")
for line in tweets_file:
try:
tweet = json.loads(line)
tweets_data.append(tweet)
except:
continue for k in tweets_data[0].keys():
print(k, ":", tweets_data[0][k])
print()
输出:
created_at : Tue Jun 25 20:44:34 +0000 2019 id : 1143621025550049280 id_str : 1143621025550049280 text : Australia beat the Poms overnight【439】Tweets processing by Python的更多相关文章
- 【转】实习小记-python 内置函数__eq__函数引发的探索
[转]实习小记-python 内置函数__eq__函数引发的探索 乱写__eq__会发生啥?请看代码.. >>> class A: ... def __eq__(self, othe ...
- 【转】实习小记-python中可哈希对象是个啥?what is hashable object in python?
[转]实习小记-python中可哈希对象是个啥?what is hashable object in python? 废话不多说直接祭上python3.3x的文档:(原文链接) object.__ha ...
- 【6】TensorFlow光速入门-python模型转换为tfjs模型并使用
本文地址:https://www.cnblogs.com/tujia/p/13862365.html 系列文章: [0]TensorFlow光速入门-序 [1]TensorFlow光速入门-tenso ...
- 【Linux】CentOS下升级Python和Pip版本全自动化py脚本
[Linux]CentOS下升级Python和Pip版本全自动化py脚本 CentOS7.6自带py2.7和py3.6 想要安装其它版本的话就要自己重新下载和编译py其它版本并且配置环境,主要是软链接 ...
- 【原】使用Json作为Python和C#混合编程时对象转换的中间文件
一.Python中自定义类对象json字符串化的步骤[1] 1. 用 json 或者simplejson 就可以: 2.定义转换函数: 3. 定义类 4. 生成对象 5.dumps执行,引入转换函 ...
- 【解决方案】django初始化执行python manage.py migrate命令后,除default数据库之外的其他数据库中的表没有创建出来
[问题原因]:django工程中存在多个应用,每个应用都指定了对应的数据库.执行python manage.py migrate命令时没有指定数据库,将只初始化默认的default数据库. [解决方案 ...
- 【转】利用Psyco提升Python运行速度
转自:http://www.leeon.me/a/use-Psyco-to-improve-Python-speed Psyco 是严格地在 Python 运行时进行操作的.也就是说,Python 源 ...
- 【转】你真的理解Python中MRO算法吗?
你真的理解Python中MRO算法吗? MRO(Method Resolution Order):方法解析顺序. Python语言包含了很多优秀的特性,其中多重继承就是其中之一,但是多重继承会引发很多 ...
- 【算法】八皇后问题 Python实现
[八皇后问题] 问题: 国际象棋棋盘是8 * 8的方格,每个方格里放一个棋子.皇后这种棋子可以攻击同一行或者同一列或者斜线(左上左下右上右下四个方向)上的棋子.在一个棋盘上如果要放八个皇后,使得她们互 ...
随机推荐
- Python数据分析(基础)
目录: Python基础: Python基本用法:控制语句.函数.文件读写等 Python基本数据结构:字典.集合等 Numpy:简述 Pandas:简述 一. Python基础: 1.1 文件读取 ...
- ES的入门学习
ES的入门:ES的雇员文档的设计和实现功能 ES的存放中包括:索引,类型,文档,字段 PUT /megacorp/employee/1{{ "first_name" : " ...
- 网络 IP
参考原文 http://blog.csdn.net/dan15188387481/article/details/49873923 1. 原始的IP地址表示方法及其分类(近几年慢慢淘汰) IP ...
- 【安卓周记】笔记复习记录:No.2
[安卓] 1. Activity横竖屏切换生命周期变化,分三种情况: 1. 没有配置android:configChanges:每次横竖屏切换都会重新走一遍生命周期 2. 配置android:conf ...
- (尚011)Vue事件处理
test011.html <!DOCTYPE html><html lang="en"><head> <meta charset=&quo ...
- P1894 [USACO4.2]完美的牛栏The Perfect Stall
题目描述 农夫约翰上个星期刚刚建好了他的新牛棚,他使用了最新的挤奶技术.不幸的是,由于工程问题,每个牛栏都不一样.第一个星期,农夫约翰随便地让奶牛们进入牛栏,但是问题很快地显露出来:每头奶牛都只愿意在 ...
- Python3正则表达式
正则表达式是一个特殊的字符序列,他能帮助你方便的检查一个字符串是否与某种模式匹配. re.match函数 re.match尝试从字符串的起始位置匹配一个模式,如果不是起始位置匹配成功的话,matc ...
- golang-复习1
结构体: 是一种数据类型 type Person struct{ //l类型定义,地位等价与 int byte boo string ……通常放在全局位置 name string sex byte ...
- Chrome浏览器控制台[DOM] Password field is not contained in a form:
[DOM] Password field is not contained in a form: ( [DOM]密码字段不包含在form表单中) 解决方案:添加一层form标签 <div cla ...
- Tkinter 之PanedWindow标签
一.参数说明 参数 作用 background(bg) 设置背景颜色 borderwidth(bd) 设置边框宽度 cursor 指定当鼠标在PanedWindow上飘过的时候的鼠标样式 handl ...