【439】Tweets processing by Python
参数说明:
- coordinates:Represents the geographic location of this Tweet as reported by the user or client application. The inner coordinates array is formatted as geoJSON (longitude first, then latitude).
1.文本文件转 json 格式
读取 txt 文件中的 tweets 文本,将其转为 json 格式,可以打印输出,也可以提取详细信息
代码:
import json
import os folderpath = r"D:\Twitter Data\Data\test"
files = os.listdir(folderpath)
os.chdir(folderpath) # get the first txt file
tweets_data_path = files[0] # store json format file in this array
tweets_data = []
tweets_file = open(tweets_data_path, "r")
for line in tweets_file:
try:
tweet = json.loads(line)
tweets_data.append(tweet)
except:
continue
# print json format file with indentation
print(json.dumps(tweets_data[0], indent=4))
输出:
{
"created_at": "Tue Jun 25 20:44:34 +0000 2019",
"id": 1143621025550049280,
"id_str": "1143621025550049280",
"text": "Australia beat the Poms overnight \ud83d\ude01\ud83c\udfcf\ud83c\udde6\ud83c\uddfa\ud83c\udff4\udb40\udc67\udb40\udc62\udb40\udc65\udb40\udc6e\udb40\udc67\udb40\udc7f #AUSvENG #CmonAussie #CWC19",
"source": "<a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a>",
"truncated": false,
"in_reply_to_status_id": null,
"in_reply_to_status_id_str": null,
"in_reply_to_user_id": null,
"in_reply_to_user_id_str": null,
"in_reply_to_screen_name": null,
"user": {
"id": 252426781,
"id_str": "252426781",
"name": "Willy Aitch",
"screen_name": "WillyAitch",
"location": "Melbourne, Victoria",
"url": null,
"description": "September 2017 to February 2018, was the greatest 5 months ever. Richmond \ud83d\udc2f\ud83d\udc2f\ud83d\udc2fwon the 2017 AFL Premiership! Philadelphia Eagles \ud83e\udd85\ud83e\udd85\ud83e\udd85 won Super Bowl LII",
"translator_type": "none",
"protected": false,
"verified": false,
"followers_count": 417,
"friends_count": 1061,
"listed_count": 15,
"favourites_count": 18852,
"statuses_count": 17796,
"created_at": "Tue Feb 15 04:55:59 +0000 2011",
"utc_offset": null,
"time_zone": null,
"geo_enabled": true,
"lang": null,
"contributors_enabled": false,
"is_translator": false,
"profile_background_color": "C0DEED",
"profile_background_image_url": "http://abs.twimg.com/images/themes/theme1/bg.png",
"profile_background_image_url_https": "https://abs.twimg.com/images/themes/theme1/bg.png",
"profile_background_tile": false,
"profile_link_color": "1DA1F2",
"profile_sidebar_border_color": "C0DEED",
"profile_sidebar_fill_color": "DDEEF6",
"profile_text_color": "333333",
"profile_use_background_image": true,
"profile_image_url": "http://pbs.twimg.com/profile_images/1112669591342211072/rnbV0dCK_normal.jpg",
"profile_image_url_https": "https://pbs.twimg.com/profile_images/1112669591342211072/rnbV0dCK_normal.jpg",
"profile_banner_url": "https://pbs.twimg.com/profile_banners/252426781/1522377977",
"default_profile": true,
"default_profile_image": false,
"following": null,
"follow_request_sent": null,
"notifications": null
},
"geo": null,
"coordinates": null,
"place": {
"id": "01864a8a64df9dc4",
"url": "https://api.twitter.com/1.1/geo/id/01864a8a64df9dc4.json",
"place_type": "city",
"name": "Melbourne",
"full_name": "Melbourne, Victoria",
"country_code": "AU",
"country": "Australia",
"bounding_box": {
"type": "Polygon",
"coordinates": [
[
[
144.593742,
-38.433859
],
[
144.593742,
-37.511274
],
[
145.512529,
-37.511274
],
[
145.512529,
-38.433859
]
]
]
},
"attributes": {}
},
"contributors": null,
"is_quote_status": false,
"quote_count": 0,
"reply_count": 0,
"retweet_count": 0,
"favorite_count": 0,
"entities": {
"hashtags": [
{
"text": "AUSvENG",
"indices": [
46,
54
]
},
{
"text": "CmonAussie",
"indices": [
55,
66
]
},
{
"text": "CWC19",
"indices": [
67,
73
]
}
],
"urls": [],
"user_mentions": [],
"symbols": []
},
"favorited": false,
"retweeted": false,
"filter_level": "low",
"lang": "en",
"timestamp_ms": "1561495474599"
}
2. 读取关键字内容
通过 .keys() 获取所有的键值
代码:
import json
import os folderpath = r"D:\Twitter Data\Data\test"
files = os.listdir(folderpath)
os.chdir(folderpath) # get the first txt file
tweets_data_path = files[0] # store json format file in this array
tweets_data = []
tweets_file = open(tweets_data_path, "r")
for line in tweets_file:
try:
tweet = json.loads(line)
tweets_data.append(tweet)
except:
continue for k in tweets_data[0].keys():
print(k)
输出:
created_at
id
id_str
text
source
truncated
in_reply_to_status_id
in_reply_to_status_id_str
in_reply_to_user_id
in_reply_to_user_id_str
in_reply_to_screen_name
user
geo
coordinates
place
contributors
is_quote_status
quote_count
reply_count
retweet_count
favorite_count
entities
favorited
retweeted
filter_level
lang
timestamp_ms
3. 输出键值信息
代码:
import json
import os folderpath = r"D:\Twitter Data\Data\test"
files = os.listdir(folderpath)
os.chdir(folderpath) # get the first txt file
tweets_data_path = files[0] # store json format file in this array
tweets_data = []
tweets_file = open(tweets_data_path, "r")
for line in tweets_file:
try:
tweet = json.loads(line)
tweets_data.append(tweet)
except:
continue for k in tweets_data[0].keys():
print(k, ":", tweets_data[0][k])
print()
输出:
created_at : Tue Jun 25 20:44:34 +0000 2019 id : 1143621025550049280 id_str : 1143621025550049280 text : Australia beat the Poms overnight【439】Tweets processing by Python的更多相关文章
- 【转】实习小记-python 内置函数__eq__函数引发的探索
[转]实习小记-python 内置函数__eq__函数引发的探索 乱写__eq__会发生啥?请看代码.. >>> class A: ... def __eq__(self, othe ...
- 【转】实习小记-python中可哈希对象是个啥?what is hashable object in python?
[转]实习小记-python中可哈希对象是个啥?what is hashable object in python? 废话不多说直接祭上python3.3x的文档:(原文链接) object.__ha ...
- 【6】TensorFlow光速入门-python模型转换为tfjs模型并使用
本文地址:https://www.cnblogs.com/tujia/p/13862365.html 系列文章: [0]TensorFlow光速入门-序 [1]TensorFlow光速入门-tenso ...
- 【Linux】CentOS下升级Python和Pip版本全自动化py脚本
[Linux]CentOS下升级Python和Pip版本全自动化py脚本 CentOS7.6自带py2.7和py3.6 想要安装其它版本的话就要自己重新下载和编译py其它版本并且配置环境,主要是软链接 ...
- 【原】使用Json作为Python和C#混合编程时对象转换的中间文件
一.Python中自定义类对象json字符串化的步骤[1] 1. 用 json 或者simplejson 就可以: 2.定义转换函数: 3. 定义类 4. 生成对象 5.dumps执行,引入转换函 ...
- 【解决方案】django初始化执行python manage.py migrate命令后,除default数据库之外的其他数据库中的表没有创建出来
[问题原因]:django工程中存在多个应用,每个应用都指定了对应的数据库.执行python manage.py migrate命令时没有指定数据库,将只初始化默认的default数据库. [解决方案 ...
- 【转】利用Psyco提升Python运行速度
转自:http://www.leeon.me/a/use-Psyco-to-improve-Python-speed Psyco 是严格地在 Python 运行时进行操作的.也就是说,Python 源 ...
- 【转】你真的理解Python中MRO算法吗?
你真的理解Python中MRO算法吗? MRO(Method Resolution Order):方法解析顺序. Python语言包含了很多优秀的特性,其中多重继承就是其中之一,但是多重继承会引发很多 ...
- 【算法】八皇后问题 Python实现
[八皇后问题] 问题: 国际象棋棋盘是8 * 8的方格,每个方格里放一个棋子.皇后这种棋子可以攻击同一行或者同一列或者斜线(左上左下右上右下四个方向)上的棋子.在一个棋盘上如果要放八个皇后,使得她们互 ...
随机推荐
- 快速IO
namespace IO { #define gc() (iS==iT?(iT=(iS=ibuff)+fread(ibuff,1,SIZ,stdin),(iS==iT?EOF:iS++)):iS++) ...
- Nutch2.1+solr3.6.1+mysql5.6问题
1.Nutch2.1问题 1.1 问题:导入完成后,Nutch2.1里面runtime仍旧不能运行,出现jobfailed等错误. 解决:runtime里的nutch调试过程和导入Eclipse差不多 ...
- 用jquery快速解决IE输入框不能输入的问题_jquery
代码如下: 在IE10以上版本,微软为了提高IE输入框的便利性,增加了文本内容全部删除和密码眼睛功能,但是有些时候打开新的页面里,输入框却被锁定无法编辑,需要刷新一下页面,或者如果输入框有内容需要点击 ...
- go内置的反向代理
package main import ( "log" "net/http" "net/http/httputil" "net/u ...
- Sage Math中的语法
1.赋值后不能立即输出,而需要停顿.x= 3 不能输出显示,而 x= 3; x 可以显示. 2.可以用分号连续书写多行. 3.矩阵可以用 mtx[i, j]引用,但是行列号通常从0开始,维度n, m ...
- 三大框架 之 OGNL表达式
目录 OGNL 什么是OGNL OGNL与EL表达式对比 OGNL功能 OGNL使用要素 OGNL入门 java程序使用ognl struts2中使用ONGL OGNL 什么是OGNL OGNL是Ob ...
- 【转】adb server is out of date. killing完美解决
今天,久未出现的著名的“adb server is out of date. killing”又发生了,在此,将解决方法记下,以便日后查看. 1. 错误信息: C:\Users\lizy>ad ...
- [转]OpenGL编程指南(第9版)环境搭建--使用VS2017
1.使用CMake Configure中选择VS2017 Win64 , Finish: 点击Generate. 2.进入build目录 打开GLFW.sln , 生成解决方案. 打开vermilio ...
- Spring源码解析--IOC根容器Beanfactory详解
BeanFactory和FactoryBean的联系和区别 BeanFactory是整个Spring容器的根容器,里面描述了在所有的子类或子接口当中对容器的处理原则和职责,包括生命周期的一些约定. F ...
- Generalized Intersection over Union: A Metric and A Loss for Bounding Box Regression
Generalized Intersection over Union: A Metric and A Loss for Bounding Box Regression 2019-05-20 19:3 ...