Learning notes | Data Analysis: 1.2 data wrangling
| Data Wrangling |
# Sort all the data into one file
files = ['BeijingPM20100101_20151231.csv','ChengduPM20100101_20151231.csv','GuangzhouPM20100101_20151231.csv','ShanghaiPM20100101_20151231.csv','ShenyangPM20100101_20151231.csv']
out_columns = ['No', 'year', 'month', 'day', 'hour', 'season', 'PM_US Post']
# Create a void dataframe
df_all_cities = pd.DataFrame()
# Iterate to write diffrent files
for inx, val in enumerate(files):
df = pd.read_csv(val)
df = df[out_columns]
# create a city column
df['city'] = val.split('P')[0]
# map season
df['season'] = df['season'].map({1:'Spring', 2:'Summer', 3:'Autumn', 4: 'Winter'})
# append each file and merge all files into one
df_all_cities = df_all_cities.append(df)
# replace the space in variable names with '_'
df_all_cities.columns = [c.replace(' ', '_') for c in df_all_cities.columns]
# Assignment:
# print the length of data
print("The number of row in this dataset is ",len(Beijing_data.index))
# calculating the number of records in column "PM_Dongsi"
print("There number of missing data records in PM_Dongsi is: ",len(Beijing_data.index) - len(Beijing_data['PM_Dongsi'].dropna()))
print("There number of missing data records in PM_Dongsihuan is: ",len(Beijing_data.index) - len(Beijing_data['PM_Dongsihuan'].dropna()))
print("There number of missing data records in PM_Nongzhanguan is: ",len(Beijing_data.index) - len(Beijing_data['PM_Nongzhanguan'].dropna()))
print("There number of missing data records in DEWP is: ",len(Beijing_data.index) - len(Beijing_data['DEWP'].dropna()))
print("There number of missing data records in HUMI is: ",len(Beijing_data.index) - len(Beijing_data['HUMI'].dropna()))
print("There number of missing data records in PRES is: ",len(Beijing_data.index) - len(Beijing_data['PRES'].dropna()))
print("There number of missing data records in TEMP is: ",len(Beijing_data.index) - len(Beijing_data['TEMP'].dropna()))
print("There number of missing data records in cbwd is: ",len(Beijing_data.index) - len(Beijing_data['cbwd'].dropna()))
print("There number of missing data records in Iws is: ",len(Beijing_data.index) - len(Beijing_data['Iws'].dropna()))
print("There number of missing data records in precipitation is: ",len(Beijing_data.index) - len(Beijing_data['precipitation'].dropna()))
print("There number of missing data records in Iprec is: ",len(Beijing_data.index) - len(Beijing_data['Iprec'].dropna()))
Learning notes | Data Analysis: 1.2 data wrangling的更多相关文章
- Learning notes | Data Analysis: 1.1 data evaluation
| Data Evaluation | - Use Shift + Enter or Shift + Return to run the upper box so as to make it disp ...
- How to use data analysis for machine learning (example, part 1)
In my last article, I stated that for practitioners (as opposed to theorists), the real prerequisite ...
- Learning Spark: Lightning-Fast Big Data Analysis 中文翻译
Learning Spark: Lightning-Fast Big Data Analysis 中文翻译行为纯属个人对于Spark的兴趣,仅供学习. 如果我的翻译行为侵犯您的版权,请您告知,我将停止 ...
- 用pandas进行数据清洗(二)(Data Analysis Pandas Data Munging/Wrangling)
在<用pandas进行数据清洗(一)(Data Analysis Pandas Data Munging/Wrangling)>中,我们介绍了数据清洗经常用到的一些pandas命令. 接下 ...
- An Introduction to Stock Market Data Analysis with R (Part 1)
Around September of 2016 I wrote two articles on using Python for accessing, visualizing, and evalua ...
- 学习笔记之Python for Data Analysis
Python for Data Analysis, 2nd Edition https://www.safaribooksonline.com/library/view/python-for-data ...
- 《利用Python进行数据分析: Python for Data Analysis 》学习随笔
NoteBook of <Data Analysis with Python> 3.IPython基础 Tab自动补齐 变量名 变量方法 路径 解释 ?解释, ??显示函数源码 ?搜索命名 ...
- Python for Data Analysis
Data Analysis with Python ch02 一些有趣的数据分析结果 Male描述的是美国新生儿男孩纸的名字的最后一个字母的分布 Female描述的是美国新生儿女孩纸的名字的最后一个字 ...
- 深入浅出数据分析 Head First Data Analysis Code 数据与代码
<深入浅出数据分析>英文名为Head First Data Analysis Code, 这本书中提供了学习使用的数据和程序,原书链接由于某些原因不 能打开,这里在提供一个下载的链接.去下 ...
随机推荐
- javascript 随机数 生成 n-m
例子:生成800-1500的随机整数,包含800但不包含1500 代码如下: 1500-800 = 700 Math.random()*700 var num = Math.random()*700 ...
- IEEP部署企业级网络工程-网络故障-环路故障
网络故障 1.环路故障 概念 1).以太网是一个支持广播的网络, 在没有环路的环境中,广播报文在网络中以泛洪的形式被送达到网络的第一个角落,以保证每个设备都能够接受到它.每台二层设备在接收到广播报文以 ...
- Eclipse启动JVM机制
1.Eclipse启动的时候,会启动一个JVM来运行eclipse(因为Eclipse是Java代码实现的) 2.Eclipse启动一个带main的主类的时候,会单独启动一个JVM来运行他. 3.Ec ...
- python IO 文件读写
IO 由于CPU和内存的速度远远高于外设的速度,所以,在IO编程中,就存在速度严重不匹配的问题. 如要把100M的数据写入磁盘,CPU输出100M的数据只需要0.01秒,可是磁盘要接收这100M数据可 ...
- MySQL绿色解压缩版安装与配置
操作步骤: 一.安装MySQL数据库 1.下载MySQL-5.6.17-winx64.zip文件.2.解压到指定目录,本例为D:\mysql-5.6.17-winx64.3.修改配置文件,my-def ...
- “三八节”如何做好EDM邮件营销
阳春三月,乍暖还寒,万物复苏,一年一度的三八节也马上来临了,各路商家都开足马力,掀起了一股美丽的旋风.如今酒香也怕巷子深,要想取得良好的营销效果,就得早早动手,赚足眼球,才会换来节日当天的丰厚回馈.U ...
- Java连接MQ的实例, 测试类
package cjf.mq.mqclient; import com.ibm.mq.MQC; import com.ibm.mq.MQEnvironment; import com.ibm.mq.M ...
- 用eclipse pydev 创建一个新py文件时 文件的coding设置问题
问题: 当安装好eclipse和pydev后,创建一个project, 创建一个新的py文件,文件头都会自带中文时间.这样在编译的时候会报错. 解决办法之一: 通过设置,可以使新建的文件的文件头自动带 ...
- miniui dataGrid detail grid
<div > <div id="vkhGrjx_grid" class="mini-datagrid" style="wi ...
- Windows 使用iCloud日历
作者:Lumos Night链接:https://www.zhihu.com/question/34287617/answer/97299386来源:知乎著作权归作者所有.商业转载请联系作者获得授权, ...