| Data Wrangling |

# Combine the data for all cities into one file

import pandas as pd

files = ['BeijingPM20100101_20151231.csv',
         'ChengduPM20100101_20151231.csv',
         'GuangzhouPM20100101_20151231.csv',
         'ShanghaiPM20100101_20151231.csv',
         'ShenyangPM20100101_20151231.csv']
out_columns = ['No', 'year', 'month', 'day', 'hour', 'season', 'PM_US Post']

# Create an empty DataFrame

df_all_cities = pd.DataFrame()

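# Iterate over the files and read each one

for val in files:
    df = pd.read_csv(val)
    # keep only the columns of interest (copy so the new column is not written to a view)
    df = df[out_columns].copy()
    # create a city column from the file name, e.g. 'Beijing'
    df['city'] = val.split('P')[0]
    # map season codes to names
    df['season'] = df['season'].map({1: 'Spring', 2: 'Summer', 3: 'Autumn', 4: 'Winter'})
    # append each file's data to the combined DataFrame
    # (pd.concat replaces DataFrame.append, which was removed in pandas 2.0)
    df_all_cities = pd.concat([df_all_cities, df])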

# replace spaces in column names with '_'

df_all_cities.columns = [c.replace(' ', '_') for c in df_all_cities.columns]
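The same merge can also be written by collecting each city's DataFrame in a list and calling `pd.concat` once at the end, which avoids copying the growing DataFrame on every iteration. The sketch below is only an illustration: `fake_files`, `season_names`, `frames` and `combined` are hypothetical names, and the tiny in-memory CSV strings hold made-up values standing in for the real files.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-ins for the city CSV files (made-up toy values).
fake_files = {
    'BeijingPM20100101_20151231.csv': "No,season,PM_US Post\n1,4,129.0\n2,4,\n",
    'ShanghaiPM20100101_20151231.csv': "No,season,PM_US Post\n1,1,55.0\n",
}
season_names = {1: 'Spring', 2: 'Summer', 3: 'Autumn', 4: 'Winter'}

frames = []
for name, text in fake_files.items():
    part = pd.read_csv(io.StringIO(text))
    # city name is the part of the file name before the first 'P'
    part['city'] = name.split('P')[0]
    part['season'] = part['season'].map(season_names)
    frames.append(part)

# concatenate once, renumbering the rows from 0
combined = pd.concat(frames, ignore_index=True)
combined.columns = [c.replace(' ', '_') for c in combined.columns]
print(combined)
```

Note that `ignore_index=True` renumbers the rows, whereas the loop above keeps each file's original index; which one is preferable depends on whether the original row labels are needed later.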

# Assignment: 

# print the number of rows in the data
# (Beijing_data is assumed to have been loaded in an earlier cell)
print("The number of rows in this dataset is", len(Beijing_data.index))
# count the missing records in each column
columns_to_check = ['PM_Dongsi', 'PM_Dongsihuan', 'PM_Nongzhanguan', 'DEWP', 'HUMI', 'PRES',
                    'TEMP', 'cbwd', 'Iws', 'precipitation', 'Iprec']
for col in columns_to_check:
    n_missing = len(Beijing_data.index) - len(Beijing_data[col].dropna())
    print("The number of missing data records in", col, "is:", n_missing)
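The per-column counts can also be obtained in one call with `isna().sum()`, which returns the number of missing values for every column at once. This is a minimal sketch on a hypothetical toy DataFrame (`toy`, with made-up values), not the actual Beijing data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with a few missing values (made-up, not the real dataset).
toy = pd.DataFrame({
    'PM_Dongsi': [10.0, np.nan, 12.0, np.nan],
    'TEMP': [1.0, 2.0, np.nan, 4.0],
    'cbwd': ['NW', 'NW', None, 'cv'],
})

# isna() marks missing cells as True; sum() counts them per column
missing_per_column = toy.isna().sum()

print("The number of rows in this dataset is", len(toy.index))
print(missing_per_column)
```

Applied to the real data, `Beijing_data.isna().sum()` should give the same counts as subtracting the length of each `dropna()` result from the row count.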
