| Data Wrangling |

# Sort all the data into one file

files = ['BeijingPM20100101_20151231.csv','ChengduPM20100101_20151231.csv','GuangzhouPM20100101_20151231.csv','ShanghaiPM20100101_20151231.csv','ShenyangPM20100101_20151231.csv']
out_columns = ['No', 'year', 'month', 'day', 'hour', 'season', 'PM_US Post']

# Create a void dataframe

df_all_cities = pd.DataFrame()

# Iterate to write diffrent files

for inx, val in enumerate(files):
df = pd.read_csv(val)
df = df[out_columns]
# create a city column
df['city'] = val.split('P')[0]
# map season
df['season'] = df['season'].map({1:'Spring', 2:'Summer', 3:'Autumn', 4: 'Winter'})
# append each file and merge all files into one
df_all_cities = df_all_cities.append(df)

# replace the space in variable names with '_'

df_all_cities.columns = [c.replace(' ', '_') for c in df_all_cities.columns]

# Assignment: 

# print the length of data
print("The number of row in this dataset is ",len(Beijing_data.index))
# calculating the number of records in column "PM_Dongsi"
print("There number of missing data records in PM_Dongsi is: ",len(Beijing_data.index) - len(Beijing_data['PM_Dongsi'].dropna()))
print("There number of missing data records in PM_Dongsihuan is: ",len(Beijing_data.index) - len(Beijing_data['PM_Dongsihuan'].dropna()))
print("There number of missing data records in PM_Nongzhanguan is: ",len(Beijing_data.index) - len(Beijing_data['PM_Nongzhanguan'].dropna()))
print("There number of missing data records in DEWP is: ",len(Beijing_data.index) - len(Beijing_data['DEWP'].dropna()))
print("There number of missing data records in HUMI is: ",len(Beijing_data.index) - len(Beijing_data['HUMI'].dropna()))
print("There number of missing data records in PRES is: ",len(Beijing_data.index) - len(Beijing_data['PRES'].dropna()))
print("There number of missing data records in TEMP is: ",len(Beijing_data.index) - len(Beijing_data['TEMP'].dropna()))
print("There number of missing data records in cbwd is: ",len(Beijing_data.index) - len(Beijing_data['cbwd'].dropna()))
print("There number of missing data records in Iws is: ",len(Beijing_data.index) - len(Beijing_data['Iws'].dropna()))
print("There number of missing data records in precipitation is: ",len(Beijing_data.index) - len(Beijing_data['precipitation'].dropna()))
print("There number of missing data records in Iprec is: ",len(Beijing_data.index) - len(Beijing_data['Iprec'].dropna()))

Learning notes | Data Analysis: 1.2 data wrangling的更多相关文章

  1. Learning notes | Data Analysis: 1.1 data evaluation

    | Data Evaluation | - Use Shift + Enter or Shift + Return to run the upper box so as to make it disp ...

  2. How to use data analysis for machine learning (example, part 1)

    In my last article, I stated that for practitioners (as opposed to theorists), the real prerequisite ...

  3. Learning Spark: Lightning-Fast Big Data Analysis 中文翻译

    Learning Spark: Lightning-Fast Big Data Analysis 中文翻译行为纯属个人对于Spark的兴趣,仅供学习. 如果我的翻译行为侵犯您的版权,请您告知,我将停止 ...

  4. 用pandas进行数据清洗(二)(Data Analysis Pandas Data Munging/Wrangling)

    在<用pandas进行数据清洗(一)(Data Analysis Pandas Data Munging/Wrangling)>中,我们介绍了数据清洗经常用到的一些pandas命令. 接下 ...

  5. An Introduction to Stock Market Data Analysis with R (Part 1)

    Around September of 2016 I wrote two articles on using Python for accessing, visualizing, and evalua ...

  6. 学习笔记之Python for Data Analysis

    Python for Data Analysis, 2nd Edition https://www.safaribooksonline.com/library/view/python-for-data ...

  7. 《利用Python进行数据分析: Python for Data Analysis 》学习随笔

    NoteBook of <Data Analysis with Python> 3.IPython基础 Tab自动补齐 变量名 变量方法 路径 解释 ?解释, ??显示函数源码 ?搜索命名 ...

  8. Python for Data Analysis

    Data Analysis with Python ch02 一些有趣的数据分析结果 Male描述的是美国新生儿男孩纸的名字的最后一个字母的分布 Female描述的是美国新生儿女孩纸的名字的最后一个字 ...

  9. 深入浅出数据分析 Head First Data Analysis Code 数据与代码

    <深入浅出数据分析>英文名为Head First Data Analysis Code, 这本书中提供了学习使用的数据和程序,原书链接由于某些原因不 能打开,这里在提供一个下载的链接.去下 ...

随机推荐

  1. ps cs6破解补丁使用方法

    第一步.首先下载ps cs6破解补丁 ,再下载官方ps cs6中文版,安装之后运行一次.第二步.先备份你想要激活的软件的“amtlib”文件,比如PS CS6 64bit其目录在“C:\Program ...

  2. 使用c++11写个最简跨平台线程池

    为什么需要多线程? 最简单的多线程长啥样? 为什么需要线程池,有什么问题? 实现的主要原理是什么? 带着这几个问题,我们依次展开. 1.为什么需要多线程? 大部分程序毕竟都不是计算密集型的,简单的说, ...

  3. dia无法输入中文的解决

    安装dia后无法输入中文,解决如下: 修改/usr/bin/dia #dia-normal --integrated "$@" dia-normal "$@"

  4. Gradle入门实战(Windows版)

    Installation Gradle runs on all major operating systems and requires only a Java JDK or JRE version ...

  5. @autoclosure-可以让表达式自动封装成一个闭包:输入的是一个表达式

    @autoclosure 在闭包前面加上@autoclosure func or(first:Bool,@autoclosure second:()->Bool) -> Bool { if ...

  6. ADF中遍历VO中的行数据(Iterator)

    在ADF中VO实质上就是一个迭代器, 1.在Application Module的实现类中,直接借助VO实现类和Row的实现类 TestVOImpl organizationUser = (TestV ...

  7. Adobe flash player 因过期而遭到阻止解决办法

    最近使用谷歌浏览器时总是提示Adobe flash player 因过期而遭到阻止,这让人很头痛,基本上就是打开一个网页就会弹出一个提示,下面是解决办法. 问题的截图界面: 解决方法:在chrome浏 ...

  8. 【[HAOI2009]逆序对数列】

    发现自己学了几天splay已经傻了 其实还是一个比较裸的dp的,但是还是想了一小会,还sb的wa了几次 首先这道题的状态应该很好看出,我们用\(f[i][j]\)表示在前\(i\)个数中(即\(1-i ...

  9. JSP的域对象的作用范围

    <%-- Created by IntelliJ IDEA. User: tT丶 Date: 2017-12-12 Time: 14:53 To change this template use ...

  10. Grunt中批量无损压缩图片插件--Grunt-contrib-imagemin

    Photoshop 切出的图片,无论是 PNG 还是 JPEG/JPG 格式,都含有许多相关信息,又或多余的颜色值,这些信息和颜色值,对网页前端并没有用处,反而增加图片大小,所以 Google Pag ...