Learning notes | Data Analysis: 1.2 data wrangling

| Data Wrangling |

# Sort all the data into one file

files = ['BeijingPM20100101_20151231.csv','ChengduPM20100101_20151231.csv','GuangzhouPM20100101_20151231.csv','ShanghaiPM20100101_20151231.csv','ShenyangPM20100101_20151231.csv']

out_columns = ['No', 'year', 'month', 'day', 'hour', 'season', 'PM_US Post']

# Create a void dataframe

df_all_cities = pd.DataFrame()

# Iterate to write diffrent files

for inx, val in enumerate(files):

    df = pd.read_csv(val)

    df = df[out_columns]

    # create a city column

    df['city'] = val.split('P')[0]

    # map season

    df['season'] = df['season'].map({1:'Spring', 2:'Summer', 3:'Autumn', 4: 'Winter'})

    # append each file and merge all files into one

    df_all_cities = df_all_cities.append(df)

# replace the space in variable names with '_'

df_all_cities.columns = [c.replace(' ', '_') for c in df_all_cities.columns]

# Assignment:

# print the length of data

print("The number of row in this dataset is ",len(Beijing_data.index))

# calculating the number of records in column "PM_Dongsi"

print("There number of missing data records in PM_Dongsi is: ",len(Beijing_data.index) - len(Beijing_data['PM_Dongsi'].dropna()))

print("There number of missing data records in PM_Dongsihuan is: ",len(Beijing_data.index) - len(Beijing_data['PM_Dongsihuan'].dropna()))

print("There number of missing data records in PM_Nongzhanguan is: ",len(Beijing_data.index) - len(Beijing_data['PM_Nongzhanguan'].dropna()))

print("There number of missing data records in DEWP is: ",len(Beijing_data.index) - len(Beijing_data['DEWP'].dropna()))

print("There number of missing data records in HUMI is: ",len(Beijing_data.index) - len(Beijing_data['HUMI'].dropna()))

print("There number of missing data records in PRES is: ",len(Beijing_data.index) - len(Beijing_data['PRES'].dropna()))

print("There number of missing data records in TEMP is: ",len(Beijing_data.index) - len(Beijing_data['TEMP'].dropna()))

print("There number of missing data records in cbwd is: ",len(Beijing_data.index) - len(Beijing_data['cbwd'].dropna()))

print("There number of missing data records in Iws is: ",len(Beijing_data.index) - len(Beijing_data['Iws'].dropna()))

print("There number of missing data records in precipitation is: ",len(Beijing_data.index) - len(Beijing_data['precipitation'].dropna()))

print("There number of missing data records in Iprec is: ",len(Beijing_data.index) - len(Beijing_data['Iprec'].dropna()))

Learning notes | Data Analysis: 1.2 data wrangling的更多相关文章

Learning notes | Data Analysis: 1.1 data evaluation
| Data Evaluation | - Use Shift + Enter or Shift + Return to run the upper box so as to make it disp ...
How to use data analysis for machine learning (example, part 1)
In my last article, I stated that for practitioners (as opposed to theorists), the real prerequisite ...
Learning Spark: Lightning-Fast Big Data Analysis 中文翻译
Learning Spark: Lightning-Fast Big Data Analysis 中文翻译行为纯属个人对于Spark的兴趣,仅供学习. 如果我的翻译行为侵犯您的版权,请您告知,我将停止 ...
用pandas进行数据清洗（二）（Data Analysis Pandas Data Munging/Wrangling）
在<用pandas进行数据清洗(一)(Data Analysis Pandas Data Munging/Wrangling)>中,我们介绍了数据清洗经常用到的一些pandas命令. 接下 ...
An Introduction to Stock Market Data Analysis with R (Part 1)
Around September of 2016 I wrote two articles on using Python for accessing, visualizing, and evalua ...
学习笔记之Python for Data Analysis
Python for Data Analysis, 2nd Edition https://www.safaribooksonline.com/library/view/python-for-data ...
《利用Python进行数据分析： Python for Data Analysis 》学习随笔
NoteBook of <Data Analysis with Python> 3.IPython基础 Tab自动补齐变量名变量方法路径解释 ?解释, ??显示函数源码 ?搜索命名 ...
Python for Data Analysis
Data Analysis with Python ch02 一些有趣的数据分析结果 Male描述的是美国新生儿男孩纸的名字的最后一个字母的分布 Female描述的是美国新生儿女孩纸的名字的最后一个字 ...
深入浅出数据分析 Head First Data Analysis Code 数据与代码
<深入浅出数据分析>英文名为Head First Data Analysis Code, 这本书中提供了学习使用的数据和程序,原书链接由于某些原因不能打开,这里在提供一个下载的链接.去下 ...

随机推荐

ps cs6破解补丁使用方法
第一步.首先下载ps cs6破解补丁 ,再下载官方ps cs6中文版,安装之后运行一次.第二步.先备份你想要激活的软件的“amtlib”文件,比如PS CS6 64bit其目录在“C:\Program ...
使用c++11写个最简跨平台线程池
为什么需要多线程? 最简单的多线程长啥样? 为什么需要线程池,有什么问题? 实现的主要原理是什么? 带着这几个问题,我们依次展开. 1.为什么需要多线程? 大部分程序毕竟都不是计算密集型的,简单的说, ...
dia无法输入中文的解决
安装dia后无法输入中文,解决如下: 修改/usr/bin/dia #dia-normal --integrated "$@" dia-normal "$@"
Gradle入门实战（Windows版）
Installation Gradle runs on all major operating systems and requires only a Java JDK or JRE version ...
@autoclosure-可以让表达式自动封装成一个闭包：输入的是一个表达式
@autoclosure 在闭包前面加上@autoclosure func or(first:Bool,@autoclosure second:()->Bool) -> Bool { if ...
ADF中遍历VO中的行数据(Iterator)
在ADF中VO实质上就是一个迭代器, 1.在Application Module的实现类中,直接借助VO实现类和Row的实现类 TestVOImpl organizationUser = (TestV ...
Adobe flash player 因过期而遭到阻止解决办法
最近使用谷歌浏览器时总是提示Adobe flash player 因过期而遭到阻止,这让人很头痛,基本上就是打开一个网页就会弹出一个提示,下面是解决办法. 问题的截图界面: 解决方法:在chrome浏 ...
【[HAOI2009]逆序对数列】
发现自己学了几天splay已经傻了其实还是一个比较裸的dp的,但是还是想了一小会,还sb的wa了几次首先这道题的状态应该很好看出,我们用$f[i][j]$表示在前$i$个数中(即\(1-i ...
JSP的域对象的作用范围
<%-- Created by IntelliJ IDEA. User: tT丶 Date: 2017-12-12 Time: 14:53 To change this template use ...
Grunt中批量无损压缩图片插件－－Grunt-contrib-imagemin
Photoshop 切出的图片,无论是 PNG 还是 JPEG/JPG 格式,都含有许多相关信息,又或多余的颜色值,这些信息和颜色值,对网页前端并没有用处,反而增加图片大小,所以 Google Pag ...

Learning notes | Data Analysis: 1.2 data wrangling

Learning notes | Data Analysis: 1.2 data wrangling的更多相关文章

随机推荐

热门专题