| Data Wrangling |

# Sort all the data into one file

files = ['BeijingPM20100101_20151231.csv','ChengduPM20100101_20151231.csv','GuangzhouPM20100101_20151231.csv','ShanghaiPM20100101_20151231.csv','ShenyangPM20100101_20151231.csv']
out_columns = ['No', 'year', 'month', 'day', 'hour', 'season', 'PM_US Post']

# Create a void dataframe

df_all_cities = pd.DataFrame()

# Iterate to write diffrent files

for inx, val in enumerate(files):
df = pd.read_csv(val)
df = df[out_columns]
# create a city column
df['city'] = val.split('P')[0]
# map season
df['season'] = df['season'].map({1:'Spring', 2:'Summer', 3:'Autumn', 4: 'Winter'})
# append each file and merge all files into one
df_all_cities = df_all_cities.append(df)

# replace the space in variable names with '_'

df_all_cities.columns = [c.replace(' ', '_') for c in df_all_cities.columns]

# Assignment: 

# print the length of data
print("The number of row in this dataset is ",len(Beijing_data.index))
# calculating the number of records in column "PM_Dongsi"
print("There number of missing data records in PM_Dongsi is: ",len(Beijing_data.index) - len(Beijing_data['PM_Dongsi'].dropna()))
print("There number of missing data records in PM_Dongsihuan is: ",len(Beijing_data.index) - len(Beijing_data['PM_Dongsihuan'].dropna()))
print("There number of missing data records in PM_Nongzhanguan is: ",len(Beijing_data.index) - len(Beijing_data['PM_Nongzhanguan'].dropna()))
print("There number of missing data records in DEWP is: ",len(Beijing_data.index) - len(Beijing_data['DEWP'].dropna()))
print("There number of missing data records in HUMI is: ",len(Beijing_data.index) - len(Beijing_data['HUMI'].dropna()))
print("There number of missing data records in PRES is: ",len(Beijing_data.index) - len(Beijing_data['PRES'].dropna()))
print("There number of missing data records in TEMP is: ",len(Beijing_data.index) - len(Beijing_data['TEMP'].dropna()))
print("There number of missing data records in cbwd is: ",len(Beijing_data.index) - len(Beijing_data['cbwd'].dropna()))
print("There number of missing data records in Iws is: ",len(Beijing_data.index) - len(Beijing_data['Iws'].dropna()))
print("There number of missing data records in precipitation is: ",len(Beijing_data.index) - len(Beijing_data['precipitation'].dropna()))
print("There number of missing data records in Iprec is: ",len(Beijing_data.index) - len(Beijing_data['Iprec'].dropna()))

Learning notes | Data Analysis: 1.2 data wrangling的更多相关文章

  1. Learning notes | Data Analysis: 1.1 data evaluation

    | Data Evaluation | - Use Shift + Enter or Shift + Return to run the upper box so as to make it disp ...

  2. How to use data analysis for machine learning (example, part 1)

    In my last article, I stated that for practitioners (as opposed to theorists), the real prerequisite ...

  3. Learning Spark: Lightning-Fast Big Data Analysis 中文翻译

    Learning Spark: Lightning-Fast Big Data Analysis 中文翻译行为纯属个人对于Spark的兴趣,仅供学习. 如果我的翻译行为侵犯您的版权,请您告知,我将停止 ...

  4. 用pandas进行数据清洗(二)(Data Analysis Pandas Data Munging/Wrangling)

    在<用pandas进行数据清洗(一)(Data Analysis Pandas Data Munging/Wrangling)>中,我们介绍了数据清洗经常用到的一些pandas命令. 接下 ...

  5. An Introduction to Stock Market Data Analysis with R (Part 1)

    Around September of 2016 I wrote two articles on using Python for accessing, visualizing, and evalua ...

  6. 学习笔记之Python for Data Analysis

    Python for Data Analysis, 2nd Edition https://www.safaribooksonline.com/library/view/python-for-data ...

  7. 《利用Python进行数据分析: Python for Data Analysis 》学习随笔

    NoteBook of <Data Analysis with Python> 3.IPython基础 Tab自动补齐 变量名 变量方法 路径 解释 ?解释, ??显示函数源码 ?搜索命名 ...

  8. Python for Data Analysis

    Data Analysis with Python ch02 一些有趣的数据分析结果 Male描述的是美国新生儿男孩纸的名字的最后一个字母的分布 Female描述的是美国新生儿女孩纸的名字的最后一个字 ...

  9. 深入浅出数据分析 Head First Data Analysis Code 数据与代码

    <深入浅出数据分析>英文名为Head First Data Analysis Code, 这本书中提供了学习使用的数据和程序,原书链接由于某些原因不 能打开,这里在提供一个下载的链接.去下 ...

随机推荐

  1. /usr/lib64/python2.6/site-packages/cryptography/__init__.py:26: DeprecationWarning: Python 2.6 is no longer supported by the Python core team

    升级python2.6到2.7后,执行ansible后一直显示警告,如标题所示. 因为安装ansible,使用的是yum的方式,而yum使用的是python2.6,所以ansible安装环境为pyth ...

  2. maven仓库使用HTTP代理,maven仓库使用本地jar

    setting.xml <proxies> <proxy> <id>proxy</id> <active>true</active&g ...

  3. csr867x开发日记——常用软件工具介绍

    xIDE xIDE开发环境(编译器)可以被用于以下操作 查看代码 构建新应用 调试 运行 重新配置 修改 Universal Front End 通用前端 通用前端(UFE)工具用于配置DSP,ADK ...

  4. D3——绘制SVG图形-直方图

    1.创建SVG元素 var svg = d3.select("body").append("svg"); 2.为SVG元素设置属性 svg.attr() .at ...

  5. 问题解决:java.sql.SQLException:Value '0000-00-00' can not be represented as java.sql.Date

    问题描述: 数据表中有记录的time字段(属性为timestamp)其值为:“0000-00-00 00:00:00” 程序使用select 语句从中取数据时出现以下异常: Java.sql.SQLE ...

  6. 七.部署war包到Tomcat(基于Centos安装)

    1.把war包上传至tomcat的webapps目录下面 2.启动Tomcat,在Tomcat的bin目录下面,运行startup.sh 3.访问项目,如下成功打开项目

  7. UVa 1625 - Color Length(线性DP + 滚动数组)

    链接: https://uva.onlinejudge.org/index.php?option=com_onlinejudge&Itemid=8&page=show_problem& ...

  8. poj 1159 Palindrome 【LCS】

    任意门:http://poj.org/problem?id=1159 解题思路: LCS + 滚动数组 AC code: #include <cstdio> #include <io ...

  9. 在eclipse中配置Tomcat时,出现“Cannot create a server using the selected type”的错误。

    出现原因:Tomcat重新安装,并且安装目录改变了. 解决方案:在“Window->preferences->Server->Runtime Environment”,编辑Tomca ...

  10. [转]数据绑定之DataFormatString

    设定BoundField的DataFormatString,通常有以下几种 DataFormatString= "{0:C}" 货币,货币的格式取决于当前Thread中Cultur ...