Pandas与Matplotlib基础
.png)
pip install pandas
.png)
s = pd.Series([3, -5, 7, 4], index=['a', 'b', 'c', 'd'])
print(s) # result
a 3
b -5
c 7
d 4
dtype: int64
s.index = ['A', 'B', 'C', 'D']
print(s) # result
A 3
B -5
C 7
D 4
dtype: int64
data = {
'Country': ['Belgium', 'India', 'Brazil'],
'Capital': ['Brussels', 'New Delhi', 'Brasília'],
'Population': [11190846, 1303171035, 207847528]
}
df = pd.DataFrame(data, columns=['Country', 'Capital', 'Population'])
print(df)
# result
Country Capital Population
0 Belgium Brussels 11190846
1 India New Delhi 1303171035
2 Brazil Brasília 207847528
print(df.shape) # result
(260, 218)
print(df.columns)
print(df.index) # result
Index(['Life expectancy', '', '', '', '', '', '',
'', '', '',
...
'', '', '', '', '', '', '', '', '',
''],
dtype='object', length=218) Int64Index([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
...
250, 251, 252, 253, 254, 255, 256, 257, 258, 259],
dtype='int64', length=260)
print(df.info()) # result
<class 'pandas.core.frame.DataFrame'>
Int64Index: 260 entries, 0 to 259
Columns: 218 entries, Life expectancy to 2016
dtypes: float64(217), object(1)
memory usage: 444.8+ KB
print(df.head()) # result
Life expectancy 1800 1801 1802 1803 1804 1805 1806 \
0 Abkhazia NaN NaN NaN NaN NaN NaN NaN
1 Afghanistan 28.21 28.20 28.19 28.18 28.17 28.16 28.15
2 Akrotiri and Dhekelia NaN NaN NaN NaN NaN NaN NaN
3 Albania 35.40 35.40 35.40 35.40 35.40 35.40 35.40
4 Algeria 28.82 28.82 28.82 28.82 28.82 28.82 28.82 1807 1808 ... 2016
0 NaN NaN ... 0 NaN
1 28.14 28.13 ... 1 52.72
2 NaN NaN ... 2 NaN
3 35.40 35.40 ... 3 78.10
4 28.82 28.82 ... 4 76.50 [5 rows x 218 columns]
print(df.tail()) # result
Life expectancy 1800 1801 1802 1803 1804 1805 1806 1807 \
255 Yugoslavia NaN NaN NaN NaN NaN NaN NaN NaN
256 Zambia 32.60 32.60 32.60 32.60 32.60 32.60 32.60 32.60
257 Zimbabwe 33.70 33.70 33.70 33.70 33.70 33.70 33.70 33.70
258 ?land NaN NaN NaN NaN NaN NaN NaN NaN
259 South Sudan 26.67 26.67 26.67 26.67 26.67 26.67 26.67 26.67 1808 ... 2016
255 NaN ... NaN
256 32.60 ... 57.10
257 33.70 ... 61.69
258 NaN ... NaN
259 26.67 ... 56.10 [5 rows x 218 columns]
selected_cols = ['', '', '']
date_df = df[selected_cols]
print(date_df.head()) # result
2010 2011 2012
0 NaN NaN NaN
1 53.6 54.0 54.4
2 NaN NaN NaN
3 77.2 77.4 77.5
4 76.0 76.1 76.2
print(df.loc[250, 'Life expectancy']) # result
250 Vietnam Name: Life expectancy, dtype: object
df.iloc[:10, -2:] # result
2015 2016
0 NaN NaN
1 53.8 52.72
2 NaN NaN
3 78.0 78.10
4 76.4 76.50
5 72.9 73.00
6 84.8 84.80
7 59.6 60.00
8 NaN NaN
9 76.4 76.50
mask = df > 50
print(df[mask].head()) #result
Life expectancy 1800 1801 1802 1803 1804 1805 1806 1807 \
0 Abkhazia NaN NaN NaN NaN NaN NaN NaN NaN
1 Afghanistan NaN NaN NaN NaN NaN NaN NaN NaN
2 Akrotiri and Dhekelia NaN NaN NaN NaN NaN NaN NaN NaN
3 Albania NaN NaN NaN NaN NaN NaN NaN NaN
4 Algeria NaN NaN NaN NaN NaN NaN NaN NaN 1808 ... 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016
0 NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 NaN ... 52.4 52.8 53.3 53.6 54.0 54.4 54.8 54.9 53.8 52.72
2 NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 NaN ... 76.6 76.8 77.0 77.2 77.4 77.5 77.7 77.9 78.0 78.10
4 NaN ... 75.3 75.5 75.7 76.0 76.1 76.2 76.3 76.3 76.4 76.50 [5 rows x 218 columns]
heights = [59.0, 65.2, 62.9, 65.4, 63.7]
data = {
'height': heights, 'sex': 'Male',
}
df_heights = pd.DataFrame(data)
print(df_heights) # result
height sex
0 59.0 Male
1 65.2 Male
2 62.9 Male
3 65.4 Male
4 63.7 Male
df_heights.columns = ['HEIGHT', 'SEX']
df_heights.index = ['david', 'bob', 'lily', 'sara', 'tim']
print(df_heights) # result
HEIGHT SEX
david 59.0 Male
bob 65.2 Male
lily 62.9 Male
sara 65.4 Male
tim 63.7 Male
print(df_heights.sum()) # result
HEIGHT 316.2
SEX MaleMaleMaleMaleMale
dtype: object
print(df_heights.cumsum()) # result
HEIGHT SEX
david 59 Male
bob 124.2 MaleMale
lily 187.1 MaleMaleMale
sara 252.5 MaleMaleMaleMale
tim 316.2 MaleMaleMaleMaleMale
print(df_heights.max()) # result
HEIGHT 65.4
SEX Male
dtype: object
print(df_heights.min()) # result
HEIGHT 59
SEX Male
dtype: object
print(df_heights.mean()) # result
HEIGHT 63.24
dtype: float64
print(df_heights.median()) # result
HEIGHT 63.7
dtype: float64
print(df_heights.describe()) # result
HEIGHT
count 5.000000
mean 63.240000
std 2.589015
min 59.000000
25% 62.900000
50% 63.700000
75% 65.200000
max 65.400000
df_heights.drop(['david', 'tim'])
print(df_heights) # result
df_heights = df_heights.drop(['david', 'tim'])
print(df_heights) # result
HEIGHT SEX
bob 65.2 Male
lily 62.9 Male
sara 65.4 Male
print(df_heights.drop('SEX', axis=1))
# result
HEIGHT
david 177.0
bob 195.6
lily 188.7
sara 196.2
tim 191.1
print(df_heights.sort_index()) # result
HEIGHT SEX
bob 65.2 Male
david 59.0 Male
lily 62.9 Male
sara 65.4 Male
tim 63.7 Male
print(df_heights.sort_values(by='HEIGHT')) # result
HEIGHT SEX
david 59.0 Male
lily 62.9 Male
tim 63.7 Male
bob 65.2 Male
sara 65.4 Male
print(df_heights.rank()) # result
HEIGHT SEX
david 1.0 3.0
bob 4.0 3.0
lily 2.0 3.0
sara 5.0 3.0
tim 3.0 3.0
df_heights = df_heights.apply(lambda height: height*3)
print(df_heights) # result
HEIGHT SEX
david 177.0 MaleMaleMale
bob 195.6 MaleMaleMale
lily 188.7 MaleMaleMale
sara 196.2 MaleMaleMale
tim 191.1 MaleMaleMale
df = pd.read_csv("tips.csv")
print(df.info())
# result
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 244 entries, 0 to 243
Data columns (total 8 columns):
total_bill 244 non-null float64
tip 244 non-null float64
sex 244 non-null object
smoker 244 non-null object
day 244 non-null object
time 244 non-null object
size 244 non-null int64
fraction 244 non-null float64
dtypes: float64(3), int64(1), object(4)
memory usage: 15.3+ KB
df = pd.read_csv('tips.csv', header=None, names=column_names)
df = pd.read_csv('tips.csv', header=None, names=column_names, na_values={'DAY': '-1'})
date_df = pd.read_csv('created_date.csv', parse_dates=[[3, 4, 5]])
.png)
import pandas as pd
import matplotlib.pyplot as plt df = pd.read_csv('percent-bachelors-degrees-women-usa.csv', index_col='Year')
print(df.info())
print(df.head()) # result
Agriculture Architecture Art and Performance Biology Business \ ... ... Social Sciences and History
Year ... ... Year
1970 4.229798 11.921005 59.7 29.088363 9.064439 ... ... 1970 36.8
1971 5.452797 12.003106 59.9 29.394403 9.503187 ... ... 1971 36.2
1972 7.420710 13.214594 60.4 29.810221 10.558962 ... ... 1972 36.1
1973 9.653602 14.791613 60.2 31.147915 12.804602 ... ... 1973 36.4
1974 14.074623 17.444688 61.9 32.996183 16.204850 ... ... 1974 37.3 <class 'pandas.core.frame.DataFrame'>
Int64Index: 42 entries, 1970 to 2011
Data columns (total 17 columns):
Agriculture 42 non-null float64
Architecture 42 non-null float64
Art and Performance 42 non-null float64
Biology 42 non-null float64
Business 42 non-null float64
Communications and Journalism 42 non-null float64
Computer Science 42 non-null float64
Education 42 non-null float64
Engineering 42 non-null float64
English 42 non-null float64
Foreign Languages 42 non-null float64
Health Professions 42 non-null float64
Math and Statistics 42 non-null float64
Physical Sciences 42 non-null float64
Psychology 42 non-null float64
Public Administration 42 non-null float64
Social Sciences and History 42 non-null float64
dtypes: float64(17)
memory usage: 5.9 KB
df_CS = df['Computer Science']
plt.plot(df_CS)
plt.show()

.png)
df_CS = df['Computer Science']
plt.plot(df_CS)
# 为图表添加标题
plt.title("Percentage of Computer Science's degrees awarded to women in the USA")
# 为X轴添加标签
plt.xlabel("Years")
# 为Y轴添加标签
plt.ylabel("Percentage")
plt.show()
.png)
df_CS = df['Computer Science']
df_MS = df['Math and Statistics']
# 可以通过DataFrame的plot()方法直接绘制
# color指定线条的颜色
# style指定线条的样式
# legend指定是否使用标识区分
df_CS.plot(color='b', style='.-', legend=True)
df_MS.plot(color='r', style='-', legend=True)
plt.title("Percentage of Computer Science's degrees VS Math and Statistics's")
plt.xlabel("Years")
plt.ylabel("Percentage")
plt.show()

.png)
# alpha指定透明度(0~1)
df.plot(alpha=0.7)
plt.title("Percentage of bachelor's degrees awarded to women in the USA")
plt.xlabel("Years")
plt.ylabel("Percentage")
# axis指定X轴Y轴的取值范围
plt.axis((1970, 2000, 0, 200))
plt.show()

.png)
iris = pd.read_csv("iris.csv")
# 源数据中没有给column,所以需要手动指定一下
iris.columns = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species']
# kind表示图形的类型
# x, y 分别指定X, Y 轴所指定的数据
iris.plot(kind='scatter', x='sepal_length', y='sepal_width')
plt.xlabel("sepal length in cm")
plt.ylabel("sepal width in cm")
plt.title("iris data analysis")
plt.show()

.png)
iris.plot(kind='box', y='sepal_length')
plt.ylabel("sepal length in cm")
plt.show()

.png)
# 使用mask取出子集
mask = (iris.species == 'Iris-setosa')
setosa = iris[mask]
# bins指定柱状图的个数
# range指定X轴的取值范围
setosa.plot(kind='hist', bins=50, range=(0, 8), alpha=0.5)
plt.title("setosa in iris")
plt.xlabel("CM")
plt.show()

.png)
Pandas与Matplotlib基础的更多相关文章
- Pandas与Matplotlib
Pandas与Matplotlib基础 pandas是Python中开源的,高性能的用于数据分析的库.其中包含了很多可用的数据结构及功能,各种结构支持相互转换,并且支持读取.保存数据.结合matplo ...
- Matplotlib基础知识
Matplotlib基础知识 Matplotlib中的基本图表包括的元素 x轴和y轴 axis水平和垂直的轴线 x轴和y轴刻度 tick刻度标示坐标轴的分隔,包括最小刻度和最大刻度 x轴和y轴刻度标签 ...
- Matplotlib基础使用
matplotlib 一.Matplotlib基础知识 Matplotlib中的基本图表包括的元素 x轴和y轴 axis 水平和垂直的轴线 x轴和y轴刻度 tick 刻度标示坐标轴的分隔,包括最小刻度 ...
- 模块简介与matplotlib基础
模块简介与matplotlib基础 1.基本概念 1.1数据分析 对已知的数据进行分析,提取出一些有价值的信息. 1.2数据挖掘 对大量的数据进行分析与挖掘,得到一些未知的,有价值的信息. 1.3数据 ...
- 用Python的Pandas和Matplotlib绘制股票KDJ指标线
我最近出了一本书,<基于股票大数据分析的Python入门实战 视频教学版>,京东链接:https://item.jd.com/69241653952.html,在其中给出了MACD,KDJ ...
- 数据分析与展示——Matplotlib基础绘图函数示例
Matplotlib库入门 Matplotlib基础绘图函数示例 pyplot基础图表函数概述 函数 说明 plt.plot(x,y,fmt, ...) 绘制一个坐标图 plt.boxplot(dat ...
- Matplotlib基础图形之散点图
Matplotlib基础图形之散点图 散点图特点: 1.散点图显示两组数据的值,每个点的坐标位置由变量的值决定 2.由一组不连续的点组成,用于观察两种变量的相关性(正相关,负相关,不相关) 3.例如: ...
- numpy,scipy,pandas 和 matplotlib
numpy,scipy,pandas 和 matplotlib 本文会介绍numpy,scipy,pandas 和 matplotlib 的安装,环境为Windows10. 一般情况下,如果安装了Py ...
- linux下安装numpy,pandas,scipy,matplotlib,scikit-learn
python在数据科学方面需要用到的库: a.Numpy:科学计算库.提供矩阵运算的库. b.Pandas:数据分析处理库 c.scipy:数值计算库.提供数值积分和常微分方程组求解算法.提供了一个非 ...
随机推荐
- Git版本控制的基本命令
安装完了GIT首先要自报家门,否则代码不能提交 git config --global user.name "Your Name" git config --global user ...
- 在Vue2.0中集成UEditor 富文本编辑器
在vue的'项目中遇到了需要使用富文本编辑器的需求,在github上看了很多vue封装的editor插件,很多对图片上传和视频上传的支持并不是很好,最终还是决定使用UEditor. 这类的文章网上有很 ...
- C语言老司机学Python (四)
字符串格式化: 可以使用类似c语言中sprintf函数的方法进行格式化,但是函数名称是print() 如:print('常量 PI 的值近似为:%5.3f.' % var_PI) 注意var_PI ...
- 玩转FFmpeg的7个小技巧
FFmpeg堪称音频和视频应用程序的瑞士军刀,提供了丰富的选项和灵活性.很多时候用户为了看视频和听音乐都安装了ffmeg.更多关于ffmeg的详细介绍:here,可以通过ffmpeg -formats ...
- 为Hi3531添加4串口支持
修改文件为 linux-3.0.y\arch\arm\mach-godnet\core.c linux-3.0.y\arch\arm\mach-godnet\include\mach\irqs.h 修 ...
- 图解MBR分区无损转换GPT分区+UEFI引导安装WIN8.1
确定你的主板支持UEFI引导.1,前期准备,WIN8.1原版系统一份(坛子里很多,自己下载个),U盘2个其中大于4G一个(最好 准备两个U盘)2,大家都知道WIN8系统只支持GPT分区,传统的MBR分 ...
- SMBus
SMBus (System Management Bus,系统管理总线) 是1995年由Intel提出的,应用于移动PC和桌面PC系统中的低速率通讯.希望通过一条廉价并且功能强大的总线(由两条线组成) ...
- Day25 前端自学日记——入坑记
一 学习契机 今年是走出校门的第一个年头,进入了一家还算不错的公司,领着一份还算不错的薪水,在外人眼中,似乎这样已经不错了,只要我努力好好做,前程一片光明.可事实真是这样吗?两份实习经历都指向我应该从 ...
- ASP.NET Core轻松入门之Middleware管道模型
Middleware指的是微软的的asp.net core的管道模型.其原理可以用微软官方的下图展示: 原理如上图,随着Request的发起,HttpContext会经历多个管道处理(图中的箭头游走方 ...
- freemarker写select组件(二十二)
一,讲解一 1.宏定义 <#macro select id datas> <select id="${id}" name="${id}"> ...