NumPy is a well-known Python library for scientific computing. This article is a quickstart.

Introduction: computing BMI

BMI = weight (kg) / height (m)^2

Suppose we have the following weight and height data and want to compute the BMI for each pair:

weight = [65.4,59.2,63.6,88.4,68.7]

height = [1.73,1.68,1.71,1.89,1.79]

print weight / height ** 2

Running the code above raises an error: TypeError: unsupported operand type(s) for ** or pow(): 'list' and 'int'

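Python's arithmetic operators work on individual numbers (scalars). A list does not support `**` or `/`, so these operators cannot be applied to a whole collection of values at once.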
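For comparison, here is a minimal Python 3 sketch of the pure-Python workaround. It pairs the two lists with zip() and computes each BMI in a list comprehension. The name `bmi_list` is only for illustration.

```python
weight = [65.4, 59.2, 63.6, 88.4, 68.7]
height = [1.73, 1.68, 1.71, 1.89, 1.79]

# Arithmetic operators are not defined element-wise on lists, so this fails
try:
    weight / height ** 2
except TypeError as exc:
    print("TypeError:", exc)

# Pure-Python workaround: loop over matching (weight, height) pairs
bmi_list = [w / h ** 2 for w, h in zip(weight, height)]
print([round(b, 2) for b in bmi_list])  # [21.85, 20.98, 21.75, 24.75, 21.44]
```

This works, but it needs an explicit loop for every operation. That is the boilerplate NumPy removes.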

Solution: use the numpy.ndarray data structure (an N-dimensional array). Its arithmetic operators act element by element on entire arrays:

import numpy as np

np_weight = np.array(weight)

np_height = np.array(height)

print type(np_weight)

print type(np_height)

<type 'numpy.ndarray'>

<type 'numpy.ndarray'>

print np_weight

print np_height

[ 65.4  59.2  63.6  88.4  68.7]

[ 1.73  1.68  1.71  1.89  1.79]

Note: unlike a printed Python list, a printed numpy.ndarray does not separate its elements with commas.

np_bmi = np_weight / np_height ** 2

print type(np_bmi)

print np_bmi

<type 'numpy.ndarray'>

[ 21.85171573  20.97505669  21.75028214  24.7473475   21.44127836]
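The same computation in Python 3 syntax is sketched below. It also checks that the vectorized result matches the loop-based version. `loop_bmi` is an illustrative name, not part of the original.

```python
import numpy as np

weight = [65.4, 59.2, 63.6, 88.4, 68.7]
height = [1.73, 1.68, 1.71, 1.89, 1.79]

np_weight = np.array(weight)
np_height = np.array(height)

# ** and / are applied element by element across the whole arrays
np_bmi = np_weight / np_height ** 2

# Same result as looping over the lists in pure Python
loop_bmi = [w / h ** 2 for w, h in zip(weight, height)]
print(np_bmi)
print(np.allclose(np_bmi, loop_bmi))  # True
```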

NumPy arrays: numpy.ndarray

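numpy.ndarray is NumPy's most fundamental data structure: an N-dimensional array. All elements of an array must have the same type. If they don't, NumPy automatically converts them to a common type, for example: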

print np.array([1.0,'hi',True])

['1.0' 'hi' 'True']

As you can see, every element was converted to a string.
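A short Python 3 sketch inspects the resulting type through the `dtype` attribute. It also shows two other common conversions: int with float, and bool with int.

```python
import numpy as np

mixed = np.array([1.0, 'hi', True])
print(mixed.dtype)        # a Unicode string dtype (e.g. <U32)
print(mixed.dtype.kind)   # 'U'

# Mixing ints and floats upcasts everything to float
print(np.array([1, 2.5]).dtype)   # float64

# Booleans mixed with ints become ints (True -> 1)
print(np.array([1, True]))        # [1 1]
```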

Different types, different behavior

# a regular Python list

py_list = [1,2,3]

# a NumPy array

np_array = np.array(py_list)

print py_list + py_list  # list concatenation

[1, 2, 3, 1, 2, 3]

print np_array + np_array  # element-wise addition of corresponding elements

[2 4 6]
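Multiplication shows the same contrast. The sketch below is Python 3: on a list, `* 2` repeats the list, while on an array it doubles each element.

```python
import numpy as np

py_list = [1, 2, 3]
np_array = np.array(py_list)

print(py_list * 2)          # [1, 2, 3, 1, 2, 3]  (repetition)
print(np_array * 2)         # [2 4 6]             (element-wise multiplication)
print(np_array * np_array)  # [1 4 9]             (element by element)
```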

Subsetting

print np_bmi[0]

21.8517157272

print np_bmi > 23

[False False False  True False]

print np_bmi[np_bmi > 23]

[ 24.7473475]
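Boolean masks can be combined. The minimal sketch below uses `&` (and) and `|` (or). Python's `and`/`or` keywords do not work element-wise on arrays, and each comparison needs its own parentheses. `np.flatnonzero` returns the positions where a mask is True.

```python
import numpy as np

np_weight = np.array([65.4, 59.2, 63.6, 88.4, 68.7])
np_height = np.array([1.73, 1.68, 1.71, 1.89, 1.79])
np_bmi = np_weight / np_height ** 2

# Combine conditions with & (and) / | (or); parenthesize each comparison
mask = (np_bmi > 21) & (np_bmi < 22)
print(mask)            # [ True False  True False  True]
print(np_bmi[mask])    # the three BMI values between 21 and 22

# Positions of the elements that satisfy a condition
print(np.flatnonzero(np_bmi > 23))   # [3]
```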

2D NumPy arrays

A 2D NumPy array can be built from a list of lists, where each inner list becomes one row, for example:

np_2d = np.array([height,weight])

print type(np_2d)

<type 'numpy.ndarray'>

print np_2d

[[  1.73   1.68   1.71   1.89   1.79]

 [ 65.4   59.2   63.6   88.4   68.7 ]]

print np_2d.shape

(2, 5)

The shape attribute shows that np_2d is a 2D array with 2 rows and 5 columns.
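Besides `shape`, a few other standard ndarray attributes describe an array's layout. The following is a brief Python 3 sketch.

```python
import numpy as np

height = [1.73, 1.68, 1.71, 1.89, 1.79]
weight = [65.4, 59.2, 63.6, 88.4, 68.7]
np_2d = np.array([height, weight])

print(np_2d.shape)    # (2, 5): 2 rows, 5 columns
print(np_2d.ndim)     # 2 dimensions
print(np_2d.size)     # 10 elements in total
print(np_2d.T.shape)  # (5, 2): the transpose swaps rows and columns
```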

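The single-type rule

The same rule applies to 2D arrays: a single string element turns every element into a string: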

print np.array([[1,2],[3,'4']])

[['1' '2']

 ['3' '4']]
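If the strings are known to hold numbers, `astype` can convert the array back explicitly. This is a minimal Python 3 sketch. `astype(int)` would raise a ValueError on a non-numeric string such as 'hi'.

```python
import numpy as np

mixed_2d = np.array([[1, 2], [3, '4']])
print(mixed_2d.dtype.kind)  # 'U': every element became a string

# Convert back to integers once the data is known to be numeric
ints_2d = mixed_2d.astype(int)
print(ints_2d)        # [[1 2]
                      #  [3 4]]
print(ints_2d.sum())  # 10
```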

Subsetting 2D NumPy arrays

np_2d = np.array([height,weight])

print np_2d

[[  1.73   1.68   1.71   1.89   1.79]

 [ 65.4   59.2   63.6   88.4   68.7 ]]

print np_2d[0][2]

1.71

print np_2d[0,2]

1.71
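The two forms return the same value but work differently. The sketch below is Python 3. `np_2d[0][2]` indexes twice: it first builds the whole first row, then picks element 2 from it. `np_2d[0, 2]` does a single lookup, which is the usual idiom.

```python
import numpy as np

height = [1.73, 1.68, 1.71, 1.89, 1.79]
weight = [65.4, 59.2, 63.6, 88.4, 68.7]
np_2d = np.array([height, weight])

row = np_2d[0]        # first indexing step: the whole first row as a 1D array
print(row.shape)      # (5,)
print(row[2])         # 1.71 (second step picks the element)

# np_2d[0, 2] gets the same element in a single indexing operation
print(np_2d[0][2] == np_2d[0, 2])  # True
```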

You can also slice along both axes at once:

print np_2d[:,1:3]

[[  1.68   1.71]

 [ 59.2   63.6 ]]
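One caveat about slicing: a basic slice such as `np_2d[:, 1:3]` is a view that shares memory with the original array, so writing to it changes the original. The Python 3 sketch below shows this and uses `.copy()` for an independent array. The names `view` and `independent` are illustrative.

```python
import numpy as np

np_2d = np.array([[1.73, 1.68, 1.71, 1.89, 1.79],
                  [65.4, 59.2, 63.6, 88.4, 68.7]])

view = np_2d[:, 1:3]         # basic slicing returns a view, not a copy
view[0, 0] = 0.0
print(np_2d[0, 1])           # 0.0: the original array changed too

np_2d[0, 1] = 1.68           # restore the value
independent = np_2d[:, 1:3].copy()  # an independent copy
independent[0, 0] = 0.0
print(np_2d[0, 1])           # 1.68: unaffected
```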

Select row 1 (the second row, the weights, since indexing starts at 0):

print np_2d[1,:]

[ 65.4  59.2  63.6  88.4  68.7]

Compute the corresponding BMI values:

print np_2d[1,:] / np_2d[0,:] ** 2

[ 21.85171573  20.97505669  21.75028214  24.7473475   21.44127836]
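As a possible extension (not in the original), the BMI values can be appended as a third row with `np.vstack`. This is a Python 3 sketch. Iterating over a 2D array yields its rows, so the two rows can be unpacked by name first. `height_row`, `weight_row` and `with_bmi` are illustrative names.

```python
import numpy as np

np_2d = np.array([[1.73, 1.68, 1.71, 1.89, 1.79],
                  [65.4, 59.2, 63.6, 88.4, 68.7]])

# Unpack the two rows by name (iterating over a 2D array yields its rows)
height_row, weight_row = np_2d
bmi_row = weight_row / height_row ** 2

# Append the BMI values as a third row
with_bmi = np.vstack((np_2d, bmi_row))
print(with_bmi.shape)   # (3, 5)
```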

Application

Use NumPy to generate normally distributed random test data and compute some basic statistics.

For example, generate a dataset of 10,000 records holding the heights (m) and weights (kg) of all residents of a town, using the function:

np.random.normal(loc=mean, scale=standard deviation, size=number of samples)

height = np.random.normal(1.75,0.20,10000)

weight = np.random.normal(60.32,15,10000)
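Newer NumPy versions (1.17 and later) also offer a Generator object, created with `np.random.default_rng`. Its `normal` method takes the same loc/scale/size arguments. The Python 3 sketch below seeds it so the test data is reproducible. The seed 42 is arbitrary, and the legacy `np.random.normal` used above still works.

```python
import numpy as np

# Seeded generator so the "random" data is the same on every run
rng = np.random.default_rng(42)

height = rng.normal(loc=1.75, scale=0.20, size=10000)
weight = rng.normal(loc=60.32, scale=15, size=10000)

print(height.shape, weight.shape)   # (10000,) (10000,)
print(round(height.mean(), 2))      # close to 1.75
```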

Next, stack several (here, two) 1D arrays as the columns of a 2D array (somewhat like what zip() does):

np_info = np.column_stack((height,weight))

print np_info

[[  1.88474198  76.24957048]

 [  1.85353302  64.62674488]

 [  1.74999035  67.5831439 ]

 ...,

 [  1.78187257  50.11001273]

 [  1.90415778  50.65985964]

 [  1.51573081  41.00493358]]
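The zip() analogy can be checked directly. In this Python 3 sketch, building an array from zipped pairs gives exactly the same rows as `np.column_stack`, but `column_stack` avoids the Python-level loop. The seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
height = rng.normal(1.75, 0.20, 10000)
weight = rng.normal(60.32, 15, 10000)

np_info = np.column_stack((height, weight))
print(np_info.shape)   # (10000, 2): one row per resident

# Same rows as pairing the values with zip()
via_zip = np.array(list(zip(height, weight)))
print(np.array_equal(np_info, via_zip))  # True
```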

Mean height in np_info:

print np.mean(np_info[:,0])

1.75460102053

Median height:

print np.median(np_info[:,0])

1.75385473036

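Correlation coefficients of height and weight. np.corrcoef returns a 2×2 matrix whose off-diagonal entries are the height-weight correlation. It is close to 0 here because the two columns were generated independently: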

print np.corrcoef(np_info[:,0],np_info[:,1])

[[  1.00000000e+00  -1.50825116e-04]

 [ -1.50825116e-04   1.00000000e+00]]
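For contrast, this Python 3 sketch compares independent data with perfectly linearly related data. Only entry `[0, 1]` of the returned matrix is needed. The arrays `x`, `y_independent` and `y_linear` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=1000)
y_independent = rng.normal(size=1000)
y_linear = 2 * x + 1

# Off-diagonal entry [0, 1] is the correlation between the two inputs
print(np.corrcoef(x, y_independent)[0, 1])  # near 0
print(np.corrcoef(x, y_linear)[0, 1])       # 1.0 (up to rounding)
```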

Standard deviation of height:

print np.std(np_info[:,0])

0.201152169706

Sort (np.sort returns a sorted copy and leaves the source array unchanged):

print np.sort(np_info[0:10,0])

[ 1.46053123  1.59268772  1.74939538  1.74999035  1.78229515  1.85353302

  1.88474198  1.99755291  2.12384833  2.3727505 ]
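Related sorting tools are shown in this Python 3 sketch. `np.argsort` returns the indices that would sort the array. The method `ndarray.sort()`, unlike `np.sort`, sorts in place and returns None. The small array `a` is made up for illustration.

```python
import numpy as np

a = np.array([1.85, 1.46, 1.99, 1.59])

sorted_copy = np.sort(a)    # a new sorted array; a is unchanged
descending = sorted_copy[::-1]
print(a, sorted_copy, descending)

order = np.argsort(a)       # indices that would sort the array
print(order)                # [1 3 0 2]

a.sort()                    # ndarray.sort() sorts in place and returns None
print(a)
```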

Sum (of the first 10 heights):

print np.sum(np_info[0:10,0])

18.5673265584
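Instead of slicing out one column at a time, these functions accept an `axis` argument. With `axis=0` they aggregate down each column, giving one result for height and one for weight. The Python 3 sketch below also notes that `np.std` defaults to the population formula (`ddof=0`), while `ddof=1` gives the sample estimate. The seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(7)
np_info = np.column_stack((rng.normal(1.75, 0.20, 10000),
                           rng.normal(60.32, 15, 10000)))

# axis=0 aggregates down each column: [height result, weight result]
col_means = np.mean(np_info, axis=0)
col_medians = np.median(np_info, axis=0)
first10_sums = np.sum(np_info[0:10], axis=0)  # column sums over the first 10 rows
print(col_means, col_medians, first10_sums)

# Population (ddof=0, the default) vs sample (ddof=1) standard deviation
pop_std = np.std(np_info[:, 0])
sample_std = np.std(np_info[:, 0], ddof=1)
print(pop_std, sample_std)
```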

