import numpy as np

Using NumPy arrays enables you to express many kinds of data processing tasks as concise array expressions that might otherwise require writing loops. This practice of replacing explicit loops with array expressions is commonly referred to as vectorization. In general, vectorized array operations will often be one or two (or more) orders of magnitude faster than their pure Python equivalents, with the biggest impact in any kind of numerical computation. Later, in Appendix A, I explain broadcasting, a powerful method for vectorizing computations. -> Array-oriented programming: far more efficient than pure Python.
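As a rough illustration of the idea, the sketch below computes a sum of squares twice: once with an explicit Python loop and once with a single array expression. The array size and variable names are assumptions chosen for the example, and no timings are quoted because the actual speedup depends on the machine and the NumPy version.

```python
import numpy as np

# Hypothetical data: 100,000 evenly spaced values (size chosen for illustration)
values = np.linspace(0.0, 1.0, 100_000)

# Pure Python: an explicit loop that touches every element in the interpreter
loop_total = 0.0
for v in values:
    loop_total += v * v

# Vectorized: the same computation written as one array expression
vector_total = (values ** 2).sum()

# Both approaches agree up to floating-point rounding
print(np.isclose(loop_total, vector_total))
```

Wrapping each variant in timeit is one way to measure the difference on your own machine.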

As a simple example, suppose we wished to evaluate the function sqrt(x^2 + y^2) across a regular grid of values. The np.meshgrid function takes two 1D arrays and produces two 2D matrices corresponding to all pairs of (x, y) in the two arrays.

# 1000 equally spaced points
points = np.arange(-5, 5, 0.01)
xs, ys = np.meshgrid(points, points)
ys
array([[-5.  , -5.  , -5.  , ..., -5.  , -5.  , -5.  ],
[-4.99, -4.99, -4.99, ..., -4.99, -4.99, -4.99],
[-4.98, -4.98, -4.98, ..., -4.98, -4.98, -4.98],
...,
[ 4.97, 4.97, 4.97, ..., 4.97, 4.97, 4.97],
[ 4.98, 4.98, 4.98, ..., 4.98, 4.98, 4.98],
[ 4.99, 4.99, 4.99, ..., 4.99, 4.99, 4.99]])

Now, evaluating the function is a matter of writing the same expression you would write with two points:

z = np.sqrt(xs**2 + ys**2)
z
array([[7.07106781, 7.06400028, 7.05693985, ..., 7.04988652, 7.05693985,
7.06400028],
[7.06400028, 7.05692568, 7.04985815, ..., 7.04279774, 7.04985815,
7.05692568],
[7.05693985, 7.04985815, 7.04278354, ..., 7.03571603, 7.04278354,
7.04985815],
...,
[7.04988652, 7.04279774, 7.03571603, ..., 7.0286414 , 7.03571603,
7.04279774],
[7.05693985, 7.04985815, 7.04278354, ..., 7.03571603, 7.04278354,
7.04985815],
[7.06400028, 7.05692568, 7.04985815, ..., 7.04279774, 7.04985815,
7.05692568]])
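To make the role of np.meshgrid easier to see, here is a small sketch that uses only three points, so the full matrices can be read at a glance. The three-point grid is an assumption for illustration; the calls are the same ones used above.

```python
import numpy as np

# Three points instead of 1000, so the full matrices are readable
points = np.array([-1.0, 0.0, 1.0])
xs, ys = np.meshgrid(points, points)

# xs repeats the points along each row; ys repeats them down each column
print(xs)
print(ys)

# The same expression you would write for two scalars, applied to the whole grid
z = np.sqrt(xs**2 + ys**2)
print(z.shape)   # (3, 3)
print(z[1, 1])   # 0.0, the value at (x, y) = (0, 0)
```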

As a preview of Chapter 9, I use matplotlib to create visualizations of this two-dimensional array.

import matplotlib.pyplot as plt

plt.imshow(z, cmap=plt.cm.gray)
plt.colorbar()
plt.title(r"Image plot of $\sqrt{x^2 + y^2}$ for a grid of values")
plt.show()
<matplotlib.image.AxesImage at 0x23043ff1be0>
<matplotlib.colorbar.Colorbar at 0x230430a16d8>
Text(0.5, 1.0, 'Image plot of $\\sqrt{x^2 + y^2}$ for a grid of values')

Expressing Conditional Logic as Array Operations

The numpy.where function is a vectorized version of the ternary expression x if condition else y. (np.where(cond, T, F) is the array form of the ternary expression.) Suppose we had a boolean array and two arrays of values:

xarr = np.array([1.1, 1.2, 1.3, 1.4, 1.5])
yarr = np.array([2.1, 2.2, 2.3, 2.4, 2.5])
cond = np.array([True, False, True, True, False])

Suppose we wanted to take a value from xarr whenever the corresponding value in cond is True, and otherwise take the value from yarr. A list comprehension doing this might look like:

# pick x or y depending on whether c is True or False
# zip(xarr, yarr, cond) yields the (x, y, c) triples

result = [(x if c else y) for x, y, c in zip(xarr, yarr, cond)]
result
[1.1, 2.2, 1.3, 1.4, 2.5]
# cj test
for x, y, c in zip(xarr, yarr, cond):
    x, y, c
(1.1, 2.1, True)
(1.2, 2.2, False)
(1.3, 2.3, True)
(1.4, 2.4, True)
(1.5, 2.5, False)
# cj test
list(zip(xarr, yarr, cond))
[(1.1, 2.1, True),
(1.2, 2.2, False),
(1.3, 2.3, True),
(1.4, 2.4, True),
(1.5, 2.5, False)]

This has multiple problems. First, it will not be very fast for large arrays (because all the work is being done in interpreted Python code). Second, it will not work with multidimensional arrays. With np.where you can write this very concisely. -> np.where makes up for both shortcomings: the slow Python interpreter and the lack of support for multidimensional arrays.

result = np.where(cond, xarr, yarr)
result
array([1.1, 2.2, 1.3, 1.4, 2.5])
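The multidimensional case can be sketched as follows. The 2 x 3 arrays and their names are hypothetical values chosen for illustration; the call itself is the same np.where(cond, x, y) shown above.

```python
import numpy as np

# Hypothetical 2 x 3 inputs (values chosen for illustration)
xarr2d = np.array([[1.1, 1.2, 1.3],
                   [1.4, 1.5, 1.6]])
yarr2d = np.array([[2.1, 2.2, 2.3],
                   [2.4, 2.5, 2.6]])
cond2d = np.array([[True, False, True],
                   [False, True, False]])

# Element-wise selection works for any shape, with no explicit loop
result2d = np.where(cond2d, xarr2d, yarr2d)
print(result2d)
```

The result has the same shape as the inputs, with each element taken from xarr2d where cond2d is True and from yarr2d otherwise.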

The second and third arguments to np.where don't need to be arrays; one or both of them can be scalars. A typical use of where in data analysis is to produce a new array of values based on another array (test each element of an array and build a new array from the result, in the style of a ternary expression). Suppose you had a matrix of randomly generated data and you wanted to replace all positive values with 2 and all negative values with -2. This is very easy to do with np.where.

arr = np.random.randn(4,4)
arr
# logical test: is each value greater than 0?
arr > 0
array([[ 0.16344426, -0.24675782, -0.99098667,  2.30182665],
[ 1.21964938, 1.6536566 , -0.06302591, -0.27577446],
[ 0.9991692 , 0.47264648, 0.51368592, -0.28743687],
[-0.62238625, 1.24407926, 0.46229014, -0.09544536]])
array([[ True, False, False,  True],
[ True, True, False, False],
[ True, True, True, False],
[False, True, True, False]])
# np.where: values greater than 0 become 2, all others become -2, producing a new array

np.where(arr > 0, 2, -2)
array([[ 2, -2, -2,  2],
[ 2, 2, -2, -2],
[ 2, 2, 2, -2],
[-2, 2, 2, -2]])

You can combine scalars and arrays when using np.where. For example, I can replace all positive values in arr with the constant 2 like so:

# set only positive values to 2
np.where(arr > 0, 2, arr)
array([[ 2.        , -0.24675782, -0.99098667,  2.        ],
[ 2. , 2. , -0.06302591, -0.27577446],
[ 2. , 2. , 2. , -0.28743687],
[-0.62238625, 2. , 2. , -0.09544536]])

The arrays passed to np.where can be more than just equal-sized arrays or scalars. -> np.where has many more powerful uses.
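One case beyond equal-sized arrays and scalars is arguments of different but broadcast-compatible shapes. The sketch below is a minimal example under that assumption; the names grid, row, and col are hypothetical.

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)    # shape (3, 4)
row = np.array([10, 20, 30, 40])      # shape (4,), broadcast across the rows
col = np.array([[-1], [-2], [-3]])    # shape (3, 1), broadcast across the columns

# Even entries take the value from `row`, odd entries take it from `col`
out = np.where(grid % 2 == 0, row, col)
print(out)
```

All three arguments are broadcast to the common shape (3, 4) before the element-wise selection takes place.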

Mathematical and Statistical Methods

A set of mathematical functions that compute statistics about an entire array or about the data along an axis are accessible as methods of the array class. You can use aggregations like sum, mean, and std either by calling the array instance method or by using the top-level NumPy function.

Here I generate some normally distributed random data and compute some aggregate statistics:

arr = np.random.randn(5,4)
arr
array([[-1.37805831, -1.12482245, -0.16684412, -0.76586049],
[ 0.53032371, -0.44266291, -2.34564781, -0.16721986],
[-0.85135248, -1.11541433, 1.50280171, -0.32380149],
[-0.40347092, 0.04702776, -0.97636849, 0.2564794 ],
[-0.44465538, 0.68593465, -1.45780821, 0.46746144]])
arr.mean()

np.mean(arr)

arr.sum()

-0.4236979305849384

-0.4236979305849384

-8.473958611698768

Functions like mean and sum take an optional axis argument that computes the statistic over the given axis, resulting in an array with one fewer dimension: -> Aggregations are computed along an axis.

# cj test

arr = np.arange(1,7).reshape((2, 3))
arr
array([[1, 2, 3],
[4, 5, 6]])
# axis=1: the column direction (rightward), i.e. one result per row
arr.mean(axis=1)
# axis=0: the row direction (downward), i.e. one result per column
arr.sum(axis=0)

array([2., 5.])

array([5, 7, 9])

Here, arr.mean(1) means "compute mean across the columns" (the column direction, rightward: one result per row), whereas arr.sum(0) means "compute sum down the rows" (the row direction, downward: one result per column).
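The "one fewer dimension" rule is easier to check on a three-dimensional array. This is a small sketch with a hypothetical array arr3d; keepdims is an optional argument of the sum method that keeps the reduced axis with length 1.

```python
import numpy as np

arr3d = np.arange(24).reshape(2, 3, 4)

# Reducing over an axis removes that axis from the shape
print(arr3d.sum(axis=0).shape)  # (3, 4)
print(arr3d.sum(axis=1).shape)  # (2, 4)
print(arr3d.sum(axis=2).shape)  # (2, 3)

# keepdims=True keeps the reduced axis with length 1
print(arr3d.sum(axis=1, keepdims=True).shape)  # (2, 1, 4)
```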

Other methods like cumsum and cumprod do not aggregate, instead producing an array of the intermediate results:

arr = np.array([0, 1, 2, 3, 4, 5, 6, 7])

# cumsum(): cumulative sum
arr.cumsum()

array([ 0,  1,  3,  6, 10, 15, 21, 28], dtype=int32)

In multidimensional arrays, accumulation functions like cumsum return an array of the same size, but with the partial aggregates computed along the indicated axis according to each lower-dimensional slice:

arr = np.array([[0,1,2], [3,4,5],[6,7,8]])
arr
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
# axis=0: the row direction (downward), accumulating down each column
arr.cumsum(axis=0)
# axis=1: the column direction (rightward), accumulating along each row
arr.cumprod(axis=1)

array([[ 0,  1,  2],
[ 3, 5, 7],
[ 9, 12, 15]], dtype=int32)

array([[  0,   0,   0],
[ 3, 12, 60],
[ 6, 42, 336]], dtype=int32)
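One way to read these results: the last slice of an accumulation along an axis equals the full aggregate along that axis. A minimal sketch using the same 3 x 3 array (the names running and running_prod are hypothetical):

```python
import numpy as np

arr = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])

running = arr.cumsum(axis=0)
# The last row of the running sum equals the full column sums
print(np.array_equal(running[-1], arr.sum(axis=0)))

running_prod = arr.cumprod(axis=1)
# The last column of the running product equals the full row products
print(np.array_equal(running_prod[:, -1], arr.prod(axis=1)))
```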

See Table 4-5 for a full listing. We'll see many examples of these methods in action in later chapters.

  • sum, mean, median
  • std, var
  • max, min
  • argmin, argmax
  • cumsum, cumprod
  • ...

Methods for Boolean Arrays

Boolean values are coerced to 1 (True) and 0 (False) in the preceding methods. Thus, sum is often used as a means of counting True values in a boolean array:

arr = np.random.randn(100)

# count how many elements are greater than 0
(arr > 0).sum()

56
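Because True counts as 1 and False as 0, the same coercion gives the fraction of True values through mean. The sketch below uses a small fixed array (hypothetical values) so the result can be checked by hand.

```python
import numpy as np

arr = np.array([-1.5, 0.3, 2.0, -0.2, 0.7])
mask = arr > 0

print(mask.sum())   # 3 positive values
print(mask.mean())  # 0.6, i.e. 3 out of 5
```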

There are two additional methods, any and all, useful especially for boolean arrays. any tests whether one or more values in an array is True, while all checks if every value is True.

bools = np.array([False, False, True, False])

# any: True if at least one value is True
bools.any()
# all: True only if every value is True
bools.all()

True

False

These methods also work with non-boolean arrays, where non-zero elements evaluate to True.
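A short sketch of the non-boolean case, with hypothetical arrays nums and table. Both methods also accept an axis argument, which is used here on the 2D array.

```python
import numpy as np

nums = np.array([0, 3, -2, 0])
print(nums.any())  # True: at least one element is non-zero
print(nums.all())  # False: some elements are zero

table = np.array([[1, 0],
                  [2, 5]])
# Row 0 contains a zero, row 1 does not
print(table.all(axis=1))
# Every column contains at least one non-zero value
print(table.any(axis=0))
```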

Sorting

Like Python's built-in list type, NumPy arrays can be sorted in-place with the sort method:

arr = np.random.randn(6)
arr
array([-0.07751873,  1.96812178,  1.62236213,  0.35971909,  0.63935982,
0.75188034])
# arr.sort() sorts in place: it modifies the original array and returns None

arr.sort()
arr

array([-0.07751873,  0.35971909,  0.63935982,  0.75188034,  1.62236213,
1.96812178])

You can sort each one-dimensional section of values in a multidimensional array in-place along an axis by passing the axis number to sort:

arr = np.random.randn(5,3)
arr
array([[ 1.19201961, -0.55352247,  0.59211779],
[-0.72344831, 0.48316786, -0.11050496],
[-0.77023054, 0.54681603, 0.49216649],
[ 0.20738566, -0.60705897, -1.37389538],
[ 0.46993764, -0.81503777, -1.31609675]])
# axis=1: the column direction (rightward), i.e. sort within each row
arr.sort(1)
arr

array([[-0.55352247,  0.59211779,  1.19201961],
[-0.72344831, -0.11050496, 0.48316786],
[-0.77023054, 0.49216649, 0.54681603],
[-1.37389538, -0.60705897, 0.20738566],
[-1.31609675, -0.81503777, 0.46993764]])

The top-level function np.sort returns a sorted copy of an array instead of modifying the array in-place. (The copy returned by np.sort is independent of the original, which is left unmodified.) A quick-and-dirty way to compute the quantiles of an array is to sort it and select the value at a particular rank:

large_arr = np.random.randn(1000)

# ascending order by default
large_arr.sort()
# 5% quantile
large_arr[int(0.05 * len(large_arr))]

-1.7445979520348824
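The sketch below contrasts the copying behavior of np.sort with the rank-based quantile, and compares the latter with np.quantile, which interpolates linearly between neighboring ranks by default. The shuffled integers 0..999 and the seed are assumptions chosen so the result can be checked by hand.

```python
import numpy as np

# Hypothetical data: the integers 0..999 in shuffled order (seed is arbitrary)
rng = np.random.default_rng(0)
data = rng.permutation(1000)

# np.sort returns a new array; `data` keeps its original order
sorted_copy = np.sort(data)

# Quick-and-dirty 5% quantile: the value at rank int(0.05 * n)
rank_value = sorted_copy[int(0.05 * len(sorted_copy))]
print(rank_value)  # 50

# np.quantile interpolates, so the answer is close to but not exactly 50
print(np.quantile(data, 0.05))  # approximately 49.95
```

The two values differ slightly because the rank-based shortcut picks an existing element while np.quantile interpolates between elements.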

For more details on using NumPy's sorting methods, and more advanced techniques like indirect sorts, see Appendix A. Several other kinds of data manipulations related to sorting of data can also be found in pandas.

Unique and Other Set Logic

NumPy has some basic set operations for one-dimensional ndarrays. A commonly used one is np.unique, which returns the sorted unique values in an array:

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])

# remove duplicates from the array
np.unique(names)

array(['Bob', 'Joe', 'Will'], dtype='<U4')

ints = np.array([3,3,3,2,2,1,1,4,4])

# remove duplicate numbers
np.unique(ints)

array([1, 2, 3, 4])

Contrast np.unique with the pure Python alternative:

sorted(set(names))
['Bob', 'Joe', 'Will']
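One thing np.unique can do that sorted(set(...)) cannot do directly is count occurrences. A minimal sketch using the optional return_counts argument of np.unique (the names uniques and counts are hypothetical):

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])

# return_counts=True also reports how often each unique value occurs
uniques, counts = np.unique(names, return_counts=True)
for name, count in zip(uniques, counts):
    print(name, count)
```

For this array the counts are 2 for Bob, 3 for Joe, and 2 for Will.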

Another function, np.in1d, tests membership of the values in one array in another, returning a boolean array:

values = np.array([6, 0, 0, 3, 2, 5, 6])

# test each value of the array for membership in the other array
np.in1d(values, [2,3,6])

array([ True, False, False,  True,  True, False,  True])

See Table 4-6 for a listing of set functions in NumPy.

  • unique(x) Compute the sorted, unique elements in x
  • intersect1d(x, y) Compute the sorted, common elements in x and y
  • union1d(x, y) Compute the sorted union of elements
  • in1d(x, y) Compute a boolean array indicating whether each element of x is contained in y
  • ....
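The set functions listed above can be tried on two small arrays, as in the sketch below. The arrays x and y are hypothetical. np.setdiff1d is not in the abbreviated list but is part of NumPy, and np.isin is used for the membership test because recent NumPy releases deprecate np.in1d in its favor.

```python
import numpy as np

x = np.array([6, 0, 0, 3, 2, 5, 6])
y = np.array([2, 3, 6, 7])

print(np.intersect1d(x, y))  # [2 3 6]: sorted elements common to both
print(np.union1d(x, y))      # [0 2 3 5 6 7]: sorted union
print(np.setdiff1d(x, y))    # [0 5]: elements of x that are not in y

# Membership test, element by element, with the same shape as x
print(np.isin(x, y))
```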
