NumPy Summary

Notes on the *Python Data Science Handbook*. Online version: https://github.com/jakevdp/PythonDataScienceHandbook

1. Conventional import alias: np

import numpy as np

2. ndarray

NumPy's most important feature is its N-dimensional array object, ndarray: a collection of items that all share the same type, indexed starting from 0.

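shape: a tuple giving the size of each dimension;

size: the total number of elements;

dtype: an object describing the data type of the elements;

ndim: the number of dimensions.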
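A quick check of these attributes on a small array. The 3 x 4 x 5 shape is an arbitrary choice for illustration, and the printed dtype depends on the platform.

```python
import numpy as np

# An arbitrary 3-D integer array with shape (3, 4, 5)
x3 = np.arange(60).reshape((3, 4, 5))

print("ndim: ", x3.ndim)   # 3
print("shape:", x3.shape)  # (3, 4, 5)
print("size: ", x3.size)   # 60, the product of the shape
print("dtype:", x3.dtype)  # platform dependent, e.g. int64
```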

2.1 Creating an ndarray

The array function

# Integers

np.array([1,2,5,4,3])

# Floats: the integers are upcast to floating point

np.array([3.14, 4, 2, 3])

# Specify the data type explicitly

np.array([1,2,3,5],dtype='float32')

# Multidimensional (from nested sequences)

np.array([range(i, i + 3) for i in [1, 2, 3]])
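Running the four calls above side by side makes the upcasting and the nested result visible; the variable names are added only for illustration.

```python
import numpy as np

ints = np.array([1, 2, 5, 4, 3])
floats = np.array([3.14, 4, 2, 3])             # the ints are upcast to float64
f32 = np.array([1, 2, 3, 5], dtype='float32')  # explicit dtype
nested = np.array([range(i, i + 3) for i in [1, 2, 3]])

print(floats.dtype)  # float64
print(f32.dtype)     # float32
print(nested)
# [[1 2 3]
#  [2 3 4]
#  [3 4 5]]
```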

Other creation functions

# All zeros

np.zeros(10)

# All ones

np.ones((3, 5))

# Uninitialized: contains whatever values happen to be in that memory

np.empty((2,3,2))

# Array filled with 3.14

np.full((3, 5), 3.14)

# Array version of range (start, stop, step)

np.arange(0, 20, 2)

# Evenly spaced values (start, stop, num); endpoint=True (the default) includes stop, endpoint=False excludes it

np.linspace(0, 1, 5)

# Identity matrix

np.eye(3)

# Uniform random values in [0, 1)

np.random.random((3, 3))

# Normally distributed random values (mean, standard deviation, shape)

np.random.normal(0, 1, (3, 3))

# Random integers in [0, 10)

np.random.randint(0, 10, (3, 3))
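The snippet below runs each creation function once and notes what it produces. The fixed seed and the extra endpoint=False call are additions for illustration, not part of the original list.

```python
import numpy as np

np.random.seed(0)  # fixed seed so the random arrays are reproducible

z = np.zeros(10)                           # ten 0.0 values
o = np.ones((3, 5))                        # 3 x 5 array of 1.0
f = np.full((3, 5), 3.14)                  # 3 x 5 array of 3.14
r = np.arange(0, 20, 2)                    # 0, 2, ..., 18 (stop excluded)
l1 = np.linspace(0, 1, 5)                  # 0, 0.25, 0.5, 0.75, 1.0
l2 = np.linspace(0, 1, 5, endpoint=False)  # 0, 0.2, 0.4, 0.6, 0.8
I = np.eye(3)                              # 3 x 3 identity matrix
u = np.random.random((3, 3))               # uniform in [0, 1)
g = np.random.normal(0, 1, (3, 3))         # mean 0, standard deviation 1
k = np.random.randint(0, 10, (3, 3))       # integers 0..9

print(r)
print(l1)
print(l2)
```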

2.2 ndarray data types

NumPy is built in C, so its data types closely mirror C types:

Data type	Description
`bool_`	Boolean (True or False) stored as a byte
`int_`	Default integer type (same as C `long`; normally either `int64` or `int32`)
`intc`	Identical to C `int` (normally `int32` or `int64`)
`intp`	Integer used for indexing (same as C `ssize_t`; normally either `int32` or `int64`)
`int8`	Byte (-128 to 127)
`int16`	Integer (-32768 to 32767)
`int32`	Integer (-2147483648 to 2147483647)
`int64`	Integer (-9223372036854775808 to 9223372036854775807)
`uint8`	Unsigned integer (0 to 255)
`uint16`	Unsigned integer (0 to 65535)
`uint32`	Unsigned integer (0 to 4294967295)
`uint64`	Unsigned integer (0 to 18446744073709551615)
`float_`	Shorthand for `float64` (removed in NumPy 2.0; use `float64`).
`float16`	Half precision float: sign bit, 5 bits exponent, 10 bits mantissa
`float32`	Single precision float: sign bit, 8 bits exponent, 23 bits mantissa
`float64`	Double precision float: sign bit, 11 bits exponent, 52 bits mantissa
`complex_`	Shorthand for `complex128` (removed in NumPy 2.0; use `complex128`).
`complex64`	Complex number, represented by two 32-bit floats
`complex128`	Complex number, represented by two 64-bit floats
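A short sketch of how the dtype determines the size of each element, how astype converts between types, and how a fixed-width integer type wraps around on overflow (array arithmetic does not raise an error in this case).

```python
import numpy as np

a = np.zeros(4, dtype='int16')
b = np.zeros(4, dtype=np.float32)

print(a.dtype, a.itemsize, a.nbytes)  # int16 2 8
print(b.dtype, b.itemsize, b.nbytes)  # float32 4 16

# astype returns a converted copy
c = b.astype(np.float64)
print(c.dtype, c.itemsize)            # float64 8

# uint8 holds 0..255, so 250 + 10 wraps around to 4
u = np.array([250], dtype=np.uint8) + np.uint8(10)
print(u, u.dtype)                     # [4] uint8
```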

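3 Array operations

3.1 Array indexing

3.1.1 Simple indexing

As with Python lists, indices start at 0, and negative indices count from the end, with -1 being the last element.

For a one-dimensional array use x[i]; for a multidimensional array, separate the indices with commas.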

e.g. for x = np.array([[1, 2, 3], [2, 3, 7], [3, 4, 5]]):

x[0, 0] and x[0][0] both give 1; x[2, 1] gives 4.
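The same example as runnable code, with negative indexing and assignment added for illustration. Note that assigning a float into an integer array silently truncates it.

```python
import numpy as np

x = np.array([[1, 2, 3],
              [2, 3, 7],
              [3, 4, 5]])

print(x[0, 0], x[0][0])  # 1 1
print(x[2, 1])           # 4
print(x[-1, -1])         # 5 (last row, last column)

# Assignment uses the same syntax; the value is cast to the array's
# integer dtype, so 9.7 is silently truncated to 9
x[0, 0] = 9.7
print(x[0, 0])           # 9
```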

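3.1.2 Fancy indexing

Fancy indexing passes an array of indices instead of a single scalar, retrieving several elements at once.

The shape of the result matches the shape of the index array, not that of the array being indexed.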

e.g. with x = np.array([51, 92, 14, 71, 60, 20, 82, 86, 74, 74]):

ind = np.array([[3, 7],
                [4, 5]])
x[ind]

→ array([[71, 86],
         [60, 20]])

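When several index arrays are combined, the result takes the shape of the index arrays after they are broadcast together.

Broadcasting: the general set of rules NumPy uses to apply operations to arrays of different shapes.

If the arrays in an operation have different shapes, NumPy stretches them according to these rules:

① If the two arrays differ in their number of dimensions, the shape of the one with fewer dimensions is padded with ones on its left side.

② If the shapes still do not match in some dimension, an array whose size is 1 in that dimension is stretched to match the other array's size.

③ If the sizes disagree in any dimension and neither of them is 1, an error is raised.

Note that padding only happens on the left; NumPy never pads on the right automatically, but you can do it yourself by reshaping the array (for example with np.newaxis).

Broadcasting is useful for centering (normalizing) data, evaluating two-dimensional functions on a grid for plotting, and so on.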
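The three rules can be checked directly on small arrays. The shapes below are arbitrary choices for illustration; the last lines show the centering use case, assuming the data are stored one sample per row.

```python
import numpy as np

a = np.ones((2, 3))
b = np.arange(3)                     # shape (3,)
# Rule 1: b is treated as (1, 3); rule 2: it is stretched to (2, 3)
print((a + b).shape)                 # (2, 3)

c = np.arange(3).reshape((3, 1))     # shape (3, 1)
print((b + c).shape)                 # (3, 3): both arrays are stretched

# Rule 3: (2, 3) and (3, 2) cannot be reconciled
d = np.ones((3, 2))
try:
    a + d
except ValueError as err:
    print("ValueError:", err)

# Padding on the right must be done by hand, e.g. with np.newaxis
e = np.arange(2)                     # shape (2,); a + e alone would fail
print((a + e[:, np.newaxis]).shape)  # (2, 3)

# Centering: subtract each column's mean from every row
X = np.random.random((10, 3))
X_centered = X - X.mean(axis=0)      # (10, 3) - (3,)
print(np.allclose(X_centered.mean(axis=0), 0))  # True
```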

Usage:

① Fancy indexing can be combined with other indexing schemes for even more powerful operations.

e.g. with X = np.arange(12).reshape((3, 4)):

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]

X[2, [2, 0, 1]] → array([10, 8, 9])

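It can also be combined with slicing and masking.

② Selecting random points, e.g. choosing 20 random indices without repetition:

a = np.random.choice(X.shape[0], 20, replace=False)

selection = X[a]

This is a quick way to split data into subsets.

③ Modifying values: x[i] -= 10

With repeated indices the result may not be what you expect: x[i] += 1 applies each index only once rather than accumulating. The at() method of a ufunc, e.g. np.add.at(x, i, 1), performs the accumulation.

④ Binning data (counting values into intervals)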
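The four uses listed above can be sketched in one runnable snippet. The data (a 3 x 4 grid, 100 normally distributed points, and a few hand-picked values with bin edges 0 to 4) are illustrative assumptions. np.searchsorted and np.histogram agree here only because no value lies exactly on a bin edge.

```python
import numpy as np

np.random.seed(0)  # fixed seed so the run is reproducible

# (1) Fancy indexing combined with simple indices, slices and masks
X = np.arange(12).reshape((3, 4))
print(X[2, [2, 0, 1]])                  # [10  8  9]
print(X[1:, [2, 0, 1]])                 # rows 1-2, columns 2, 0, 1
mask = np.array([True, False, True, False])
print(X[np.array([0, 1, 2])[:, np.newaxis], mask])  # columns 0 and 2 of each row

# (2) Random selection without repetition, then split into two parts
points = np.random.normal(size=(100, 2))
a = np.random.choice(points.shape[0], 20, replace=False)
selection = points[a]                   # the 20 selected rows
keep = np.ones(points.shape[0], dtype=bool)
keep[a] = False
rest = points[keep]                     # the remaining 80 rows

# (3) Repeated indices: += applies each index once, np.add.at accumulates
i = [2, 3, 3, 4, 4, 4]
x = np.zeros(10)
x[i] += 1
y = np.zeros(10)
np.add.at(y, i, 1)
print(x[:6])                            # [0. 0. 1. 1. 1. 0.]
print(y[:6])                            # [0. 0. 1. 2. 3. 0.]

# (4) Binning: locate each value's bin, then count with np.add.at
data = np.array([0.5, 1.5, 1.7, 2.2, 3.9])
bins = np.array([0, 1, 2, 3, 4])
idx = np.searchsorted(bins, data)       # [1 2 2 3 4]
counts = np.zeros(len(bins), dtype=int)
np.add.at(counts, idx, 1)
hist, _ = np.histogram(data, bins)
print(counts[1:], hist)                 # [1 2 1 1] [1 2 1 1]
```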

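3.2 Array slicing

Slicing uses the colon: x[start:stop:step]. The interval is half-open, [start, stop), and a negative step traverses the array in reverse.

For multidimensional arrays, separate the per-axis slices with commas. A lone colon selects the whole axis; when selecting rows it can be omitted, so x[0] is the same as x[0, :].

Array slices return views, not copies; slicing a Python list, by contrast, returns a copy.

If you need a copy, use the copy() method: x[:2, :2].copy()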
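A minimal demonstration of the view/copy difference described above, compared with a plain Python list; the 3 x 3 array is an arbitrary example.

```python
import numpy as np

x = np.arange(9).reshape((3, 3))

view = x[:2, :2]
view[0, 0] = 99          # writes through to x
print(x[0, 0])           # 99

x_copy = x[:2, :2].copy()
x_copy[0, 0] = -1        # x is unaffected
print(x[0, 0])           # 99

# Python lists behave differently: slicing copies
lst = [0, 1, 2]
sub = lst[:2]
sub[0] = 99
print(lst[0])            # 0

# A negative step reverses the array
rev = np.arange(5)[::-1]
print(rev)               # [4 3 2 1 0]
```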

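3.3 Reshaping arrays

Use the reshape() method; the new shape must have the same total size as the original.

Dimensions can also be added with the np.newaxis keyword inside a slice expression.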
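A small sketch of both approaches: reshape into a grid, np.newaxis to turn a 1-D array into a row or column vector, and the error raised when the sizes do not match.

```python
import numpy as np

grid = np.arange(1, 10).reshape((3, 3))
print(grid.shape)            # (3, 3)

x = np.array([1, 2, 3])
row = x[np.newaxis, :]       # same as x.reshape((1, 3))
col = x[:, np.newaxis]       # same as x.reshape((3, 1))
print(row.shape, col.shape)  # (1, 3) (3, 1)

# reshape fails if the total size does not match (10 != 3 * 3)
try:
    np.arange(10).reshape((3, 3))
except ValueError as err:
    print("ValueError:", err)
```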

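3.4 Concatenating and splitting arrays

Concatenation is done mainly with np.concatenate, np.vstack (vertical stack) and np.hstack (horizontal stack).

np.concatenate() also joins two-dimensional arrays; use axis=0 (the default) to join along the first axis (rows), or axis=1 to join along the second (columns).

Splitting is done with np.split, np.hsplit and np.vsplit; np.dsplit splits along the third axis.

Pass a list of indices giving the split points; N split points produce N + 1 subarrays.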
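The functions above on small example arrays (chosen arbitrarily). Two split points produce three pieces, as stated above.

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
print(np.concatenate([x, y]))                      # [1 2 3 3 2 1]

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])
print(np.concatenate([grid, grid], axis=0).shape)  # (4, 3)
print(np.concatenate([grid, grid], axis=1).shape)  # (2, 6)

print(np.vstack([x, grid]).shape)                  # (3, 3)
print(np.hstack([grid, [[99], [99]]]).shape)       # (2, 4)

# Two split points -> three subarrays
z = [1, 2, 3, 99, 99, 3, 2, 1]
x1, x2, x3 = np.split(z, [3, 5])
print(x1, x2, x3)                                  # [1 2 3] [99 99] [3 2 1]

g = np.arange(16).reshape((4, 4))
upper, lower = np.vsplit(g, [2])                   # split the rows
left, right = np.hsplit(g, [2])                    # split the columns
```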

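4 Universal functions (ufuncs)

4.1 Functions

NumPy provides a convenient interface to many statically typed, compiled routines, known as vectorized operations. Applying an operation to a whole array pushes the loop down into NumPy's compiled layer, which is why it runs much faster than an equivalent Python loop.

e.g. np.arange(5) / np.arange(1, 6)

Ufuncs can be invoked through Python's native arithmetic operators:

Operator	Equivalent ufunc	Description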

`+`	`np.add`	Addition (e.g., `1 + 1 = 2`)
`-`	`np.subtract`	Subtraction (e.g., `3 - 2 = 1`)
`-`	`np.negative`	Unary negation (e.g., `-2`)
`*`	`np.multiply`	Multiplication (e.g., `2 * 3 = 6`)
`/`	`np.divide`	Division (e.g., `3 / 2 = 1.5`)
`//`	`np.floor_divide`	Floor division (e.g., `3 // 2 = 1`)
`**`	`np.power`	Exponentiation (e.g., `2 ** 3 = 8`)
`%`	`np.mod`	Modulus/remainder (e.g., `9 % 4 = 1`)

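Absolute value: np.abs(); for complex input it returns the magnitude.

Trigonometric functions: np.sin(), np.cos(), np.tan(); inverse trigonometric functions: np.arcsin(), np.arccos(), np.arctan()

Exponentials: np.exp(), np.exp2(), np.power(a, x), where a is the base. For very small x, np.expm1(x) computes exp(x) - 1 more accurately.

Logarithms: np.log(), np.log2(), np.log10(). For very small x, np.log1p(x) computes log(1 + x) more accurately.

Specialized ufuncs: scipy.special contains many more specialized functions, such as the gamma and error functions.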
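A sketch of a few of these functions, including why expm1 and log1p exist: for tiny x, computing 1 + x in floating point already discards digits. The value 1e-10 is an arbitrary "small x" chosen for illustration.

```python
import numpy as np

print(np.abs(np.array([-2, 3 + 4j])))  # [2. 5.]: the magnitude of 3+4j is 5
print(np.exp2(3), np.power(3, 2))      # 8.0 9

theta = np.linspace(0, np.pi, 3)
print(np.sin(theta))                   # approximately [0, 1, 0]

# For tiny x, 1 + x loses digits, so the naive formulas lose accuracy;
# expm1 and log1p avoid forming 1 + x explicitly
tiny = 1e-10
naive_err = abs(np.log(1 + tiny) - tiny) / tiny
precise_err = abs(np.log1p(tiny) - tiny) / tiny
print(naive_err, precise_err)          # the naive error is far larger
print(np.expm1(tiny))                  # about 1e-10
```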

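4.2 Advanced ufunc features

Specifying output: the out argument writes the result directly into an existing array (or a view of one) instead of allocating a temporary array; for large arrays this can save a significant amount of memory.

Aggregates: the reduce method repeatedly applies the operation to the elements until a single result remains; e.g. np.add.reduce returns the sum of all elements. To keep the intermediate results, use accumulate.

Outer products: the outer method computes the function's output for every pair of elements drawn from two input arrays.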
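The three features above on small arrays. Writing into z[::2] shows that out may be a view, so results can land in every other element without an intermediate copy.

```python
import numpy as np

x = np.arange(5)

# out: write the result into a preallocated array
y = np.empty(5)
np.multiply(x, 10, out=y)
print(y)                           # [ 0. 10. 20. 30. 40.]

# out can also be a view, e.g. every other element of z
z = np.zeros(10)
np.power(2, x, out=z[::2])
print(z)                           # [ 1.  0.  2.  0.  4.  0.  8.  0. 16.  0.]

x1 = np.arange(1, 6)
print(np.add.reduce(x1))           # 15
print(np.multiply.reduce(x1))      # 120
print(np.add.accumulate(x1))       # [ 1  3  6 10 15]
table = np.multiply.outer(x1, x1)  # 5 x 5 multiplication table
print(table)
```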

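5 Aggregations

For multidimensional arrays, a common task is to aggregate along a row or a column; the axis argument specifies which axis is collapsed. For a 2-D array, axis=0 aggregates down each column and axis=1 aggregates across each row. Most functions also have a NaN-safe version that ignores missing values: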

Function Name	NaN-safe Version	Description
`np.sum`	`np.nansum`	Compute sum of elements
`np.prod`	`np.nanprod`	Compute product of elements
`np.mean`	`np.nanmean`	Compute mean of elements
`np.std`	`np.nanstd`	Compute standard deviation
`np.var`	`np.nanvar`	Compute variance
`np.min`	`np.nanmin`	Find minimum value
`np.max`	`np.nanmax`	Find maximum value
`np.argmin`	`np.nanargmin`	Find index of minimum value
`np.argmax`	`np.nanargmax`	Find index of maximum value
`np.median`	`np.nanmedian`	Compute median of elements
`np.percentile`	`np.nanpercentile`	Compute rank-based statistics of elements
`np.any`	N/A	Evaluate whether any elements are true
`np.all`	N/A	Evaluate whether all elements are true
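A small example of the axis argument and of the NaN-safe variants from the table; the 2 x 3 matrix is an arbitrary example.

```python
import numpy as np

M = np.array([[1, 2, 3],
              [4, 5, 6]])

print(M.sum())        # 21
print(M.sum(axis=0))  # [5 7 9]: collapse the rows, one value per column
print(M.sum(axis=1))  # [ 6 15]: collapse the columns, one value per row
print(M.min(axis=0))  # [1 2 3]

v = np.array([1.0, np.nan, 3.0])
print(np.sum(v))      # nan
print(np.nansum(v))   # 4.0
print(np.nanmean(v))  # 2.0
```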

6 Boolean masks

Comparison operators and their equivalent ufuncs:

Operator	Equivalent ufunc	Operator	Equivalent ufunc
`==`	`np.equal`	`!=`	`np.not_equal`
`<`	`np.less`	`<=`	`np.less_equal`
`>`	`np.greater`	`>=`	`np.greater_equal`

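Counting True entries: np.count_nonzero() or np.sum()

Checking whether any or all entries are True: np.any(), np.all()

Boolean operators: element-wise logic, such as A AND B or its equivalent NOT (NOT A OR NOT B), is written with the bitwise operators: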

Operator	Equivalent ufunc	Operator	Equivalent ufunc
`&`	`np.bitwise_and`	`|`	`np.bitwise_or`
`^`	`np.bitwise_xor`	`~`	`np.bitwise_not`

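Use a Boolean array as a mask to select a subset of the data; e.g. extract the values less than 5 with x[x < 5].

The keywords and/or evaluate the truth of an entire object, whereas the operators & and | act on each element (each bit).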
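Counting, combining and masking on a small integer array chosen for illustration. The last part shows why `and` does not work element-wise: it asks for the truth value of the whole array.

```python
import numpy as np

x = np.array([[5, 0, 3, 3],
              [7, 9, 3, 5],
              [2, 4, 7, 6]])

print(x < 5)                          # Boolean array of the same shape
print(np.count_nonzero(x < 5))        # 6
print(np.sum(x < 5, axis=1))          # [3 1 2]: count per row
print(np.any(x > 8), np.all(x < 10))  # True True

# Combine conditions element-wise with & and | (parentheses required)
print(np.sum((x > 2) & (x < 6)))      # 6

# Use the mask to select values
print(x[x < 5])                       # [0 3 3 3 2 4]

# `and` needs a single truth value for the whole array, so it raises
try:
    (x > 2) and (x < 6)
except ValueError as err:
    print("ValueError:", err)
```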

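7 Sorting arrays

np.sort() returns a sorted copy and leaves the original array unchanged; to sort in place, use the array's sort() method.

np.argsort() returns the indices that would sort the array.

Partial sorting: np.partition(arr, k) returns a new array in which the k smallest values come first and the remaining values follow, each group in arbitrary order.

All of these accept an axis argument to sort along rows or columns.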
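The sorting functions above on small arrays (the values are arbitrary). Because np.partition leaves each side in arbitrary order, the check sorts the first k elements before printing them.

```python
import numpy as np

x = np.array([2, 1, 4, 3, 5])

print(np.sort(x))         # [1 2 3 4 5]; x itself is unchanged
print(x)                  # [2 1 4 3 5]
order = np.argsort(x)
print(order)              # [1 0 3 2 4]

x.sort()                  # sorts in place
print(x)                  # [1 2 3 4 5]

y = np.array([7, 2, 3, 1, 6, 5, 4])
p = np.partition(y, 3)
print(np.sort(p[:3]))     # [1 2 3]: the 3 smallest values, in some order

M = np.array([[3, 1],
              [2, 4]])
print(np.sort(M, axis=0))  # sort each column
print(np.sort(M, axis=1))  # sort each row
```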

8 Structured data

A structured array is built by specifying a compound data type:

data = np.zeros(4, dtype={'names':('name', 'age', 'weight'),

                          'formats':('U10', 'i4', 'f8')})

Values can then be accessed by index or by field name, and filtered with Boolean masks.
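Filling the structured array declared above and querying it. The names, ages and weights are made-up sample data.

```python
import numpy as np

# Made-up sample data
name = ['Alice', 'Bob', 'Cathy', 'Doug']
age = [25, 45, 37, 19]
weight = [55.0, 85.5, 68.0, 61.5]

data = np.zeros(4, dtype={'names': ('name', 'age', 'weight'),
                          'formats': ('U10', 'i4', 'f8')})
data['name'] = name
data['age'] = age
data['weight'] = weight

print(data[0])                         # the first record
print(data['name'])                    # every value of one field
print(data[-1]['name'])                # Doug
print(data[data['age'] < 30]['name'])  # ['Alice' 'Doug']
```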

The dtype can also be built with np.dtype, in several forms:

np.dtype({'names':('name', 'age', 'weight'),

          'formats':((np.str_, 10), int, np.float32)})

np.dtype([('name', 'S10'), ('age', 'i4'), ('weight', 'f8')])

np.dtype('S10,i4,f8')

output:dtype([('f0', 'S10'), ('f1', '<i4'), ('f2', '<f8')])

A field can even be an array or a matrix (a subarray dtype).
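A sketch of a subarray field: each record holds an integer id and a 3 x 3 float matrix. The field names id and mat are arbitrary.

```python
import numpy as np

# Each record: an integer id plus a 3 x 3 matrix of floats
tp = np.dtype([('id', 'i8'), ('mat', 'f8', (3, 3))])
X = np.zeros(2, dtype=tp)

X[0]['mat'] = np.eye(3)
print(X['mat'].shape)     # (2, 3, 3): the subarray shape is appended
print(X[0]['mat'])        # the 3 x 3 identity matrix
```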

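np.recarray: record arrays, whose fields can also be accessed as attributes (e.g. data.age instead of data['age']).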
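A minimal sketch of viewing an existing structured array as a record array, with made-up data. Attribute access is convenient but can be slower than indexing by field name.

```python
import numpy as np

data = np.zeros(3, dtype=[('name', 'U10'), ('age', 'i4')])
data['name'] = ['Alice', 'Bob', 'Cathy']  # made-up sample data
data['age'] = [25, 45, 37]

rec = data.view(np.recarray)
print(rec.age)       # [25 45 37], same values as data['age']
print(rec.name[1])   # Bob
```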
