I. NumPy

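NumPy's main object is the homogeneous multidimensional array: a table of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers. In NumPy, dimensions are called axes, and the number of axes is called the rank.

For example, the coordinates of a point in 3D space, [1, 2, 3], form an array of rank 1, because it has only one axis. That axis has length 3. In the example below, the array has rank 2 (it has two dimensions). The first dimension has length 2 and the second has length 3.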

[[ 1., 0., 0.],

 [ 0., 1., 2.]]

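NumPy's array class is called ndarray, and is usually referred to simply as an array. Note that numpy.array is not the same as the standard Python library class array.array, which only handles one-dimensional arrays and offers less functionality. The more important attributes of an ndarray object are:

ndarray.ndim

The number of axes (dimensions) of the array. In the Python world, the number of dimensions is referred to as the rank.

ndarray.shape

The dimensions of the array. This is a tuple of integers giving the size of the array in each dimension. For a matrix with n rows and m columns, shape is (n, m). The length of the shape tuple is therefore the rank, i.e. the number of dimensions, ndim.

ndarray.size

The total number of elements in the array, equal to the product of the elements of shape.

ndarray.dtype

An object describing the type of the elements in the array. A dtype can be created or specified using standard Python types. NumPy also provides data types of its own.

ndarray.itemsize

The size in bytes of each element of the array. For example, an array of elements of type float64 has itemsize 8 (=64/8), while one of type int32 has itemsize 4 (=32/8).

ndarray.data

The buffer containing the actual elements of the array. Normally this attribute is not needed, because elements are accessed through indexing.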

>>> import numpy as np

>>> from numpy import arange, array

>>> a = arange(15).reshape(3, 5)

>>> a

array([[ 0,  1,  2,  3,  4],

       [ 5,  6,  7,  8,  9],

       [10, 11, 12, 13, 14]])

>>> a.shape

(3, 5)

>>> a.ndim

2

>>> a.dtype.name

'int32'

>>> a.itemsize

4

>>> a.size

15

>>> type(a)

numpy.ndarray

>>> b = array([6, 7, 8])

>>> b

array([6, 7, 8])

>>> type(b)

numpy.ndarray
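
The relationships among these attributes can be checked directly. The following is a minimal sketch: the arrays c and d and their dtypes are chosen here purely for illustration, and nbytes is an additional ndarray attribute that is not discussed above.

```python
import numpy as np

# Illustrative 2 x 3 array of float64 (values taken from the rank-2 example).
c = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 2.0]], dtype=np.float64)

print(c.ndim)      # number of axes, equal to len(c.shape)
print(c.shape)     # size along each axis
print(c.size)      # total number of elements, the product of c.shape
print(c.itemsize)  # bytes per element: 64 bits / 8

# nbytes is the total buffer size: size * itemsize.
print(c.nbytes == c.size * c.itemsize)

# Requesting the dtype explicitly avoids platform-dependent defaults.
d = np.arange(15, dtype=np.int32).reshape(3, 5)
print(d.dtype.name, d.itemsize)
```

Without an explicit dtype, the integer type produced by arange depends on the platform, so a.dtype.name in the session above may print 'int64' (with itemsize 8) instead of 'int32'.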

1. numpy.apply_along_axis

From the official documentation:

numpy.apply_along_axis(func1d, axis, arr, *args, **kwargs)

Apply a function to 1-D slices along the given axis.

Execute func1d(a, *args) where func1d operates on 1-D arrays and a is a 1-D slice of arr along axis.

Parameters:

func1d : function

This function should accept 1-D arrays. It is applied to 1-D slices of arr along the specified axis.

axis : integer

Axis along which arr is sliced.

arr : ndarray

Input array.

args : any

Additional arguments to func1d.

kwargs : any

Additional named arguments to func1d.

New in version 1.9.0.

Returns:

apply_along_axis : ndarray

The output array. The shape of outarr is identical to the shape of arr, except along the axis dimension. This axis is removed, and replaced with new dimensions equal to the shape of the return value of func1d. So if func1d returns a scalar, outarr will have one fewer dimension than arr.

Example:

>>> import numpy as np

>>> def my_func(a):  # takes a 1-D array as its argument

...     """Average first and last element of a 1-D array"""

...     return (a[0] + a[-1]) * 0.5  # average of the first and last elements

>>> b = np.array([[1,2,3], [4,5,6], [7,8,9]])

>>> np.apply_along_axis(my_func, 0, b)

array([ 4.,  5.,  6.])

>>> np.apply_along_axis(my_func, 1, b)

array([ 2.,  5.,  8.])

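Here my_func() takes a single 1-D array and returns the average of its first and last elements. apply_along_axis collects these return values into an array:

np.apply_along_axis(my_func, 0, b) passes each column of b (each 1-D slice along axis 0) to my_func, so it computes the average of the first and last elements of every column. The result is:

4. 5. 6.

np.apply_along_axis(my_func, 1, b) passes each row of b (each 1-D slice along axis 1) to my_func, so it computes the average of the first and last elements of every row. The result is:

2. 5. 8.

Reference: https://docs.scipy.org/doc/numpy/reference/generated/numpy.apply_along_axis.html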
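
The Returns description also covers the case where func1d returns an array rather than a scalar. The sketch below illustrates both cases. The helper names first_last_mean and min_max are hypothetical and exist only for this example.

```python
import numpy as np

def first_last_mean(v):
    """Scalar result: average of the first and last element of a 1-D slice."""
    return (v[0] + v[-1]) * 0.5

def min_max(v):
    """Array result: a length-2 array holding the slice's minimum and maximum."""
    return np.array([v.min(), v.max()])

b = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Scalar-returning function: the sliced axis disappears.
cols = np.apply_along_axis(first_last_mean, 0, b)   # one value per column
rows = np.apply_along_axis(first_last_mean, 1, b)   # one value per row

# The same results without apply_along_axis, using plain slicing.
cols_direct = (b[0, :] + b[-1, :]) * 0.5
rows_direct = (b[:, 0] + b[:, -1]) * 0.5

# Array-returning function: the sliced axis is replaced by the result's shape.
mm_rows = np.apply_along_axis(min_max, 1, b)   # shape (3, 2)
mm_cols = np.apply_along_axis(min_max, 0, b)   # shape (2, 3)

print(cols, rows)
print(mm_rows.shape, mm_cols.shape)
```

For simple arithmetic like this, the sliced form is usually preferable, since apply_along_axis calls the Python function once per slice.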

2. numpy.linalg.norm

(1) np.linalg.inv(): matrix inverse
(2) np.linalg.det(): matrix determinant (a scalar)
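
A short sketch of these two functions follows. The matrix A is made up for this example and is not from the original notes.

```python
import numpy as np

# Illustrative invertible 2 x 2 matrix.
A = np.array([[4.0, 7.0],
              [2.0, 6.0]])

det_A = np.linalg.det(A)   # 4*6 - 7*2 = 10, up to floating-point rounding
A_inv = np.linalg.inv(A)   # raises numpy.linalg.LinAlgError if A is singular

# Multiplying a matrix by its inverse should give the identity matrix.
product = A @ A_inv

print(det_A)
print(np.allclose(product, np.eye(2)))
```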

np.linalg.norm

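As the name suggests, linalg = linear + algebra, and norm is the norm. Note first that a norm is a measure of a vector (or matrix) and is a scalar:

First, view its documentation with help(np.linalg.norm):

norm(x, ord=None, axis=None, keepdims=False)

Only the commonly used settings are described here: x is the vector to measure and ord selects the type of norm. With the default ord=None, a vector gets its 2-norm.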

>>> x = np.array([3, 4])

>>> np.linalg.norm(x)

5.0

>>> np.linalg.norm(x, ord=2)

5.0

>>> np.linalg.norm(x, ord=1)

7.0

>>> np.linalg.norm(x, ord=np.inf)

4.0

A small corollary of norm theory tells us that ℓ1 ≥ ℓ2 ≥ ℓ∞, which the values above illustrate (7 ≥ 5 ≥ 4).

Reference: http://blog.csdn.net/lanchunhui/article/details/51004387
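
To make the three norms concrete, the sketch below computes each from its definition and compares it with np.linalg.norm. It also shows the axis argument from the signature. The vector with a negative entry and the matrix M are made up for this example.

```python
import numpy as np

x = np.array([3.0, -4.0])

# Definitions written out by hand.
l1 = np.sum(np.abs(x))         # sum of absolute values
l2 = np.sqrt(np.sum(x ** 2))   # Euclidean length
linf = np.max(np.abs(x))       # largest absolute value

print(l1, np.linalg.norm(x, ord=1))
print(l2, np.linalg.norm(x))   # ord=None is the 2-norm for a vector
print(linf, np.linalg.norm(x, ord=np.inf))

# With axis set, each 1-D slice along that axis is treated as a separate vector.
M = np.array([[3.0, 4.0],
              [6.0, 8.0]])
row_norms = np.linalg.norm(M, axis=1)
print(row_norms)
```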

3. numpy.expand_dims

Expands the shape of an array by inserting a new axis of length 1 at the given position.

numpy.expand_dims(a, axis)

Example:

>>> x = np.array([1,2])

>>> x.shape

(2,)

shape gives the shape of the array.

>>> y = np.expand_dims(x, axis=0)

>>> y

array([[1, 2]])

>>> y.shape

(1, 2)

The array gains a dimension: with axis=0 the new axis is inserted in front, so shape (2,) becomes (1, 2).

>>> y = np.expand_dims(x, axis=1)  # Equivalent to x[:,newaxis]

>>> y

array([[1],

       [2]])

>>> y.shape

(2, 1)

The array gains a dimension: with axis=1 the new axis is inserted at the end, so shape (2,) becomes (2, 1).
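
One common reason to add a length-1 axis is broadcasting. The sketch below is an illustration added here, not part of the original notes: it builds a row and a column from the same vector and multiplies them to form an outer product.

```python
import numpy as np

x = np.array([1, 2])

row = np.expand_dims(x, axis=0)   # shape (1, 2)
col = np.expand_dims(x, axis=1)   # shape (2, 1)

# expand_dims is equivalent to indexing with np.newaxis.
same_as_newaxis = np.array_equal(col, x[:, np.newaxis])

# Broadcasting (2, 1) against (1, 2) gives a (2, 2) result: the outer product.
outer = col * row

print(row.shape, col.shape, same_as_newaxis)
print(outer)
```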

4. numpy.transpose

Transposes a matrix. More generally, it permutes the axes of an array.

numpy.transpose(a, axes=None)

Permute the dimensions of an array.

Parameters:

a : array_like

Input array.

axes : list of ints, optional

By default, reverse the dimensions, otherwise permute the axes according to the values given.

Returns:

p : ndarray

a with its axes permuted. A view is returned whenever possible.

Example:

>>> x = np.arange(4).reshape((2,2))

>>> x

array([[0, 1],

       [2, 3]])

>>> np.transpose(x)

array([[0, 2],

       [1, 3]])

>>> x=np.ones((1,2,3))

>>> x

array([[[ 1.,  1.,  1.],

        [ 1.,  1.,  1.]]])

>>> y=np.transpose(x,(1,0,2))

>>> y

array([[[ 1.,  1.,  1.]],

       [[ 1.,  1.,  1.]]])

>>> y.shape

(2, 1, 3)

>>>

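In effect, the corresponding index positions are swapped.

np.transpose(x, (1, 0, 2)) swaps the first and second axes of x, so the element at index (i, j, k) in x ends up at index (j, i, k) in the result. For example, in

array([[[ 1.,  1.,  1.]],

       [[ 1.,  1.,  1.]]])

the element at position (1, 0, 1) came from position (0, 1, 1) in x.

A more concrete look: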

>>> b = np.array([[1,2,3], [4,5,6], [7,8,9]])

>>> b

array([[1, 2, 3],

       [4, 5, 6],

       [7, 8, 9]])

>>> b.shape

(3, 3)

>>> c=np.transpose(b,(1,0))

>>> c

array([[1, 4, 7],

       [2, 5, 8],

       [3, 6, 9]])

>>>

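This operation depends on the shape: the index positions are swapped according to axes and the elements are placed at their new positions, so here c[i, j] equals b[j, i].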
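
Because np.ones makes every element look the same, the index mapping is easier to see with distinct values. The sketch below is an illustration added here: it uses np.arange to check the mapping element by element and confirms that the result is a view on the same data.

```python
import numpy as np

# Distinct values make it possible to track where each element goes.
x = np.arange(6).reshape(1, 2, 3)
y = np.transpose(x, (1, 0, 2))    # swap axis 0 and axis 1, keep axis 2

print(x.shape, y.shape)

# Every element x[i, j, k] is found at y[j, i, k].
mapping_holds = all(
    y[j, i, k] == x[i, j, k]
    for i in range(x.shape[0])
    for j in range(x.shape[1])
    for k in range(x.shape[2])
)
print(mapping_holds)

# transpose returns a view where possible: no data is copied.
print(np.shares_memory(x, y))

# For a 2-D array, axes=(1, 0) is the ordinary matrix transpose.
b = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
c = np.transpose(b, (1, 0))
print(np.array_equal(c, b.T))
```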

II. sklearn

1. PCA

1.1 Function prototype and parameters

sklearn.decomposition.PCA(n_components=None, copy=True, whiten=False)

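Parameters:

n_components:

Meaning: the number n of principal components to keep in PCA, i.e. the number of features retained.

Type: int or string. Defaults to None, in which case all components are kept.

When set to an int, e.g. n_components=1, the original data is reduced to one dimension.

When set to the string 'mle', the number of components n is chosen automatically by maximum likelihood estimation (Minka's MLE). To keep just enough components to reach a required fraction of the variance, pass a float between 0 and 1 instead.

copy:

Type: bool, True or False. Defaults to True.

Meaning: whether to copy the original training data when running the algorithm. If True, the original training data is unchanged after PCA runs, because the computation is done on a copy. If False, the original training data is modified, because the computation is done in place on the original data.

whiten:

Type: bool. Defaults to False.

Meaning: whitening, which gives every component the same variance. For more on whitening, see the UFLDL tutorial.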

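1.2 Attributes of the PCA object

components_: the principal axes, i.e. the directions of maximum variance in the data.

explained_variance_ratio_: the fraction of the total variance explained by each of the n retained components.

n_components_: the number n of retained components.

mean_: the per-feature mean estimated from the training data.

noise_variance_: the estimated noise variance under the probabilistic PCA model.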

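1.3 Methods of the PCA object

fit(X, y=None)

fit() is the common method in scikit-learn: every algorithm that needs training has a fit() method, and it is the "training" step of the algorithm. Because PCA is an unsupervised learning algorithm, y is simply None here.

fit(X) trains the PCA model on the data X.

Return value: the object on which fit was called. For example, pca.fit(X) trains the pca object on X and returns pca itself.

fit_transform(X)

Trains the PCA model on X and returns the dimension-reduced data.

newX = pca.fit_transform(X), where newX is the dimension-reduced data.

inverse_transform()

Maps dimension-reduced data back to the original space: X = pca.inverse_transform(newX). If components were discarded, the result is only an approximation of the original data.

transform(X)

Converts the data X into dimension-reduced data. Once the model has been trained, any new input data can be reduced with transform.

There are also get_covariance(), get_precision(), get_params(deep=True), score(X, y=None) and other methods, to be added here when they are needed.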

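1.4 Example

Take a two-dimensional data set, data, shown below. It has 12 samples (x, y) that lie approximately on the line y = x, clustered around x = 1, 2, 3 and 4, with three samples in each cluster.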

>>> data

array([[ 1.  ,  1.  ],

       [ 0.9 ,  0.95],

       [ 1.01,  1.03],

       [ 2.  ,  2.  ],

       [ 2.03,  2.06],

       [ 1.98,  1.89],

       [ 3.  ,  3.  ],

       [ 3.03,  3.05],

       [ 2.89,  3.1 ],

       [ 4.  ,  4.  ],

       [ 4.06,  4.02],

       [ 3.97,  4.01]])

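The data set has two features. Because the two features are approximately equal, one feature is enough to represent them, so the data can be reduced to one dimension. The following shows how to use the PCA implementation in sklearn.

(1) n_components is set to 1 and copy is left at its default, True. The original data is unchanged, newData is one-dimensional, and it clearly separates the original data into four groups.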

>>> from sklearn.decomposition import PCA

>>> pca=PCA(n_components=1)

>>> newData=pca.fit_transform(data)

>>> newData

array([[-2.12015916],

       [-2.22617682],

       [-2.09185561],

       [-0.70594692],

       [-0.64227841],

       [-0.79795758],

       [ 0.70826533],

       [ 0.76485312],

       [ 0.70139695],

       [ 2.12247757],

       [ 2.17900746],

       [ 2.10837406]])

>>> data

array([[ 1.  ,  1.  ],

       [ 0.9 ,  0.95],

       [ 1.01,  1.03],

       [ 2.  ,  2.  ],

       [ 2.03,  2.06],

       [ 1.98,  1.89],

       [ 3.  ,  3.  ],

       [ 3.03,  3.05],

       [ 2.89,  3.1 ],

       [ 4.  ,  4.  ],

       [ 4.06,  4.02],

       [ 3.97,  4.01]])

(2) With copy set to False, the original data is modified. Here it ends up centered, with the mean of each feature subtracted.

>>> pca=PCA(n_components=1,copy=False)

>>> newData=pca.fit_transform(data)

>>> data

array([[-1.48916667, -1.50916667],

       [-1.58916667, -1.55916667],

       [-1.47916667, -1.47916667],

       [-0.48916667, -0.50916667],

       [-0.45916667, -0.44916667],

       [-0.50916667, -0.61916667],

       [ 0.51083333,  0.49083333],

       [ 0.54083333,  0.54083333],

       [ 0.40083333,  0.59083333],

       [ 1.51083333,  1.49083333],

       [ 1.57083333,  1.51083333],

       [ 1.48083333,  1.50083333]])

(3) With n_components set to 'mle', the data is automatically reduced to one dimension.

>>> pca=PCA(n_components='mle')

>>> newData=pca.fit_transform(data)

>>> newData

array([[-2.12015916],

       [-2.22617682],

       [-2.09185561],

       [-0.70594692],

       [-0.64227841],

       [-0.79795758],

       [ 0.70826533],

       [ 0.76485312],

       [ 0.70139695],

       [ 2.12247757],

       [ 2.17900746],

       [ 2.10837406]])

(4) Attribute values of the object

>>> pca.n_components

1

>>> pca.explained_variance_ratio_

array([ 0.99910873])

>>> pca.explained_variance_

array([ 2.55427003])

>>> pca.get_params

<bound method PCA.get_params of PCA(copy=True, n_components=1, whiten=False)>

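The trained pca object has n_components equal to 1, i.e. one feature is kept. Its variance is 2.55427003, which is 0.99910873 of the total variance of all features, meaning that almost all of the information is preserved. get_params returns the values of the parameters. Written without parentheses, as above, it only displays the bound method.

(5) Methods of the object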

>>> newA=pca.transform(A)

This uses the already trained pca model to reduce the dimension of new data A.

>>> pca.set_params(copy=False)

PCA(copy=False, n_components=1, whiten=False)

This sets parameters.

Reference: http://doc.okbase.net/u012162613/archive/120946.html
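
To show what fit_transform and inverse_transform compute, the following is a minimal NumPy-only sketch of PCA by singular value decomposition on the same 12 samples. It is not sklearn's implementation. The variable names are chosen for this example, and the sign of the projected values may be flipped relative to sklearn's output, because the direction of a principal axis is only determined up to sign.

```python
import numpy as np

# The 12 samples from the example above.
data = np.array([[1.00, 1.00], [0.90, 0.95], [1.01, 1.03],
                 [2.00, 2.00], [2.03, 2.06], [1.98, 1.89],
                 [3.00, 3.00], [3.03, 3.05], [2.89, 3.10],
                 [4.00, 4.00], [4.06, 4.02], [3.97, 4.01]])

n_components = 1

# Step 1: center each feature. This creates a new array, so data is untouched
# (the behaviour described for copy=True).
mean = data.mean(axis=0)
centered = data - mean

# Step 2: SVD of the centered data. The rows of Vt are the principal axes,
# ordered by decreasing singular value.
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
components = Vt[:n_components]

# Step 3: project onto the retained axes (the role of fit_transform).
new_data = centered @ components.T

# Fraction of the total variance carried by each axis.
variance_ratio = S ** 2 / np.sum(S ** 2)

# Map back to the original space (the role of inverse_transform). With one
# of the two components discarded this is only an approximation.
restored = new_data @ components + mean

print(new_data.shape)
print(variance_ratio[:n_components])
print(np.abs(restored - data).max())
```

The centered array matches the values that data took after the copy=False run above, and the first variance ratio can be compared with the explained_variance_ratio_ reported there.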

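2. SVM

The SVC class in sklearn is used often, so some of the parameters from its documentation are translated here for future reference.

SVC is itself implemented on top of libsvm, so the parameter settings are similar in many places. (PS: libsvm solves the quadratic programming problem with SMO.)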
sklearn.svm.SVC(C=1.0, kernel='rbf', degree=3, gamma='auto', coef0=0.0, shrinking=True, probability=False,

tol=0.001, cache_size=200, class_weight=None, verbose=False, max_iter=-1, decision_function_shape=None,random_state=None)

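Parameters:

• C: the penalty parameter C of C-SVC. Defaults to 1.0.

A larger C penalizes the slack variables more heavily and pushes them toward 0. The penalty for misclassification grows and the model tends toward classifying every training sample correctly, which gives high accuracy on the training set but weaker generalization. A smaller C reduces the penalty for misclassification and tolerates errors, treating those samples as noise, which gives stronger generalization.

• kernel: the kernel function. Defaults to 'rbf'. Can be 'linear', 'poly', 'rbf', 'sigmoid' or 'precomputed'.

　　0 – linear: u'*v

　　1 – polynomial: (gamma*u'*v + coef0)^degree

　　2 – RBF: exp(-gamma*|u-v|^2)

　　3 – sigmoid: tanh(gamma*u'*v + coef0)

• degree: the degree of the polynomial kernel ('poly'). Defaults to 3. Ignored by the other kernels.

• gamma: the kernel coefficient for 'rbf', 'poly' and 'sigmoid'. Defaults to 'auto', in which case 1/n_features is used.

• coef0: the constant term in the kernel function. Only relevant for 'poly' and 'sigmoid'.

• probability: whether to enable probability estimates. Defaults to False.

• shrinking: whether to use the shrinking heuristic. Defaults to True.

• tol: the tolerance for the stopping criterion. Defaults to 1e-3.

• cache_size: the size of the kernel cache. Defaults to 200.

• class_weight: the class weights, passed as a dictionary. Sets the parameter C of class i to weight*C (the C of C-SVC).

• verbose: whether to enable verbose output.

• max_iter: the maximum number of iterations. -1 means no limit.

• decision_function_shape: 'ovo', 'ovr' or None, default=None

• random_state: the seed used when shuffling the data, an int.

The main parameters to tune are C, kernel, degree, gamma and coef0.

Reference: http://blog.csdn.net/szlcw1/article/details/52336824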
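
The four kernel formulas in the parameter list can be written out directly. The sketch below is a plain NumPy illustration of those formulas for two vectors. It is not sklearn or libsvm code, and the function names k_linear, k_poly, k_rbf and k_sigmoid are hypothetical names chosen for this example.

```python
import numpy as np

def k_linear(u, v):
    """Linear kernel: u'*v."""
    return u @ v

def k_poly(u, v, gamma, coef0=0.0, degree=3):
    """Polynomial kernel: (gamma*u'*v + coef0)^degree."""
    return (gamma * (u @ v) + coef0) ** degree

def k_rbf(u, v, gamma):
    """RBF kernel: exp(-gamma*|u-v|^2)."""
    diff = u - v
    return np.exp(-gamma * (diff @ diff))

def k_sigmoid(u, v, gamma, coef0=0.0):
    """Sigmoid kernel: tanh(gamma*u'*v + coef0)."""
    return np.tanh(gamma * (u @ v) + coef0)

# Two illustrative samples with n_features = 2.
u = np.array([1.0, 2.0])
v = np.array([3.0, 4.0])

# The 'auto' setting described above: gamma = 1/n_features.
gamma = 1.0 / u.shape[0]

print(k_linear(u, v))
print(k_poly(u, v, gamma))
print(k_rbf(u, v, gamma))
print(k_sigmoid(u, v, gamma))
```

With these definitions, degree only affects k_poly and coef0 only affects k_poly and k_sigmoid, which matches the parameter descriptions above.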
