[Reposted] Statistical Analysis with Python (the scipy.stats documentation)

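A detailed introduction to scipy.stats:

This document covers the topics listed below. It is worth a look for anyone interested in doing statistical analysis with Python, since Python's libraries are somewhat scattered: material that looks like it belongs together is spread across scipy, pandas, sympy, and other libraries. What follows covers general-purpose statistical functionality, which lives in scipy. Topics such as time series are handled elsewhere, and those libraries in turn lack the features described here.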

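Drawing random samples

84 continuous distributions (only the count is given; they are not described individually)

12 discrete distributions

The probability density function, cumulative distribution function, survival function, percent point (quantile) function, and inverse survival function of a distribution

Distribution statistics: mean, variance, kurtosis, skewness, moments

Shifting and scaling distributions (linear transformation)

Fitting distributions to data

Building new distributions

Descriptive statistics

t-test, KS test, chi-square test, normality tests, tests for identical distribution

Kernel density estimation (estimating the probability density function from a sample)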

Statistics (scipy.stats)

Introduction

In this tutorial we discuss many, but certainly not all, features of scipy.stats. The intention here is to provide a user with a working knowledge of this package. We refer to the reference manual for further details.

Note: This documentation is a work in progress.

Random Variables

There are two general distribution classes that have been implemented for encapsulating continuous random variables and discrete random variables. Over 80 continuous random variables (RVs) and 10 discrete random variables have been implemented using these classes. Besides this, new routines and distributions can easily be added by the end user. (If you create one, please contribute it.)

All of the statistics functions are located in the sub-package scipy.stats, and a fairly complete listing of these functions can be obtained using info(stats). The list of the random variables available can also be obtained from the docstring for the stats sub-package.

In the discussion below we mostly focus on continuous RVs. Nearly everything also applies to discrete variables, but we point out some differences in the section Specific Points for Discrete Distributions.

Getting Help

First of all, all distributions are accompanied by help functions. To obtain just some basic information we can call:

>>> from scipy import stats
>>> from scipy.stats import norm
>>> print(norm.__doc__)

To find the support, i.e., the upper and lower bounds of the distribution, call:

>>> print('bounds of distribution lower: %s, upper: %s' % (norm.a, norm.b))
bounds of distribution lower: -inf, upper: inf
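As a complement to reading the a and b attributes, the sketch below compares the support of three distributions. It assumes a reasonably recent SciPy (1.4 or later) in which distributions expose a support() method; the dictionary name bounds is only illustrative.

```python
from scipy.stats import expon, norm, uniform

# Assumption: SciPy >= 1.4, where support() returns (lower, upper).
# Default parameters are used throughout (loc=0, scale=1).
bounds = {
    name: tuple(float(v) for v in dist.support())
    for name, dist in [("norm", norm), ("expon", expon), ("uniform", uniform)]
}

for name, (lower, upper) in bounds.items():
    print(f"{name}: lower={lower}, upper={upper}")
```

The normal distribution is unbounded on both sides, the exponential starts at zero, and the default uniform lives on the unit interval.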

We can list all methods and properties of the distribution with dir(norm). As it turns out, some of the methods are private although they are not named as such (their names do not start with a leading underscore). For example, veccdf is only available for internal calculation; such methods give warnings when one tries to use them and will be removed at some point.

To obtain the real main methods, we list the methods of the frozen distribution. (We explain the meaning of a frozen distribution below.)

>>> rv = norm()
>>> dir(rv)  # reformatted
    ['__class__', '__delattr__', '__dict__', '__doc__', '__getattribute__',
    '__hash__', '__init__', '__module__', '__new__', '__reduce__', '__reduce_ex__',
    '__repr__', '__setattr__', '__str__', '__weakref__', 'args', 'cdf', 'dist',
    'entropy', 'isf', 'kwds', 'moment', 'pdf', 'pmf', 'ppf', 'rvs', 'sf', 'stats']
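To see only the public names without scanning the dunder entries, one option is to filter the dir() output. This is a small sketch; the exact list it prints depends on the installed SciPy version, and the variable name public is just for illustration.

```python
from scipy.stats import norm

rv = norm()  # freeze the normal distribution with default loc=0, scale=1

# Keep only the names that do not start with an underscore.
public = [name for name in dir(rv) if not name.startswith("_")]
print(public)
```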

Finally, we can obtain the list of available distributions through introspection:

>>> import warnings
>>> warnings.simplefilter('ignore', DeprecationWarning)
>>> dist_continu = [d for d in dir(stats) if
...                 isinstance(getattr(stats, d), stats.rv_continuous)]
>>> dist_discrete = [d for d in dir(stats) if
...                  isinstance(getattr(stats, d), stats.rv_discrete)]
>>> print('number of continuous distributions:', len(dist_continu))
number of continuous distributions: 84
>>> print('number of discrete distributions:  ', len(dist_discrete))
number of discrete distributions:   12

Common Methods

The main public methods for continuous RVs are:

rvs: Random Variates

pdf: Probability Density Function

cdf: Cumulative Distribution Function

sf: Survival Function (1-CDF)

ppf: Percent Point Function (Inverse of CDF)

isf: Inverse Survival Function (Inverse of SF)

stats: Return mean, variance, (Fisher's) skew, or (Fisher's) kurtosis

moment: non-central moments of the distribution
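The relationships stated in parentheses in this list can be checked numerically. The sketch below does so for the standard normal distribution; the test points x and probabilities q are arbitrary choices made for illustration.

```python
import numpy as np
from scipy.stats import norm

x = np.array([-1.0, 0.0, 1.0])  # arbitrary evaluation points
q = np.array([0.1, 0.5, 0.9])   # arbitrary probabilities

# sf is the complement of cdf.
sf_matches = np.allclose(norm.sf(x), 1.0 - norm.cdf(x))
# ppf undoes cdf.
ppf_inverts_cdf = np.allclose(norm.ppf(norm.cdf(x)), x)
# isf undoes sf.
isf_inverts_sf = np.allclose(norm.sf(norm.isf(q)), q)

print(sf_matches, ppf_inverts_cdf, isf_inverts_sf)
```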

Let's take a normal RV as an example.

>>> norm.cdf(0)
0.5

To compute the cdf at a number of points, we can pass a list or a numpy array.

>>> norm.cdf([-1., 0, 1])
array([ 0.15865525,  0.5       ,  0.84134475])
>>> import numpy as np
>>> norm.cdf(np.array([-1., 0, 1]))
array([ 0.15865525,  0.5       ,  0.84134475])

Thus, the basic methods such as pdf, cdf, and so on are vectorized with np.vectorize.

Other generally useful methods are supported too:

>>> norm.mean(), norm.std(), norm.var()
(0.0, 1.0, 1.0)
>>> norm.stats(moments="mv")
(array(0.0), array(1.0))
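The moments string can request more than the mean and variance. As a brief sketch, the following asks for all four statistics ("mvsk") and one non-central moment of the standard normal; the variable names are illustrative.

```python
from scipy.stats import norm

# "mvsk" = mean, variance, skew, and (Fisher, i.e. excess) kurtosis.
mean, var, skew, kurt = (float(v) for v in norm.stats(moments="mvsk"))
print(mean, var, skew, kurt)

# Second non-central moment E[X**2]; for the standard normal it equals the variance.
second_moment = float(norm.moment(2))
print(second_moment)
```

Because the kurtosis follows Fisher's definition, the normal distribution reports 0 rather than 3.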

To find the median of a distribution we can use the percent point function ppf, which is the inverse of the cdf:

>>> norm.ppf(0.5)
0.0
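The same function gives any other quantile. The sketch below computes the quartiles of the standard normal and compares ppf(0.5) with the median() convenience method; the chosen probabilities are just an example.

```python
from scipy.stats import norm

# Quartiles: the points below which 25%, 50%, and 75% of the mass lies.
quartiles = norm.ppf([0.25, 0.5, 0.75])
print(quartiles)

# median() is a shortcut for ppf(0.5).
median = float(norm.median())
print(median)
```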

To generate a set of random variates:

>>> norm.rvs(size=5)
array([-0.35687759,  1.34347647, -0.11710531, -1.00725181, -0.51275702])
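The numbers above will differ on every run. If repeatable draws are wanted, one possibility is the random_state keyword of rvs, sketched here with an arbitrary seed value (12345 is not from the original text).

```python
from scipy.stats import norm

seed = 12345  # arbitrary seed, chosen only for illustration

# Passing the same seed twice yields the same sample.
first = norm.rvs(size=5, random_state=seed)
second = norm.rvs(size=5, random_state=seed)

print(first)
print((first == second).all())
```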

Don't think that norm.rvs(5) generates 5 variates:

>>> norm.rvs(5)
7.131624370075814

This brings us, in fact, to the topic of the next subsection.

Shifting and Scaling

All continuous distributions take loc and scale as keyword parameters to adjust the location and scale of the distribution, e.g., for the normal distribution the location is the mean and the scale is the standard deviation.

>>> norm.stats(loc=3, scale=4, moments="mv")
(array(3.0), array(16.0))

In general the standardized distribution for a random variable X is obtained through the transformation (X - loc) / scale. The default values are loc = 0 and scale = 1.
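The standardization rule can be verified directly: evaluating the cdf with loc and scale should agree with evaluating the standard cdf at (x - loc) / scale. A minimal sketch, with evaluation points chosen arbitrarily:

```python
import numpy as np
from scipy.stats import norm

loc, scale = 3.0, 4.0
x = np.array([-1.0, 3.0, 7.0])  # arbitrary points: loc - scale, loc, loc + scale

# cdf of the shifted and scaled distribution ...
shifted = norm.cdf(x, loc=loc, scale=scale)
# ... versus the standard cdf applied to the standardized points.
standardized = norm.cdf((x - loc) / scale)

print(shifted)
print(np.allclose(shifted, standardized))
```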

Smart use of loc and scale can help modify the standard distributions in many ways. To illustrate the scaling further, the cdf of an exponentially distributed RV with mean 1/λ is given by

F(x) = 1 − exp(−λx)

By applying the scaling rule above, it can be seen that by taking scale = 1./lambda we get the proper scale.

>>> from scipy.stats import expon
>>> expon.mean(scale=3.)
3.0
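To make the scaling rule concrete, the sketch below compares the closed-form cdf 1 − exp(−λx) with expon.cdf called with scale = 1/λ. The rate lam = 0.5 and the grid of points are hypothetical values picked for the demonstration.

```python
import numpy as np
from scipy.stats import expon

lam = 0.5                        # hypothetical rate parameter
x = np.linspace(0.0, 10.0, 6)    # arbitrary evaluation grid

closed_form = 1.0 - np.exp(-lam * x)
from_scipy = expon.cdf(x, scale=1.0 / lam)

print(np.allclose(closed_form, from_scipy))
mean = float(expon.mean(scale=1.0 / lam))  # the mean is 1/lam
print(mean)
```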

The uniform distribution is also interesting:

>>> from scipy.stats import uniform
>>> uniform.cdf([0, 1, 2, 3, 4, 5], loc=1, scale=4)
array([ 0.  ,  0.  ,  0.25,  0.5 ,  0.75,  1.  ])

Finally, recall from the previous paragraph that we are left with the problem of the meaning of norm.rvs(5). As it turns out, when calling a distribution like this, the first argument, i.e., the 5, gets passed to set the loc parameter. Let's see:

>>> np.mean(norm.rvs(5, size=500))
4.983550784784704

Thus, to explain the output of the example of the last section: norm.rvs(5) generates a single normally distributed random variate with mean loc=5.

I prefer to set the loc and scale parameters explicitly, by passing the values as keywords rather than as arguments. This is less of a hassle than it may seem. We clarify this below when we explain the topic of freezing an RV.
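A quick way to convince oneself that the positional 5 really is loc is to draw with the same seed both ways and compare. This is a sketch; the seed value 0 is an arbitrary choice not taken from the original text.

```python
from scipy.stats import norm

# Same seed, once with a positional argument and once with the keyword.
positional = norm.rvs(5, size=500, random_state=0)
keyword = norm.rvs(loc=5, size=500, random_state=0)

same = bool((positional == keyword).all())
sample_mean = float(positional.mean())

print(same)
print(sample_mean)  # close to 5, up to sampling noise
```

The keyword form states the intent on the page, which is the reason for preferring it.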
