NumPy Case Study: Random Walks
import numpy as np
The numpy.random module supplements the built-in Python random module with functions for efficiently generating whole arrays of sample values from many kinds of probability distributions. For example, you can get a 4x4 array of samples from the standard normal distribution using normal:
samples = np.random.normal(size=(4,4))
samples
array([[-0.49777854, 1.01894039, 0.3542692 , 1.0187122 ],
[-0.07139068, -0.44245259, -2.05535526, 0.49974435],
[ 0.80183078, -0.11299759, 1.22759314, 1.37571884],
[ 0.32086762, -0.79930024, -0.31965109, 0.23004107]])
Python's built-in random module, by contrast, only samples one value at a time. As you can see from this benchmark, numpy.random is well over an order of magnitude faster for generating very large samples:
from random import normalvariate
n = 100000
# Pure-Python run time:
%timeit samples = [normalvariate(0, 1) for _ in range(n)]
127 ms ± 7.72 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# np.random run time:
%timeit np.random.normal(size=n)
4.2 ms ± 277 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%%time
# I later came to prefer %%time for timing a whole cell
from random import normalvariate
n = 1000000
# Python run time:
samples = [normalvariate(0, 1) for _ in range(n)]
Wall time: 1.08 s
We say that these are pseudorandom numbers because they are generated by an algorithm with deterministic behavior based on the seed of the random number generator. You can change NumPy's random number generation seed using np.random.seed:
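# My question at first: does seed() stop the algorithm from changing, so that every draw is the same?
np.random.seed(1234)
# Answer: seed(n) sets the random seed to the integer n. It does not make every draw identical;
# it makes the sequence of subsequent draws reproducible: re-seeding with the same n and
# repeating the same calls yields the same numbers again.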
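A quick check of what seeding does (a minimal sketch; the seed value and sample size are arbitrary choices): re-seeding with the same value reproduces the same draws, while continuing without re-seeding produces new ones.

```python
import numpy as np

# Seed, draw, then re-seed with the same value and draw again.
np.random.seed(1234)
first = np.random.normal(size=3)

np.random.seed(1234)
second = np.random.normal(size=3)

# Without re-seeding, the generator continues its sequence and yields new values.
third = np.random.normal(size=3)

print(np.array_equal(first, second))  # True
print(np.array_equal(first, third))   # False
```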
The data generation functions in numpy.random use a global random seed. To avoid global state, you can use numpy.random.RandomState to create a random number generator isolated from others:
rng = np.random.RandomState(1234)
rng.randn(10)
array([ 0.47143516, -1.19097569, 1.43270697, -0.3126519 , -0.72058873,
0.88716294, 0.85958841, -0.6365235 , 0.01569637, -2.24268495])
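A minimal sketch of that isolation: two RandomState generators built from the same seed stay in lockstep even if the global state is reseeded and used in between. The last lines use np.random.default_rng, the Generator interface added in NumPy 1.17; the text above does not cover it, and it is shown only for comparison.

```python
import numpy as np

# Two generators created with the same seed produce identical streams...
rng_a = np.random.RandomState(1234)
rng_b = np.random.RandomState(1234)

# ...and touching the global state in between does not affect them.
np.random.seed(999)
np.random.randn(5)

a = rng_a.randn(10)
b = rng_b.randn(10)
print(np.array_equal(a, b))  # True
print(a[:3])                 # same leading values as rng.randn(10) above

# Newer Generator API (NumPy 1.17+), recommended for new code.
rng = np.random.default_rng(1234)
print(rng.standard_normal(3))
```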
See Table 4-8 for a partial list of functions available in numpy.random. I will give some examples of leveraging these functions' ability to generate large arrays of samples all at once in the next section.
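- seed Seed the random number generator, making subsequent draws reproducible
- permutation Return a random permutation of a sequence, or return a permuted range
- shuffle Randomly permute a sequence in place (shuffle it like a deck of cards)
- rand Draw samples from a uniform distribution on [0, 1), rand(d0, d1, ...); every value in the interval is equally likely
- uniform Draw samples from a uniform distribution on [low, high), uniform(low=0.0, high=1.0, size=None)
- randint Draw random integers from a given low-to-high range, randint(low, high=None, size=None); low is included but high is excluded
- randn Draw samples from the standard normal distribution \(N(\mu=0, \sigma=1)\), randn(d0, d1, ...)
- normal Draw samples from a normal distribution with mean \(\mu\) (loc) and standard deviation \(\sigma\) (scale), normal(loc=0.0, scale=1.0, size=None)
- binomial Draw samples from a binomial distribution, binomial(n, p, size=None)
- chisquare Draw samples from a chi-square distribution, chisquare(df, size=None)
- beta Draw samples from a beta distribution
- gamma Draw samples from a gamma distribution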
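One point in the list is easy to get wrong: np.random.randint excludes its upper bound, unlike Python's built-in random.randint, which includes both ends. A small sketch (sample sizes are arbitrary) that makes the difference visible:

```python
import random
import numpy as np

# np.random.randint(low, high) never returns high: randint(0, 2) yields only 0 or 1.
np_draws = np.random.randint(0, 2, size=10_000)
print(np.unique(np_draws))

# Python's random.randint(a, b) includes both ends: randint(0, 2) can also return 2.
py_draws = {random.randint(0, 2) for _ in range(10_000)}
print(sorted(py_draws))
```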
# permutation(x): for an integer x, return a random permutation of np.arange(x), i.e. of 0 .. x-1
np.random.permutation(6)
array([1, 2, 5, 0, 3, 4])
# shuffle(seq): randomly shuffle a sequence in place
arr1 = np.arange(6)
np.random.shuffle(arr1)
arr1
array([5, 2, 3, 0, 4, 1])
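The difference between permutation and shuffle matters when you want to keep the original order. A short sketch with an arbitrary array:

```python
import numpy as np

arr = np.arange(6)

# permutation(arr) returns a shuffled copy and leaves arr untouched.
perm = np.random.permutation(arr)
print(arr)            # [0 1 2 3 4 5]
print(np.sort(perm))  # [0 1 2 3 4 5]: same elements, new order

# shuffle(arr) reorders arr itself and returns None.
result = np.random.shuffle(arr)
print(result)         # None
print(np.sort(arr))   # [0 1 2 3 4 5]
```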
# rand(d0, d1, ...): uniform samples on [0, 1); the dimensions are passed as separate arguments
# How a shape nests: for (2, 2, 3) the outermost axis holds 2 blocks, each block holds 2 rows, each row holds 3 values
# so rand(2, 1, 1) gives 2 blocks, each holding 1 row of 1 value
np.random.rand(2, 1, 1)
array([[[0.06152103]],
       [[0.8725525 ]]])
# uniform(low, high, size): 5 samples from [3, 4)
np.random.uniform(3, 4, 5)
array([3.35219682, 3.62783057, 3.51758469, 3.37434633, 3.64026243])
# randn(d0, d1, ...): standard normal samples
np.random.randn(1, 2, 3)
array([[[0.49112636, 0.90638754, 0.05000051],
        [1.21431522, 0.67847748, 1.3797269 ]]])
# normal(loc=0, scale=1, size=None): here mean 80, standard deviation 20, 6 samples
np.random.normal(80, 20, 6)
array([55.07130243, 56.34397557, 68.95608996, 31.40875572, 89.80741058,
       37.38567435])
# binomial(n, p, size): n trials, success probability p per trial, here 5 samples
np.random.binomial(100, 0.4, 5)
array([47, 42, 41, 34, 44])
# chisquare(10, 2): 2 samples from a chi-square distribution with 10 degrees of freedom
np.random.chisquare(10, 2)
array([20.07301382, 14.54581473])
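As a rough sanity check on these parameterizations, large samples should have means close to the theoretical values (n·p for the binomial, df for the chi-square, loc for the normal). This sketch assumes an arbitrarily seeded RandomState and a sample size of 100,000:

```python
import numpy as np

rng = np.random.RandomState(0)  # isolated generator so the check is reproducible

binom = rng.binomial(100, 0.4, size=100_000)  # theoretical mean n*p = 40
chi2 = rng.chisquare(10, size=100_000)        # theoretical mean = df = 10
norm = rng.normal(80, 20, size=100_000)       # mean 80, standard deviation 20

print(binom.mean(), chi2.mean(), norm.mean(), norm.std())
```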
Example: Random Walks
The simulation of random walks provides an illustrative application of utilizing array operations. Let's first consider a simple random walk starting at 0 with steps of 1 and -1 occurring with equal probability.
Here is a pure Python way to implement a single random walk with 1000 steps using the built-in random module.
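import random
import matplotlib.pyplot as plt

def random_walk(position=0, steps=1000):
    """Simulate a simple random walk with steps of +1 and -1.

    position: starting position (default 0)
    steps: number of steps to take (default 1000)
    Returns a list of the positions after each step.
    """
    walk = []  # created fresh on each call; a mutable default argument would be shared between calls
    for _ in range(steps):
        step = 1 if random.randint(0, 1) else -1
        position += step
        # record the position after each step
        walk.append(position)
    return walk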
# test
walk_result = random_walk()
# Plot the first 100 values of this random walk (plt.plot draws a line chart by default)
plt.plot(walk_result[:100])
[<matplotlib.lines.Line2D at 0x19329c95208>]

You might make the observation that walk is simply the cumulative sum of the random steps and could be evaluated as an array expression. Thus, I use the np.random module to draw 1,000 coin flips at once, set these to 1 and -1, and compute the cumulative sum:
nsteps = 1000
# size= replaces the for loop: array-oriented programming
draws = np.random.randint(0, 2, size=nsteps)
# np.where acts as a vectorized ternary expression, replacing if-else
steps = np.where(draws > 0, 1, -1)
# cumsum: cumulative sum
walk = steps.cumsum()
plt.plot(walk)
[<matplotlib.lines.Line2D at 0x1932a032278>]
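To confirm that the array expression computes the same walk as the loop, a sketch can feed one set of draws to both versions. The seed 42 and the variable names are illustrative assumptions:

```python
import numpy as np

rng = np.random.RandomState(42)
draws = rng.randint(0, 2, size=1000)

# Loop version: apply each +1/-1 step and record the running position.
position = 0
loop_walk = []
for d in draws:
    position += 1 if d > 0 else -1
    loop_walk.append(position)

# Array version: map draws to +1/-1 with np.where, then take the cumulative sum.
vec_walk = np.where(draws > 0, 1, -1).cumsum()

print(np.array_equal(loop_walk, vec_walk))  # True
```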

From this we can begin to extract statistics like the minimum and maximum values along the walk's trajectory:
# extreme values of walk
walk.min()
-17
walk.max()
15
type(walk)
numpy.ndarray
# cj test cumsum()
cj_arr = np.array([[1,2,3],[4,5,6]])
cj_arr
array([[1, 2, 3],
       [4, 5, 6]])
# cumsum() returns an ndarray of running totals, so you can see each step of the accumulation
np.cumsum(cj_arr, axis=0)  # down the rows: running total within each column
array([[1, 2, 3],
       [5, 7, 9]], dtype=int32)
np.cumsum(cj_arr, axis=1)  # across the columns: running total within each row
array([[ 1,  3,  6],
       [ 4,  9, 15]], dtype=int32)
A more complicated statistic is the first crossing time, the step at which the random walk reaches a particular value. Here we might want to know how long it took the random walk to get at least 10 steps away from the origin 0 in either direction. np.abs(walk) >= 10 gives us a boolean array indicating whether the walk has reached or exceeded 10 at each step, but we want the index of the first 10 or -10. It turns out we can compute this using argmax, which returns the first index of the maximum value in the boolean array (True is the maximum value):
# index of the first step at which the walk is at least 10 away from the origin
(np.abs(walk) >= 10).argmax()
49
Note that using argmax here is not always efficient because it always makes a full scan of the array. In this special case, once a True is observed we know it to be the maximum value.
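Because argmax returns 0 when no element is True, a crossing-time helper should check for that case first. The helpers first_crossing and first_crossing_early are hypothetical names, and the walk is a tiny hand-written example. The generator-based variant does stop at the first hit, but it loops in Python, so it is only likely to pay off when the crossing comes early in a long array:

```python
import numpy as np

def first_crossing(walk, level):
    """Return the first index where |walk| >= level, or None if it never happens (hypothetical helper)."""
    hits = np.abs(walk) >= level
    if not hits.any():
        # argmax on an all-False array returns 0, which would look like a crossing at step 0
        return None
    return int(hits.argmax())

def first_crossing_early(walk, level):
    """Same result, but stops scanning at the first hit (hypothetical helper; Python-level loop)."""
    return next((i for i, x in enumerate(walk) if abs(x) >= level), None)

walk = np.array([1, 2, 1, 2, 3, 2, 3, 4])

print(first_crossing(walk, 3))        # 4
print(first_crossing(walk, 10))       # None
print(first_crossing_early(walk, 3))  # 4
print((np.abs(walk) >= 10).argmax())  # 0, even though 10 is never reached
```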
# cj test
np.max([1,3,2,5,7])
7
# np.maximum compares the array element-wise with the second argument and keeps the larger value of each pair
np.maximum([10,1,3,2,5,7], 6)
array([10,  6,  6,  6,  6,  7])
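# Note: this is not the first crossing time. Values of |walk| below 10 are raised to 10, so argmax
# returns the index of the first occurrence of the largest |walk| (or 0 if |walk| never exceeds 10)
np.maximum(np.abs(walk), 10).argmax()
530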
# argmax returns the index of the first maximum: here 10 sits at index 0
cj_arr1 = np.array([10,6,6,6,6,7]).argmax()
cj_arr1
0
Simulating Many Random Walks at Once
If your goal is to simulate many random walks, say 5,000 of them, you can generate all of the random walks with minor modifications to the preceding code. If passed a 2-tuple, the numpy.random functions will generate a two-dimensional array of draws, and we can compute the cumulative sum across the rows to compute all 5,000 random walks in one shot:
# cj test
np.random.randint(0,2, size=(3,4))
array([[0, 0, 0, 1],
[0, 1, 0, 1],
[0, 0, 1, 1]])
# Generate 5,000 walks of 1,000 steps each in one go: a 5000 x 1000 array
nwalks = 5000
nsteps = 1000
# each draw is 0 or 1
%time draws = np.random.randint(0, 2, size=(nwalks, nsteps))
Wall time: 58 ms
# 5,000,000 draws in about 58 ms (1 s = 1000 ms): remarkably fast
# map 1 -> +1 and 0 -> -1
steps = np.where(draws > 0, 1, -1)
# axis=1 runs along the columns (to the right), so each row is accumulated separately
walks = steps.cumsum(axis=1)
walks
array([[ -1, 0, -1, ..., 22, 21, 22],
[ -1, -2, -3, ..., 52, 53, 54],
[ 1, 0, -1, ..., -20, -19, -20],
...,
[ -1, -2, -3, ..., -48, -47, -48],
[ 1, 2, 1, ..., -8, -7, -8],
[ -1, -2, -1, ..., -10, -11, -10]], dtype=int32)
Now, we can compute the maximum and minimum values obtained over all of the walks:
# over the whole array
'max', walks.max()
('max', 112)
'min', walks.min()
('min', -109)
Out of these walks, let's compute the minimum crossing time to 30 or -30. This is slightly tricky because not all 5,000 of them reach 30. We can check this using the any method:
hist30 = (np.abs(walks) >= 30).any(axis=1)  # one boolean per row (walk)
hist30
array([ True,  True,  True, ...,  True,  True,  True])
# count how many walks (rows) hit 30 or -30
hist30.sum()
3441
We can use this boolean array to select out the rows of walks that actually cross the absolute 30 level and call argmax across axis=1 to get the crossing times:
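# walks[hist30] keeps only the walks that reach |30|; argmax(1) then gives, for each of those rows,
# the index of the first True, i.e. that walk's first crossing time
crossing_times = (np.abs(walks[hist30]) >= 30).argmax(1)
crossing_times.mean()
497.102877070619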
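The reason for filtering with hist30 first is the same all-False pitfall, now applied row by row. A small hand-made 3 x 5 array (illustrative data, with a level of 3 instead of 30) shows what argmax(axis=1) would report without the filter:

```python
import numpy as np

# Three tiny walks; only rows 0 and 2 ever reach |position| >= 3.
small_walks = np.array([
    [1, 2, 3, 2, 3],
    [1, 0, 1, 0, 1],
    [-1, -2, -3, -4, -3],
])

hits = (np.abs(small_walks) >= 3).any(axis=1)
print(hits)  # [ True False  True]

# Without filtering, the middle row's all-False mask is reported as a crossing at index 0.
naive = (np.abs(small_walks) >= 3).argmax(axis=1)
print(naive)  # [2 0 2]

# Keep only the walks that cross, then take the first True in each remaining row.
crossing_times = (np.abs(small_walks[hits]) >= 3).argmax(axis=1)
print(crossing_times)         # [2 2]
print(crossing_times.mean())  # 2.0
```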
Feel free to experiment with other distributions for the steps other than equal-sized coin flips. You need only use a different random number generation function, like normal to generate normally distributed steps with some mean and standard deviation:
import sys
steps = np.random.normal(loc=0, scale=0.25, size=(5000, 1000))
# check how much memory this object occupies
"{} array of {} occupies {} bytes".format(steps.shape, steps.dtype, sys.getsizeof(steps))
'(5000, 1000) array of float64 occupies 40000112 bytes'
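A sketch continuing the idea: the normally distributed steps can be accumulated exactly like the coin flips, and the crossing analysis carries over unchanged. It also contrasts ndarray.nbytes (the data buffer alone) with sys.getsizeof (which adds the array object's own overhead; the exact overhead is platform-dependent). The seed and the level of 10 are arbitrary choices:

```python
import sys
import numpy as np

rng = np.random.RandomState(12345)  # arbitrary seed for reproducibility
steps = rng.normal(loc=0, scale=0.25, size=(5000, 1000))

# Accumulate each row of normally distributed steps into a walk, as with the coin flips.
walks = steps.cumsum(axis=1)

# Same crossing analysis as before, here for an (arbitrary) level of 10.
hits10 = (np.abs(walks) >= 10).any(axis=1)
crossing_times = (np.abs(walks[hits10]) >= 10).argmax(axis=1)
print(hits10.sum(), crossing_times.mean())

# nbytes counts only the data buffer: 5000 * 1000 * 8 bytes for float64.
print(steps.nbytes)  # 40000000
# sys.getsizeof adds the ndarray object's own overhead on top of the buffer.
print(sys.getsizeof(steps) - steps.nbytes)
```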
Conclusion
While much of the rest of the book will focus on building data wrangling skills with pandas, we will continue to work in a similar array-based style. In Appendix A, we will dig deeper into NumPy features to help you further develop your array computing skills.