import numpy as np

The numpy.random module supplements the built-in Python random module with functions for efficiently generating whole arrays of sample values from many kinds of probability distributions. For example, you can get a 4x4 array of samples from the standard normal distribution using normal:

samples = np.random.normal(size=(4,4))
samples
array([[-0.49777854,  1.01894039,  0.3542692 ,  1.0187122 ],
       [-0.07139068, -0.44245259, -2.05535526,  0.49974435],
       [ 0.80183078, -0.11299759,  1.22759314,  1.37571884],
       [ 0.32086762, -0.79930024, -0.31965109,  0.23004107]])

Python's built-in random module, by contrast, samples only one value at a time. As you can see from this benchmark, numpy.random is well over an order of magnitude faster for generating very large samples:

from random import normalvariate

n = 100000

# Pure Python run time:
%timeit samples = [normalvariate(0,1) for _ in range(n)]
127 ms ± 7.72 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# np.random run time:
%timeit np.random.normal(size=n)
4.2 ms ± 277 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%%time
# I later came to prefer timing a whole cell with %%time

from random import normalvariate

n = 1000000

# Python run time:
samples = [normalvariate(0,1) for _ in range(n)]
Wall time: 1.08 s
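
The %timeit and %%time magics only work inside IPython or Jupyter. As a rough sketch, the same comparison can be run from a plain script with the standard-library timeit module. The helper names python_samples and numpy_samples are made up for this illustration, and the timings will differ from machine to machine.

```python
import timeit
from random import normalvariate

import numpy as np

n = 100_000  # same sample size as the first benchmark


def python_samples():
    # one call to normalvariate per value
    return [normalvariate(0, 1) for _ in range(n)]


def numpy_samples():
    # a single vectorized call fills the whole array
    return np.random.normal(size=n)


# best of 3 single runs, in seconds
py_time = min(timeit.repeat(python_samples, number=1, repeat=3))
np_time = min(timeit.repeat(numpy_samples, number=1, repeat=3))

print(f"pure Python: {py_time * 1e3:.1f} ms")
print(f"np.random:   {np_time * 1e3:.1f} ms")
```

Taking the minimum of several repeats is a common way to reduce noise from other processes; it is a convention, not the only reasonable choice.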

We say that these are pseudorandom numbers because they are generated by an algorithm with deterministic behavior. You can change NumPy's random number generation seed using np.random.seed:

# cj: at first I did not understand seed(). Does it stop the algorithm from changing, so the result is the same every time?
np.random.seed(1234)
# Now I understand: seed(n) sets the random seed. n is not a count of draws; it fixes the generator's starting state, so seeding again with the same n reproduces the same sequence of samples.
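
To make the effect of the seed concrete, here is a minimal sketch: reseeding with the same number replays the same samples, while a different seed gives a different stream. The variable names first, second, and third are only for illustration.

```python
import numpy as np

np.random.seed(1234)
first = np.random.normal(size=3)

np.random.seed(1234)               # reset to the same starting state
second = np.random.normal(size=3)

np.random.seed(4321)               # a different seed starts a different stream
third = np.random.normal(size=3)

print(np.array_equal(first, second))  # True
print(np.array_equal(first, third))   # False
```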

The data generation functions in numpy.random use a global random seed. To avoid global state, you can use numpy.random.RandomState to create a random number generator isolated from others:

rng = np.random.RandomState(1234)

rng.randn(10)
array([ 0.47143516, -1.19097569,  1.43270697, -0.3126519 , -0.72058873,
        0.88716294,  0.85958841, -0.6365235 ,  0.01569637, -2.24268495])
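
What "isolated" means can be checked with a small experiment. This is a sketch under the assumption that nothing else touches the global generator in between: drawing from a private RandomState should leave the global stream exactly where it was.

```python
import numpy as np

np.random.seed(0)
expected = np.random.rand(3)        # what the global generator yields after seed(0)

np.random.seed(0)
rng = np.random.RandomState(1234)   # private generator with its own state
_ = rng.randn(1000)                 # heavy use of rng ...
actual = np.random.rand(3)          # ... does not advance the global stream

print(np.array_equal(expected, actual))  # True

# two private generators created with the same seed agree with each other
a = np.random.RandomState(1234).randn(5)
b = np.random.RandomState(1234).randn(5)
print(np.array_equal(a, b))              # True
```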

See Table 4-8 for a partial list of functions available in numpy.random. I will give some examples of leveraging these functions' ability to generate large arrays of samples all at once in the next section.

  • seed Seed the random number generator (fixes its starting state so that results can be reproduced)
  • permutation Return a random permutation of a sequence, or return a permuted range
  • shuffle Randomly permute a sequence in place

  • rand Draw samples from a uniform distribution on [0, 1), rand(d0, d1, ...) (every value in the interval is equally likely)
  • uniform Draw samples from a uniform distribution on [low, high), uniform(low=0, high=1, size)
  • randint Draw random integers from a given low-to-high range, randint(low, high, size); low is included and high is excluded

  • randn Draw samples from the standard normal distribution \(N(\mu=0, \sigma=1)\), randn(d0, d1, ...)
  • normal Draw samples from a normal distribution \(N(\mu, \sigma)\), normal(loc=0, scale=1, size)
  • binomial Draw samples from a binomial distribution
  • chisquare Draw samples from a chi-square distribution
  • beta Draw samples from a beta distribution
  • gamma Draw samples from a gamma distribution

# permutation(x): a random permutation of the x integers 0 to x-1
np.random.permutation(6)
array([1, 2, 5, 0, 3, 4])
# shuffle(seq): randomly reorders a sequence in place
arr1 = np.arange(6)
np.random.shuffle(arr1)
arr1
array([5, 2, 3, 0, 4, 1])
# rand(d0, d1, ...): uniform samples on [0, 1); the arguments give the shape
# e.g. shape (2, 3, 1) means 2 blocks, each holding 3 rows of 1 value:
# [ [ [x],[x],[x] ], [ [x],[x],[x] ] ]
np.random.rand(2,1,1)
array([[[0.06152103]],

       [[0.8725525 ]]])
# uniform(low, high, size): 5 samples from [3, 4)
np.random.uniform(3,4,5)
array([3.35219682, 3.62783057, 3.51758469, 3.37434633, 3.64026243])
# randn(d0, d1, ...): standard normal distribution
np.random.randn(1,2,3)
array([[[0.49112636, 0.90638754, 0.05000051],
        [1.21431522, 0.67847748, 1.3797269 ]]])
# normal(loc=0, scale=1, size=None): here mean 80, standard deviation 20
np.random.normal(80, 20, 6)
array([55.07130243, 56.34397557, 68.95608996, 31.40875572, 89.80741058,
       37.38567435])
# binomial(n, p, size): n trials, success probability p, 5 samples
np.random.binomial(100, 0.4, 5)
array([47, 42, 41, 34, 44])
# chisquare(10, 2): 2 samples from a chi-square distribution with 10 degrees of freedom
np.random.chisquare(10, 2)
array([20.07301382, 14.54581473])
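
A handful of samples says little about whether the parameters were passed in the right order. One way to check, sketched here with a seeded RandomState and an arbitrary sample size of 100,000, is to compare sample statistics with the values the distribution should have. The same check also shows that randint never returns its upper bound.

```python
import numpy as np

rng = np.random.RandomState(0)   # private, seeded generator so the check is repeatable
size = 100_000

normal_draws = rng.normal(loc=80, scale=20, size=size)
binom_draws = rng.binomial(n=100, p=0.4, size=size)
chi2_draws = rng.chisquare(df=10, size=size)
int_draws = rng.randint(0, 2, size=size)

print(normal_draws.mean(), normal_draws.std())  # close to 80 and 20
print(binom_draws.mean())                       # close to n * p = 40
print(chi2_draws.mean())                        # close to df = 10
print(np.unique(int_draws))                     # [0 1]: high=2 is never drawn
```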

Example: Random Walks

The simulation of random walks provides an illustrative application of array operations. Let's first consider a simple random walk starting at 0 with steps of 1 and -1 occurring with equal probability.

Here is a pure Python way to implement a single random walk with 1000 steps using the built-in random module.

import random
import matplotlib.pyplot as plt

def random_walk(position=0, steps=1000, walk=None):
    """
    position: starting position
    steps: number of steps to take
    walk: list that collects the position after each step
    """
    if walk is None:
        # create a fresh list per call; a mutable default would be shared between calls
        walk = []
    for i in range(steps):
        state = 1 if random.randint(0,1) else -1
        position += state
        # store the position after every step
        walk.append(position)
    return walk

# test
walk_result = random_walk()
# plot the first 100 values of this random walk (a line plot by default)
plt.plot(walk_result[:100])
[<matplotlib.lines.Line2D at 0x19329c95208>]

You might make the observation that walk is simply the cumulative sum of the random steps and could be evaluated as an array expression. Thus, I use the np.random module to draw 1000 coin flips at once, set these to 1 and -1, and compute the cumulative sum:

nsteps = 1000

# size replaces the for loop: this is array-oriented programming
draws = np.random.randint(0, 2, size=nsteps)
# np.where works as a vectorized ternary expression, replacing if-else
steps = np.where(draws > 0, 1, -1)
# cumsum: cumulative sum
walk = steps.cumsum()
plt.plot(walk)
[<matplotlib.lines.Line2D at 0x1932a032278>]
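
The claim that the walk is just the cumulative sum of its steps can be verified directly. The sketch below feeds the same steps to an explicit loop and to cumsum and compares the results; the seed 42 and the names loop_walk and array_walk are arbitrary choices for this illustration.

```python
import numpy as np

rng = np.random.RandomState(42)
steps = np.where(rng.randint(0, 2, size=1000) > 0, 1, -1)

# explicit loop, as in the pure Python version
position = 0
loop_walk = []
for step in steps:
    position += step
    loop_walk.append(position)

# the same walk as a single array expression
array_walk = steps.cumsum()

print(np.array_equal(loop_walk, array_walk))  # True
```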

From this we can begin to extract statistics like the minimum and maximum value along the walk's trajectory:

# extreme values of the walk
walk.min()
-17
walk.max()
15
type(walk)
numpy.ndarray

# cj test cumsum()
cj_arr = np.array([[1,2,3],[4,5,6]])
cj_arr
array([[1, 2, 3],
       [4, 5, 6]])
# cumsum() returns an ndarray of running totals, so each stage of the accumulation is visible
np.cumsum(cj_arr, axis=0)  # accumulate downward: running sum of each column
array([[1, 2, 3],
       [5, 7, 9]], dtype=int32)
np.cumsum(cj_arr, axis=1)  # accumulate rightward: running sum of each row
array([[ 1,  3,  6],
       [ 4,  9, 15]], dtype=int32)

A more complicated statistic is the first crossing time, the step at which the random walk reaches a particular value. Here we might want to know how long it took the random walk to get at least 10 steps away from the origin 0 in either direction. np.abs(walk) >= 10 gives us a boolean array indicating where the walk has reached or exceeded 10, but we want the index of the first 10 or -10. It turns out that we can compute this using argmax, which returns the first index of the maximum value in the boolean array (True is the maximum value):

# index of the first position whose absolute distance from the origin is at least 10
(np.abs(walk) >= 10).argmax()
49

Note that using argmax here is not always efficient because it always makes a full scan of the array. In this special case, once a True is observed we know it to be the maximum value.
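
As a sketch of the early-exit idea, the hypothetical helper first_crossing below stops at the first hit. It also handles a case argmax cannot express: when the level is never reached, the boolean array is all False and argmax returns 0. The loop runs in interpreted Python, so whether it is actually faster than argmax depends on how early the crossing occurs.

```python
import numpy as np


def first_crossing(walk, level):
    """Return the first index where |walk| >= level, or None if it never happens."""
    for i, position in enumerate(walk):
        if abs(position) >= level:
            return i  # stop at the first hit instead of scanning everything
    return None


walk = np.array([1, 2, 1, 2, 3, 2, 3, 4, 3])

print(first_crossing(walk, 3))         # 4
print((np.abs(walk) >= 3).argmax())    # 4, same answer via argmax
print(first_crossing(walk, 10))        # None: the level is never reached
print((np.abs(walk) >= 10).argmax())   # 0, which is misleading here
```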

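# cj test

np.max([1,3,2,5,7])
7

# np.maximum returns an array: each element is compared with the second argument and the larger of the two is kept
np.maximum([10,1,3,2,5,7],6)
array([10,  6,  6,  6,  6,  7])

# Note: this does not give the first crossing time. Every value below 10 is raised to 10,
# so argmax returns the index where the walk is farthest from the origin.
np.maximum(np.abs(walk), 10).argmax()
530

cj_arr1 = np.array([10,6,6,6,6,7]).argmax()
cj_arr1
0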

Simulating Many Random Walks at Once

If your goal is to simulate many random walks, say 5000 of them, you can generate all of the random walks with minor modifications to the preceding code. If passed a 2-tuple, the numpy.random functions will generate a two-dimensional array of draws, and we can compute the cumulative sum across the rows to compute all 5000 random walks in one shot:

# cj test
np.random.randint(0,2, size=(3,4))
array([[0, 0, 0, 1],
       [0, 1, 0, 1],
       [0, 0, 1, 1]])
# generate 5000 walks at once, each 1000 steps long: a 5000 x 1000 dataset
nwalks = 5000
nsteps = 1000
# each draw is 0 or 1
%time draws = np.random.randint(0, 2, size=(nwalks, nsteps))
Wall time: 58 ms
# 5,000,000 draws in that little time is impressive (1 s = 10^3 ms)
# draws > 0 become 1, everything else becomes -1
steps = np.where(draws > 0, 1, -1)
# axis=1: accumulate along each row, left to right
walks = steps.cumsum(axis=1)
walks
array([[ -1,   0,  -1, ...,  22,  21,  22],
       [ -1,  -2,  -3, ...,  52,  53,  54],
       [  1,   0,  -1, ..., -20, -19, -20],
       ...,
       [ -1,  -2,  -3, ..., -48, -47, -48],
       [  1,   2,   1, ...,  -8,  -7,  -8],
       [ -1,  -2,  -1, ..., -10, -11, -10]], dtype=int32)

Now, we can compute the maximum and minimum values obtained over all of the walks:

# over the whole array
'maximum', walks.max()
('maximum', 112)
'minimum', walks.min()
('minimum', -109)

Out of these walks, let's compute the minimum crossing time to 30 or -30. This is slightly tricky because not all 5000 of them reach 30. We can check this using the any method:

hist30 = (np.abs(walks) >= 30).any(axis=1)  # one result per row (per walk)
hist30
array([ True,  True,  True, ...,  True,  True,  True])
# number of walks (rows) that hit 30 or -30
hist30.sum()
3441

We can use this boolean array to select out the rows of walks that actually cross the absolute 30 level and call argmax across axis=1 to get the crossing times:

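# walks[hist30] keeps only the walks that reach 30 or -30; argmax(1) then finds the first True in each row, which is that walk's crossing time
crossing_times = (np.abs(walks[hist30]) >= 30).argmax(1)
crossing_times.mean()
497.102877070619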
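
Why the filtering step matters can be seen by running the whole pipeline once with and without it. This is a self-contained sketch using a seeded RandomState (the seed 12345 and the names reached, hits, and naive are assumptions for this illustration): for a walk that never reaches the level, argmax returns 0, which would wrongly look like a crossing at the very first step.

```python
import numpy as np

rng = np.random.RandomState(12345)
nwalks, nsteps, level = 5000, 1000, 30

draws = rng.randint(0, 2, size=(nwalks, nsteps))
walks = np.where(draws > 0, 1, -1).cumsum(axis=1)

reached = np.abs(walks) >= level   # boolean array, one row per walk
hits = reached.any(axis=1)         # True for walks that reach the level

# argmax on an all-False row returns 0, so unfiltered results mix in false zeros
naive = reached.argmax(axis=1)
crossing_times = reached[hits].argmax(axis=1)

print(hits.sum(), "of", nwalks, "walks reach", level)
print("false zeros in the unfiltered result:", (naive[~hits] == 0).sum())
print("mean crossing time:", crossing_times.mean())
```

Since each step moves the walk by 1, no walk can reach 30 before its 30th step, so every valid crossing index is at least 29; a 0 in the result can only come from a walk that never crossed.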

Feel free to experiment with other distributions for the steps other than equal-sized coin flips. You need only use a different random number generation function, like normal, to generate normally distributed steps with some mean and standard deviation:

import sys

steps = np.random.normal(loc=0, scale=0.25, size=(5000, 1000))

# check how much memory this object takes
"{} array of {} takes {} bytes".format(steps.shape, steps.dtype, sys.getsizeof(steps))
'(5000, 1000) array of float64 takes 40000112 bytes'
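
The figure reported by sys.getsizeof is the data itself (5000 x 1000 float64 values at 8 bytes each, 40,000,000 bytes) plus a small fixed overhead for the array object. A minimal sketch, assuming a seeded RandomState for repeatability, shows the data size through the array's nbytes attribute and then turns the normal steps into walks with the same cumulative sum as before.

```python
import numpy as np

rng = np.random.RandomState(0)
steps = rng.normal(loc=0, scale=0.25, size=(5000, 1000))
walks = steps.cumsum(axis=1)        # one walk per row, as with the coin flips

# nbytes counts only the data buffer: number of elements * bytes per element
print(steps.nbytes)                 # 40000000
print(steps.size * steps.itemsize)  # 40000000
print(walks.shape, walks.dtype)
```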

Conclusion

While much of the rest of the book will focus on building data wrangling skills with pandas, we will continue to work in a similar array-based style. In Appendix A, we will dig deeper into NumPy features to help you further develop your array computing skills.
