NumPy: A Case Study (Random Walks)
import numpy as np
The numpy.random module supplements the built-in Python random module with functions for efficiently generating whole arrays of sample values from many kinds of probability distributions. For example, you can get a 4x4 array of samples from the standard normal distribution using normal:
samples = np.random.normal(size=(4,4))
samples
array([[-0.49777854, 1.01894039, 0.3542692 , 1.0187122 ],
[-0.07139068, -0.44245259, -2.05535526, 0.49974435],
[ 0.80183078, -0.11299759, 1.22759314, 1.37571884],
[ 0.32086762, -0.79930024, -0.31965109, 0.23004107]])
Python's built-in random module, by contrast, only samples one value at a time. As you can see from this benchmark, numpy.random is well over an order of magnitude faster for generating very large samples:
from random import normalvariate
n = 100000
'pure Python run time:'
%timeit samples = [normalvariate(0,1) for _ in range(n)]
127 ms ± 7.72 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
'np.random run time:'
%timeit np.random.normal(size=n)
4.2 ms ± 277 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%%time
# I later came to prefer timing with %%time
from random import normalvariate
n = 1000000
'Python run time:'
samples = [normalvariate(0,1) for _ in range(n)]
Wall time: 1.08 s
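The %timeit and %%time magics only exist inside IPython or Jupyter. As a rough sketch, the same comparison can be run from a plain Python script with the standard-library timeit module. The helper names python_samples and numpy_samples are illustrative and not part of the original notes, and absolute timings will differ from machine to machine.

```python
import timeit
from random import normalvariate

import numpy as np

n = 100_000

def python_samples():
    # one draw at a time with the built-in random module
    return [normalvariate(0, 1) for _ in range(n)]

def numpy_samples():
    # all n draws in a single vectorized call
    return np.random.normal(size=n)

# run each function a few times and keep the best (least noisy) timing
t_python = min(timeit.repeat(python_samples, number=1, repeat=3))
t_numpy = min(timeit.repeat(numpy_samples, number=1, repeat=3))

print(f"pure Python: {t_python * 1e3:.1f} ms")
print(f"np.random:   {t_numpy * 1e3:.1f} ms")
```

Taking the minimum over several repetitions is a common way to reduce the influence of background load on a benchmark.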
We say that these are pseudorandom numbers because they are generated by an algorithm with deterministic behavior. You can change NumPy's random number generation seed using np.random.seed:
"Author's first question: what does seed() do? Does it keep the algorithm from changing, so the result is the same every time?"
np.random.seed(1234)
"Answer: seed(n) sets the random seed. n is not a number of draws; it selects the generator's starting state, so re-seeding with the same n reproduces the same sequence of samples."
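A small sketch may make the role of the seed more concrete. It assumes nothing beyond np.random.seed and np.random.normal. The variable names first, second and third are illustrative.

```python
import numpy as np

np.random.seed(1234)
first = np.random.normal(size=3)

np.random.seed(1234)   # same seed: the generator restarts from the same state
second = np.random.normal(size=3)

np.random.seed(4321)   # different seed: a different starting state
third = np.random.normal(size=3)

print(np.array_equal(first, second))  # True
print(np.array_equal(first, third))
```

The seed value is only a label for a starting state. Two runs that use the same seed and make the same calls in the same order produce identical samples.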
The data generation functions in numpy.random use a global random seed. To avoid global state, you can use numpy.random.RandomState to create a random number generator isolated from others:
rng = np.random.RandomState(1234)
rng.randn(10)
array([ 0.47143516, -1.19097569, 1.43270697, -0.3126519 , -0.72058873,
0.88716294, 0.85958841, -0.6365235 , 0.01569637, -2.24268495])
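To illustrate what "isolated" means here, the following sketch creates two RandomState objects with the same seed and shows that drawing from the global generator in between does not disturb them. The names rng_a, rng_b and gen are illustrative. The last lines use np.random.default_rng, which is not mentioned in the original text; it is the Generator interface available in NumPy 1.17 and later, shown only as a pointer.

```python
import numpy as np

rng_a = np.random.RandomState(1234)
rng_b = np.random.RandomState(1234)

a = rng_a.randn(5)

np.random.seed(0)       # reseed the global generator ...
np.random.randn(100)    # ... and draw from it

b = rng_b.randn(5)      # rng_b is unaffected by the global generator

print(np.array_equal(a, b))  # True: same seed, independent state

# Assumption: NumPy >= 1.17, where the Generator interface is available
gen = np.random.default_rng(1234)
print(gen.standard_normal(5))
```

Passing a generator object around explicitly keeps functions reproducible without relying on hidden global state.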
See Table 4-8 for a partial list of functions available in numpy.random. I will give some examples of leveraging these functions' ability to generate large arrays of samples all at once in the next section.
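- seed Seed the random number generator (author's note: still unclear to me at this point)
- permutation Return a random permutation of a sequence, or return a permuted range
- shuffle Randomly permute a sequence in-place (a random shuffle)
- rand Draw samples from a uniform distribution over [0, 1), rand(d0, d1, ...) (uniform: every value in the interval is equally likely)
- uniform Draw samples from a uniform distribution over [low, high), uniform(low=0, high=1, size)
- randint Draw random integers from a given low-to-high range. The range is [low, high): low can be drawn, high cannot (unlike Python's built-in random.randint, where both ends can be drawn)
- randn Draw samples from the standard normal distribution \(N(\mu=0, \sigma=1)\), randn(d0, d1, ...)
- normal Draw samples from a normal distribution \(N(\mu, \sigma)\), normal(loc=0, scale=1, size)
- binomial Draw samples from a binomial distribution
- chisquare Draw samples from a chi-square distribution
- beta Draw samples from a beta distribution
- gamma Draw samples from a gamma distribution
"permutation(x): a random permutation of the x integers 0 to x-1"
np.random.permutation(6)
"shuffle(seq): randomly shuffles a sequence, in-place"
arr1 = np.arange(6)
np.random.shuffle(arr1)
arr1
array([1, 2, 5, 0, 3, 4])
array([5, 2, 3, 0, 4, 1])
"rand(d0, d1, ...): uniform distribution over [0, 1)"
"shape (2, 1, 1): 2 blocks, each holding 1 row of 1 value"
"[ [[x]], [[x]] ]"
np.random.rand(2,1,1)
"uniform(low, high, size): 5 samples from [3, 4)"
np.random.uniform(3,4,5)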
array([[[0.06152103]],
[[0.8725525 ]]])
array([3.35219682, 3.62783057, 3.51758469, 3.37434633, 3.64026243])
"randn(d0, d1, ...): standard normal distribution"
np.random.randn(1,2,3)
"normal(loc=0, scale=1, size=None): here mean 80, standard deviation 20, 6 samples"
np.random.normal(80, 20, 6)
array([[[0.49112636, 0.90638754, 0.05000051],
[1.21431522, 0.67847748, 1.3797269 ]]])
array([55.07130243, 56.34397557, 68.95608996, 31.40875572, 89.80741058,
37.38567435])
"binomial(n, p, size): n trials, success probability p, 5 samples"
np.random.binomial(100, 0.4, 5)
"chisquare(10, 2): 2 samples from a chi-square distribution with 10 degrees of freedom"
np.random.chisquare(10, 2)
array([47, 42, 41, 34, 44])
array([20.07301382, 14.54581473])
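Two details of the functions listed above are easy to get wrong: which end points randint can return, and what the first two arguments of normal mean. The following sketch checks both empirically. The variable names and sample sizes are illustrative choices, not part of the original notes.

```python
import random

import numpy as np

rng = np.random.RandomState(0)

# np.random.randint(low, high): low is included, high is excluded
np_draws = rng.randint(0, 2, size=10_000)
print(np.unique(np_draws))       # only the values 0 and 1 occur

# built-in random.randint(a, b): both a and b are included
py_draws = {random.randint(0, 2) for _ in range(10_000)}
print(sorted(py_draws))          # the values 0, 1 and 2 occur

# normal(loc, scale, size): the sample mean and standard deviation
# of a large sample should be close to loc and scale
x = rng.normal(80, 20, size=100_000)
print(x.mean(), x.std())
```

This difference in end-point handling is why np.random.randint(0, 2) models a fair coin flip, while random.randint(0, 1) is the equivalent call in the built-in module.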
Example: Random Walks
The simulation of random walks provides an illustrative application of array operations. Let's first consider a simple random walk starting at 0 with steps of 1 and -1 occurring with equal probability.
Here is a pure Python way to implement a single random walk with 1000 steps using the built-in random module.
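import random
import matplotlib.pyplot as plt
def random_walk(position=0, steps=1000):
    """
    Simulate a simple random walk.
    position: starting position (default 0)
    steps: number of steps to take (default 1000)
    Returns a list with the position after each step.
    """
    # create the result list on every call; a mutable default argument
    # (walk=[]) would be shared between calls and keep growing
    walk = []
    for i in range(steps):
        state = 1 if random.randint(0,1) else -1
        position += state
        # store the position after each step in the result list
        walk.append(position)
    return walk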
# test
walk_result = random_walk()
"plot the first 100 values of one of these random walks:"
"line plot by default"
plt.plot(walk_result[:100])
[<matplotlib.lines.Line2D at 0x19329c95208>]

You might make the observation that walk is simply the cumulative sum of the random steps and could be evaluated as an array expression. Thus, I use the np.random module to draw 1000 coin flips at once, set these to 1 and -1, and compute the cumulative sum:
nsteps = 1000
"use size instead of a for loop: array-oriented programming"
draws = np.random.randint(0, 2, size=nsteps)
"np.where works as a vectorized ternary, replacing if-else"
steps = np.where(draws > 0, 1, -1)
"cumsum: cumulative sum"
walk = steps.cumsum()
plt.plot(walk)
[<matplotlib.lines.Line2D at 0x1932a032278>]
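To check that the array expression really computes the same thing as the step-by-step loop, both versions can be fed the same coin flips. This is a minimal sketch. The seed 42 and the names walk_loop and walk_vec are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.RandomState(42)
nsteps = 1000
draws = rng.randint(0, 2, size=nsteps)   # 0 or 1, like coin flips

# loop version: accumulate the position one step at a time
position = 0
walk_loop = []
for d in draws:
    position += 1 if d else -1
    walk_loop.append(position)

# array version: map the flips to +1/-1, then take the cumulative sum
steps = np.where(draws > 0, 1, -1)
walk_vec = steps.cumsum()

print(np.array_equal(walk_loop, walk_vec))  # True
```

Element k of the cumulative sum is the sum of the first k+1 steps, which is exactly the position that the loop stores after step k.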

From this we can begin to extract statistics like the minimum and maximum values along the walk's trajectory:
"extremes of the walk"
walk.min()
walk.max()
type(walk)
-17
15
numpy.ndarray
# cj test cumsum()
cj_arr = np.array([[1,2,3],[4,5,6]])
cj_arr
"cumsum(): cumulative sum; it returns an ndarray, so every intermediate running total is visible"
np.cumsum(cj_arr, axis=0) # down the rows: running sum of each column
np.cumsum(cj_arr, axis=1) # across the columns: running sum of each row
array([[1, 2, 3],
[4, 5, 6]])
array([[1, 2, 3],
[5, 7, 9]], dtype=int32)
array([[ 1, 3, 6],
[ 4, 9, 15]], dtype=int32)
A more complicated statistic is the first crossing time, the step at which the random walk reaches a particular value. Here we might want to know how long it took the random walk to get at least 10 steps away from the origin 0 in either direction. np.abs(walk) >= 10 gives us a boolean array indicating where the walk has reached or exceeded 10, but we want the index of the first 10 or -10. It turns out that we can compute this using argmax, which returns the first index of the maximum value in the boolean array (True is the maximum value):
"index of the first position whose absolute distance from the origin is at least 10"
(np.abs(walk) >= 10).argmax()
49
Note that using argmax here is not always efficient because it always makes a full scan of the array. In this special case, once a True is observed we know it to be the maximum value.
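The idea of stopping at the first True can be sketched as a plain loop and compared against the argmax result. The helper first_crossing is hypothetical and not part of NumPy or the original text. Whether it is actually faster than argmax depends on how early the crossing occurs, since a Python-level loop carries its own overhead.

```python
import numpy as np

def first_crossing(walk, level):
    """Return the first index where |walk| >= level, or -1 if there is none."""
    for i, pos in enumerate(walk):
        if abs(pos) >= level:
            return i      # stop scanning at the first hit
    return -1

rng = np.random.RandomState(7)
walk = np.where(rng.randint(0, 2, size=1000) > 0, 1, -1).cumsum()

mask = np.abs(walk) >= 10
# argmax on a boolean array gives the index of the first True,
# but it also returns 0 when there is no True at all, so check any() first
fast = int(mask.argmax()) if mask.any() else -1
slow = first_crossing(walk, 10)
print(fast, slow)
```

Unlike a bare argmax call, the explicit check with any() distinguishes "crossed at step 0" from "never crossed".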
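# cj test
np.max([1,3,2,5,7])
"maximum returns an array: each element of the input is compared with the second argument, and the larger of the two is kept"
np.maximum([10,1,3,2,5,7],6)
7
array([10, 6, 6, 6, 6, 7])
"to be completed... (note: this returns the first index at which the walk is farthest from the origin, not the first crossing of 10)"
np.maximum(np.abs(walk), 10).argmax()
530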
# cj_arr1 = np.array([10,6,6,6,6,7]).argmax()
# cj_arr1
0
Simulating Many Random Walks at Once
If your goal is to simulate many random walks, say 5000 of them, you can generate all of the random walks with minor modifications to the preceding code. If passed a 2-tuple, the numpy.random functions will generate a two-dimensional array of draws, and we can compute the cumulative sum across the rows to compute all 5000 random walks in one shot:
# cj test
np.random.randint(0,2, size=(3,4))
array([[0, 0, 0, 1],
[0, 1, 0, 1],
[0, 0, 1, 1]])
"generate 5000 walks at once, 1000 steps each: a 5000 x 1000 data set"
nwalks = 5000
nsteps = 1000
# 0 or 1
%time draws = np.random.randint(0, 2, size=(nwalks, nsteps))
"5,000,000 draws in so little time, impressive (1 s = 10^3 ms)"
"draw > 0 -> 1, otherwise -> -1"
steps = np.where(draws > 0, 1, -1)
"axis=1: along the columns, left to right, accumulating within each row"
walks = steps.cumsum(axis=1)
walks
Wall time: 58 ms
array([[ -1, 0, -1, ..., 22, 21, 22],
[ -1, -2, -3, ..., 52, 53, 54],
[ 1, 0, -1, ..., -20, -19, -20],
...,
[ -1, -2, -3, ..., -48, -47, -48],
[ 1, 2, 1, ..., -8, -7, -8],
[ -1, -2, -1, ..., -10, -11, -10]], dtype=int32)
Now, we can compute the maximum and minimum values obtained over all of the walks:
"over the whole array"
'max', walks.max()
'min', walks.min()
('max', 112)
('min', -109)
Out of these walks, let's compute the minimum crossing time to 30 or -30. This is slightly tricky because not all 5000 of them reach 30. We can check this using the any method:
hist30 = (np.abs(walks) >= 30).any(axis=1) # one result per row (walk)
hist30
'number of walks (rows) that hit 30 or -30'
hist30.sum()
array([ True, True, True, ..., True, True, True])
3441
We can use this boolean array to select out the rows of walks that actually cross the absolute 30 level and call argmax across axis=1 to get the crossing times:
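'cj: still to be understood (walks[hist30] keeps only the walks that reach 30; argmax(1) then gives the first crossing index of each)'
crossing_times = (np.abs(walks[hist30]) >= 30).argmax(1)
crossing_times.mean()
497.102877070619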
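The reason for filtering the rows first is easier to see on a tiny example. The sketch below uses three hand-made walks and a level of 3 instead of 30. The array contents and the names crossed and hits are made up for illustration.

```python
import numpy as np

# three tiny hand-made "walks"; only the first and the third reach |3|
walks = np.array([
    [1, 2, 3, 2, 1],
    [1, 0, 1, 0, 1],
    [-1, -2, -1, -2, -3],
])

crossed = np.abs(walks) >= 3
hits = crossed.any(axis=1)          # which walks reach the level at all
print(hits)                         # walks 0 and 2 do, walk 1 does not

# without filtering, the walk that never crosses reports a misleading 0
print(crossed.argmax(axis=1))       # indices 2, 0, 4

# with filtering, only genuine crossing times remain
crossing_times = crossed[hits].argmax(axis=1)
print(crossing_times)               # indices 2, 4
print(crossing_times.mean())        # 3.0
```

For a row that contains no True, argmax returns 0 because every element ties for the maximum. Leaving such rows in would pull the mean crossing time down.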
Feel free to experiment with other distributions for the steps other than equal-sized coin flips. You need only use a different random number generation function, like normal to generate normally distributed steps with some mean and standard deviation:
import sys
steps = np.random.normal(loc=0, scale=0.25, size=(5000, 1000))
"check how much memory this object takes"
"{} array of {} takes {} bytes".format(steps.shape, steps.dtype, sys.getsizeof(steps))
'(5000, 1000) array of float64 takes 40000112 bytes'
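As a follow-up sketch, the normally distributed steps can be accumulated into walks in the same way as the coin flips, and the memory figure can be broken down with the nbytes attribute. The seed and the sqrt-based comparison at the end are illustrative additions, not part of the original notes.

```python
import sys

import numpy as np

rng = np.random.RandomState(0)
steps = rng.normal(loc=0, scale=0.25, size=(5000, 1000))
walks = steps.cumsum(axis=1)

# nbytes counts only the data buffer: 5000 * 1000 values * 8 bytes each
print(steps.nbytes)                          # 40000000
# sys.getsizeof additionally includes the array object's own header
print(sys.getsizeof(steps) - steps.nbytes)

# for independent steps, the spread of the final positions should be
# close to scale * sqrt(number of steps)
print(walks[:, -1].std(), 0.25 * np.sqrt(1000))
```

The small gap between sys.getsizeof and nbytes is the fixed overhead of the ndarray object itself, which is why the figure above is slightly more than 40,000,000 bytes.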
Conclusion
While much of the rest of the book will focus on building data wrangling skills with pandas, we will continue to work in a similar array-based style. In Appendix A, we will dig deeper into NumPy features to help you further develop your array computing skills.