An Experience Replay Framework for Distributed Reinforcement Learning: Reverb (A Framework for Experience Replay)

Paper title:

Reverb: A Framework for Experience Replay

Paper URL:

https://arxiv.org/pdf/2102.04736.pdf

Framework source code:

https://github.com/deepmind/reverb

Related post:

An Experience Replay Framework for Distributed Reinforcement Learning (Usage Demo): Reverb: A Framework for Experience Replay

Installing with pip (this will most likely fail; if it does, see the detailed installation guide referenced at the bottom of this post):

pip install dm-reverb

Notes:

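Reverb was designed for TensorFlow, so its inputs and outputs are TensorFlow tensors. To use this distributed replay buffer from another deep learning framework, you have to convert the data manually, going through numpy.array: a PyTorch tensor, for example, must first be converted to a numpy.array and then to a tensorflow.tensor.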

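Both Reverb and TensorFlow were developed for internal use at Google, so there are few usage examples and little tutorial code to refer to, which is one reason Google's computing frameworks are hard for outsiders to adopt. Reverb in particular has no mature tutorial code, which makes it difficult to use.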

------------------------------------------------------------------

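I came across this experience replay framework by chance. It can be regarded as a publicly released, industry-grade experience replay framework for distributed settings. Little work has been done in this area, probably because it leans toward engineering rather than research; even in industry, few people work on it. Still, this kind of work is necessary, since algorithms ultimately have to serve industry.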

-------------------------------------------------------------------------

Here is one of Reverb's rate limiters:

reverb.rate_limiters.SampleToInsertRatio

Help documentation:

SampleToInsertRatio(samples_per_insert: float, min_size_to_sample: int, error_buffer: Union[float, Tuple[float, float]])
|
| Maintains a specified ratio between samples and inserts.
|
| The limiter works in two stages:
|
| Stage 1. Size of table is lt `min_size_to_sample`.
| Stage 2. Size of table is ge `min_size_to_sample`.
|
| During stage 1 the limiter works exactly like MinSize, i.e. it allows
| all insert calls and blocks all sample calls. Note that it is possible to
| transition into stage 1 from stage 2 when items are removed from the table.
|
| During stage 2 the limiter attempts to maintain the ratio
| `samples_per_inserts` between the samples and inserts. This is done by
| measuring the "error" in this ratio, calculated as:
|
| number_of_inserts * samples_per_insert - number_of_samples
|
| If `error_buffer` is a number and this quantity is larger than
| `min_size_to_sample * samples_per_insert + error_buffer` then insert calls
| will be blocked; sampling will be blocked for error less than
| `min_size_to_sample * samples_per_insert - error_buffer`.
|
| If `error_buffer` is a tuple of two numbers then insert calls will block if
| the error is larger than error_buffer[1], and sampling will block if the error
| is less than error_buffer[0].
|
| `error_buffer` exists to avoid unnecessary blocking for a system that is
| more or less in equilibrium.

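By setting samples_per_insert and error_buffer, this rate limiter balances sample and insert operations. The main idea: if too little has been sampled, insert is blocked; if too little has been inserted, sample is blocked.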

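The balance is judged from the value of number_of_inserts * samples_per_insert - number_of_samples. When error_buffer is a single number: if this value is greater than min_size_to_sample * samples_per_insert + error_buffer, too much has been inserted, so insert is blocked while sample proceeds normally; if it is less than min_size_to_sample * samples_per_insert - error_buffer, too much has been sampled, so sample is blocked while insert proceeds normally.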
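The rule described in the help text can be sketched in plain Python. This is only an illustrative model of the documented behavior, not Reverb's actual implementation: the helper names `error_bounds` and `limiter_state` are hypothetical, and the table size is passed in explicitly instead of being tracked by a real table.

```python
from typing import Tuple, Union


def error_bounds(samples_per_insert: float,
                 min_size_to_sample: int,
                 error_buffer: Union[float, Tuple[float, float]]
                 ) -> Tuple[float, float]:
    """Return the (lower, upper) limits on the error (hypothetical helper)."""
    if isinstance(error_buffer, (int, float)):
        # A single number gives a symmetric band around
        # min_size_to_sample * samples_per_insert.
        center = min_size_to_sample * samples_per_insert
        return center - error_buffer, center + error_buffer
    # A tuple is taken directly as (lower, upper).
    lower, upper = error_buffer
    return lower, upper


def limiter_state(num_inserts: int,
                  num_samples: int,
                  table_size: int,
                  samples_per_insert: float,
                  min_size_to_sample: int,
                  error_buffer: Union[float, Tuple[float, float]]
                  ) -> Tuple[bool, bool]:
    """Return (can_insert, can_sample) for the given counters (hypothetical helper)."""
    # Stage 1: the table is too small, so every insert is allowed
    # and every sample is blocked.
    if table_size < min_size_to_sample:
        return True, False

    # Stage 2: compare the error against the two limits.
    lower, upper = error_bounds(samples_per_insert, min_size_to_sample, error_buffer)
    error = num_inserts * samples_per_insert - num_samples
    can_insert = error <= upper  # blocked when too much has been inserted
    can_sample = error >= lower  # blocked when too much has been sampled
    return can_insert, can_sample


# With samples_per_insert=2.0, min_size_to_sample=10 and error_buffer=4.0
# the allowed band for the error is [16.0, 24.0].
print(limiter_state(5, 0, 5, 2.0, 10, 4.0))    # (True, False): stage 1
print(limiter_state(10, 0, 10, 2.0, 10, 4.0))  # (True, True): error 20.0 is in the band
print(limiter_state(13, 0, 13, 2.0, 10, 4.0))  # (False, True): error 26.0 > 24.0
print(limiter_state(10, 6, 10, 2.0, 10, 4.0))  # (True, False): error 14.0 < 16.0
```

A wider error_buffer widens the band, so neither side blocks while the system stays close to the target ratio.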

========================================================

How to install the framework (on Ubuntu):

Installing Reverb, the distributed experience replay framework for reinforcement learning

=====================================================