Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor

Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor

2019-07-15 22:23:02

Paper: https://arxiv.org/pdf/1801.01290.pdf or Updated Version: https://arxiv.org/pdf/1812.05905.pdf

Project: https://sites.google.com/view/soft-actor-critic or https://sites.google.com/view/sac-and-applications/

TensorFlow: https://github.com/haarnoja/sac

PyTorch: https://github.com/vitchyr/rlkit

Demo video: https://www.youtube.com/channel/UCxXt8Br3-wyluz9Q08-fsaA

Good Related Blog: https://zhuanlan.zhihu.com/p/70360272

==== Video Related Tutorials (A2C, A3C):

A brief review of Actor-Critic Algorithms: 　　https://www.youtube.com/watch?v=aODdNpihRwM

CS885 Lecture 7b: Actor Critic: 　　　　　　 https://www.youtube.com/watch?v=5Ke-d1Itk3k

DRL Lecture 6: Actor-Critic: 　　　　　　　 https://www.youtube.com/watch?v=j82QLgfhFiY&t=27s

Build an A2C agent that learns to play Sonic with Tensorflow (tutorial): 　　https://www.youtube.com/watch?v=GCfUdkCL7FQ

Reinforcement Learning 6: Policy Gradients and Actor Critics (Deep Mind): 　　 https://www.youtube.com/watch?v=bRfUxQs6xIM&t=27s

Actor Critic (A3C) Tutorial: 　　　　　　　　https://www.youtube.com/watch?v=O5BlozCJBSE

Actor Critic Algorithms: 　　　　　　　　　 https://www.youtube.com/watch?v=w_3mmm0P0j8&t=2s

Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor的更多相关文章

18 Issues in Current Deep Reinforcement Learning from ZhiHu
深度强化学习的18个关键问题 from: https://zhuanlan.zhihu.com/p/32153603 85 人赞了该文章深度强化学习的问题在哪里?未来怎么走?哪些方面可以突破? 这两 ...
(zhuan) Deep Reinforcement Learning Papers
Deep Reinforcement Learning Papers A list of recent papers regarding deep reinforcement learning. Th ...
(转) Deep Reinforcement Learning: Pong from Pixels
Andrej Karpathy blog About Hacker's guide to Neural Networks Deep Reinforcement Learning: Pong from ...
论文笔记之：Asynchronous Methods for Deep Reinforcement Learning
Asynchronous Methods for Deep Reinforcement Learning ICML 2016 深度强化学习最近被人发现貌似不太稳定,有人提出很多改善的方法,这些方法有很 ...
深度强化学习：入门(Deep Reinforcement Learning: Scratching the surface)
RL的方案两个主要对象:Agent和Environment Agent观察Environment,做出Action,这个Action会对Environment造成一定影响和改变,继而Agent会从新 ...
深度强化学习（Deep Reinforcement Learning）入门：RL base & DQN-DDPG-A3C introduction
转自https://zhuanlan.zhihu.com/p/25239682 过去的一段时间在深度强化学习领域投入了不少精力,工作中也在应用DRL解决业务问题.子曰:温故而知新,在进一步深入研究和应 ...
(转) Deep Reinforcement Learning: Playing a Racing Game
Byte Tank Posts Archive Deep Reinforcement Learning: Playing a Racing Game OCT 6TH, 2016 Agent playi ...
Deep Reinforcement Learning with Iterative Shift for Visual Tracking
Deep Reinforcement Learning with Iterative Shift for Visual Tracking 2019-07-30 14:55:31 Paper: http ...
论文笔记之：Dueling Network Architectures for Deep Reinforcement Learning
Dueling Network Architectures for Deep Reinforcement Learning ICML 2016 Best Paper 摘要:本文的贡献点主要是在 DQN ...

随机推荐

Mac版StarUML破解方法
StarUML是用nodejs写的.确切的说是用Electron前端框架写的.新版本中所有的starUML源代码是通过asar工具打包而成.确切的代码位置在“%LOCALAPPDATA%\Progra ...
图记 2016.1.7 获取本地图片、Bitmap转image
这几天完成的内容有: 1.“添加图片”按钮 2.添加图片功能遇到的问题: 我想要将添加图片按钮放在右下角,所以采用了相对布局,但是问题随之二来,因为将导航栏设置成了半透明,所以图片放到右下角之后,半 ...
GTID主从与传统主从复制
目录 1.主从复制 2.靠什么同步 3.pos与GTID的什么区别 4.GTID的工作原理 5.GTID参数配置 5.1 在主数据库里创建一个同步账号授权给从数据库使用 5.2 配置主数据库 5.3配 ...
react-native npm install
--create project react-native init myapp --version 0.55.4 cd myapp -- react ui npm i react-native-el ...
C.Minimum Array(二分+set）
题目描述: 知识点: lower_bound和uper_bound lower_bound(起始地址,结束地址,要查找的数值) 返回的是数值第一个出现的位置. upper_bound(起始地址,结 ...
微服务：springboot与swagger2的集成
现在测试都提倡自动化测试,那我们作为后台的开发人员,也得进步下啊,以前用postman来测试后台接口,那个麻烦啊,一个字母输错就导致测试失败,现在swagger的出现可谓是拯救了这些开发人员,便捷之处 ...
Vue 路由守卫解决页面退出和弹窗的显示冲突
在使用UI框架提供的弹出层Popup时,如Vant UI的popup,在弹出层显示时,点击物理按键或者小程序自带的返回时,会直接退出页面,这明显不符合页面逻辑. 解决思路: 在弹出层显示时,点击了返回 ...
sass变量的作用域
嵌套规则内定义的变量只能在嵌套规则内使用(局部变量),不在嵌套规则内定义的变量则可在任何地方使用(全局变量). <div class="test">111111111& ...
使用mybatis框架实现带条件查询-多条件（传入实体类）
在实际的项目开发中,使用mybatis框架查询的时候,不可能是只有一个条件的,大部分情况下是有多个条件的,那么多个条件应该怎样传入参数: 思考: 需求:根据用户姓名(模糊查询),和用户角色对用户表进 ...
ICMP隧道
参考文章:http://www.sohu.com/a/297393423_783648

Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor

Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor的更多相关文章

随机推荐

热门专题