Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor

2019-07-15 22:23:02

Paper: https://arxiv.org/pdf/1801.01290.pdf or updated version: https://arxiv.org/pdf/1812.05905.pdf

Project: https://sites.google.com/view/soft-actor-critic or https://sites.google.com/view/sac-and-applications/

TensorFlow: https://github.com/haarnoja/sac

PyTorch: https://github.com/vitchyr/rlkit

Demo video: https://www.youtube.com/channel/UCxXt8Br3-wyluz9Q08-fsaA

Good related blog: https://zhuanlan.zhihu.com/p/70360272
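For orientation only, here is a minimal sketch of the soft Bellman target that SAC's critics regress toward. It is not code from the linked TensorFlow or rlkit repositories. It assumes the form used in the updated version (1812.05905): two target critics combined with a minimum, and a fixed entropy temperature `alpha`. The function name `soft_q_target` and its parameters are hypothetical.

```python
def soft_q_target(reward, done, next_q1, next_q2, next_log_prob, gamma=0.99, alpha=0.2):
    """Soft Bellman target for a single transition (hypothetical helper).

    next_q1, next_q2: target-critic estimates Q(s', a') for an action a'
        sampled from the current stochastic policy at s'.
    next_log_prob: log pi(a' | s') for that sampled action.
    alpha: entropy temperature (assumed fixed here; the updated paper also
        describes tuning it automatically).
    """
    # Soft state value: pessimistic (min) critic estimate plus an entropy bonus,
    # since -alpha * log pi grows as the policy becomes more random.
    soft_value = min(next_q1, next_q2) - alpha * next_log_prob
    # Do not bootstrap past a terminal state.
    return reward + gamma * (1.0 - float(done)) * soft_value


# Example: soft value = 9.0 + 0.5 = 9.5, target = 1.0 + 0.5 * 9.5
print(soft_q_target(1.0, False, 10.0, 9.0, -1.0, gamma=0.5, alpha=0.5))  # 5.75
```

Taking the minimum of the two critics counters Q-value overestimation, and the `-alpha * log pi` term is what makes the objective "maximum entropy": actions the policy assigns low probability to raise the target, encouraging exploration.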

==== Video Related Tutorials (A2C, A3C): 

A brief review of Actor-Critic Algorithms: https://www.youtube.com/watch?v=aODdNpihRwM

CS885 Lecture 7b: Actor Critic: https://www.youtube.com/watch?v=5Ke-d1Itk3k

DRL Lecture 6: Actor-Critic: https://www.youtube.com/watch?v=j82QLgfhFiY&t=27s

Build an A2C agent that learns to play Sonic with Tensorflow (tutorial): https://www.youtube.com/watch?v=GCfUdkCL7FQ

Reinforcement Learning 6: Policy Gradients and Actor Critics (DeepMind): https://www.youtube.com/watch?v=bRfUxQs6xIM&t=27s

Actor Critic (A3C) Tutorial: https://www.youtube.com/watch?v=O5BlozCJBSE

Actor Critic Algorithms: https://www.youtube.com/watch?v=w_3mmm0P0j8&t=2s


==
