Reinforcement Learning 强化学习入门
https://www.zhihu.com/question/277325426
https://github.com/jinglescode/reinforcement-learning-tic-tac-toe/blob/master/README.md
Intuition
After a long day at work, you are deciding between 2 choices: to head home and write a Medium article or hang out with friends at a bar. If you choose to hang out with friends, your friends will make you feel happy; whereas heading home to write an article, you’ll end up feeling tired after a long day at work. In this example, enjoying yourself is a reward and feeling tired is viewed as a negative reward, so why write articles?
Because in life, we don’t just think about immediate rewards; we plan a course of actions to determine the possible future rewards that may follow. Perhaps writing an article may brush up your understanding of a particular topic really well, get recognised and ultimately lands you that dream job you’ve always wanted. In this scenario, getting your dream job is a delayed reward from a list of actions you took, then we want to assign some value for being at those states (for example “going home and write an article”). In order to determine the value of a state, we call this the “value function”.
So how do we learn from our past? Let’s say you made some great decisions and are in the best state of your life. Now look back at the various decisions you’ve made to reach this stage: what do you attribute your success to? What are the previous states that led you to this success? What are the actions you did in the past that led you to this state of receiving this reward? How is the action you are doing now related to the potential reward you may receive in the future?
Reinforcement Learning — Implement TicTacToe
How to use reinforcement learning to play tic-tac-toe
https://github.com/MJeremy2017/reinforcement-learning-implementation/tree/master/TicTacToe
直接看这个「井字棋」的代码,结合反复阅读这几篇文章,慢慢理解 Q-Learning 是个什么东西,每个参数的意义又是什么。
https://github.com/ZuzooVn/machine-learning-for-software-engineers
https://machinelearningmastery.com/machine-learning-for-programmers/#comment-358985
https://towardsdatascience.com/simple-reinforcement-learning-q-learning-fcddc4b6fe56
What’s ‘Q’?
The ‘q’ in q-learning stands for quality. Quality in this case represents how useful a given action is in gaining some future reward.
How Does Learning Rate Decay Help Modern Neural Networks?
https://smartlabai.medium.com/reinforcement-learning-algorithms-an-intuitive-overview-904e2dff5bbc
RL 分为 Model-free 和 Model-based 两类
Q-Learning 就属于 Model-free
http://incompleteideas.net/book/the-book-2nd.html
http://incompleteideas.net/book/code/code2nd.html
Reinforcement Learning 强化学习入门的更多相关文章
- [Reinforcement Learning] 强化学习介绍
随着AlphaGo和AlphaZero的出现,强化学习相关算法在这几年引起了学术界和工业界的重视.最近也翻了很多强化学习的资料,有时间了还是得自己动脑筋整理一下. 强化学习定义 先借用维基百科上对强化 ...
- The categories of Reinforcement Learning 强化学习分类
RL分为三大类: (1)通过行为的价值来选取特定行为的方法,具体 包括使用表格学习的 q learning, sarsa, 使用神经网络学习的 deep q network: (2)直接输出行为的 p ...
- 【论文研读】强化学习入门之DQN
最近在学习斯坦福2017年秋季学期的<强化学习>课程,感兴趣的同学可以follow一下,Sergey大神的,有英文字幕,语速有点快,适合有一些基础的入门生. 今天主要总结上午看的有关DQN ...
- 强化学习入门基础-马尔可夫决策过程(MDP)
作者:YJLAugus 博客: https://www.cnblogs.com/yjlaugus 项目地址:https://github.com/YJLAugus/Reinforcement-Lear ...
- gym强化学习入门demo——随机选取动作 其实有了这些动作和反馈值以后就可以用来训练DNN网络了
# -*- coding: utf-8 -*- import gym import time env = gym.make('CartPole-v0') observation = env.reset ...
- DQN(Deep Q-learning)入门教程(一)之强化学习介绍
什么是强化学习? 强化学习(Reinforcement learning,简称RL)是和监督学习,非监督学习并列的第三种机器学习方法,如下图示: 首先让我们举一个小时候的例子: 你现在在家,有两个动作 ...
- <Machine Learning - 李宏毅> 学习笔记
<Machine Learning - 李宏毅> 学习笔记 b站视频地址:李宏毅2019国语 第一章 机器学习介绍 Hand crafted rules Machine learning ...
- 【强化学习】MOVE37-Introduction(导论)/马尔科夫链/马尔科夫决策过程
写在前面的话:从今日起,我会边跟着硅谷大牛Siraj的MOVE 37系列课程学习Reinforcement Learning(强化学习算法),边更新这个系列.课程包含视频和文字,课堂笔记会按视频为单位 ...
- 强化学习 reinforcement learning: An Introduction 第一章, tic-and-toc 代码示例 (结构重建版,注释版)
强化学习入门最经典的数据估计就是那个大名鼎鼎的 reinforcement learning: An Introduction 了, 最近在看这本书,第一章中给出了一个例子用来说明什么是强化学习, ...
随机推荐
- OpenSUSE Leap 42.1 KDE Ultmate Linux Distribution终极Linux系统试用与SSH连接
系统安装环境: #一台旧笔记本电脑 #CPU Intel(R) Core(TM) i3 M 380 2.53GHz (4核) #内存 1G #硬盘存储 250G #系统型号 OpenSUSE Leap ...
- H5页面怎么跳转到公众号主页?看过来
前言: 做公众号开发的小伙伴,可能会遇到这种需求: 在一个H5页面点击一个关注公众号按钮跳转到公众号主页. 听到这个需求的一瞬间,疑惑了!这不可能! 摸了摸高亮的额头!没办法,做还是要做的 开始上解决 ...
- vulnhub-Lampiao脏牛提权
准备工作 在vulnhub官网下载lampiao靶机Lampião: 1 ~ VulnHub 导入到vmware,设置成NAT模式 打开kali准备进行渗透(ip:192.168.200.6) 信息收 ...
- Distance Queries 距离咨询 (LCA倍增模板)
农夫约翰有N(2<=N<=40000)个农场,标号1到N.M(2<=M<=40000)条的不同的垂直或水平的道路连结着农场,道路的长度不超过1000.这些农场的分布就像下面的地 ...
- 在Linearlayout中新增ScrollView支持滚动
https://blog.csdn.net/wenzhi20102321/article/details/53491176 1.一般只需要在布局中加个ScrollView即可 2.如果布局中包含lis ...
- HTTP状态码关于各个网站的实地调查
我使用的是新版Edge浏览器,右键,点击检查,点击网络,可以看到请求的各种文件.那么以此来看看状态码的使用吧. 101 与websocket相关,websocket在慕课网中的应用 - KeBoom ...
- NOIP 模拟 $38\; \rm c$
题解 \(by\;zj\varphi\) 发现就是一棵树,但每条边都有多种不同的颜色,其实只需要保留随便三种颜色即可. 直接点分治,将询问离线,分成一端为重心,和两端都不为重心的情况. 每次只关心经过 ...
- P5038 奇怪的游戏
题目询问了一个不能确定的时间,所以显然做法中要包含一个二分答案. 我们将整张图分为黑白点两种,黑点旁边的点就是白点,白点旁边的点就是黑点,想一下就能知道,每次操作会使黑白点的数字各加一,而我们的目的就 ...
- SpringDataJpa使用原生sql(EntityManager)动态拼接,分页查询
SpringDataJpa Spring Data JPA是较大的Spring Data系列的一部分,可轻松实现基于JPA的存储库.该模块处理对基于JPA的数据访问层的增强支持.它使构建使用数据访问技 ...
- vue--三种组件中之间的传值
参考网址:https://www.jianshu.com/p/46573a741c29 一.父子组件之间的传值----props/$emit 组件之间的传值,我们比较常用到的是props/$emit ...