Reinforcement Learning: Q-learning Algorithm Study, Part 3

// Q-learning source code analysis.
import java.util.Random;

public class QLearning1

{

    private static final int Q_SIZE = 6;

    private static final double GAMMA = 0.8;

    private static final int ITERATIONS = 10;

    private static final int INITIAL_STATES[] = new int[] {1, 3, 5, 2, 4, 0};

    private static final int R[][] = new int[][] {{-1, -1, -1, -1, 0, -1},

                                                  {-1, -1, -1, 0, -1, 100},

                                                  {-1, -1, -1, 0, -1, -1},

                                                  {-1, 0, 0, -1, 0, -1},

                                                  {0, -1, -1, 0, -1, 100},

                                                  {-1, 0, -1, -1, 0, 100}};

    private static int q[][] = new int[Q_SIZE][Q_SIZE];

    private static int currentState = 0;

    private static void train()

    {

        initialize();

        // Perform training, starting at all initial states.

        for(int j = 0; j < ITERATIONS; j++)

        {

            for(int i = 0; i < Q_SIZE; i++)

            {

                episode(INITIAL_STATES[i]);

            } // i

        } // j

        System.out.println("Q Matrix values:");

        for(int i = 0; i < Q_SIZE; i++)

        {

            for(int j = 0; j < Q_SIZE; j++)

            {

                System.out.print(q[i][j] + ",\t");

            } // j

            System.out.print("\n");

        } // i

        System.out.print("\n");

        return;

    }

    private static void test()

    {

        // Perform tests, starting at all initial states.

        System.out.println("Shortest routes from initial states:");

        for(int i = 0; i < Q_SIZE; i++)

        {

            currentState = INITIAL_STATES[i];

            int newState = 0;

            do

            {

                newState = maximum(currentState, true);

                System.out.print(currentState + ", ");

                currentState = newState;

            }while(currentState < 5);

            System.out.print("5\n");

        }

        return;

    }

    private static void episode(final int initialState)

    {

        currentState = initialState;

        // Travel from state to state until goal state is reached.

        do

        {

            chooseAnAction();

        }while(currentState != 5);

        // Once currentState == 5, run through the set once more for convergence.

        for(int i = 0; i < Q_SIZE; i++)

        {

            chooseAnAction();

        }

        return;

    }

    private static void chooseAnAction()

    {

        int possibleAction = 0;

        // Randomly choose a possible action connected to the current state.

        possibleAction = getRandomAction(Q_SIZE);

        if(R[currentState][possibleAction] >= 0){

            q[currentState][possibleAction] = reward(possibleAction);

            currentState = possibleAction;

        }

        return;

    }

    private static int getRandomAction(final int upperBound)

    {

        int action = 0;

        boolean choiceIsValid = false;

        // Randomly choose a possible action connected to the current state.

        while(choiceIsValid == false)

        {

            // Get a random value between 0(inclusive) and 6(exclusive).

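            // Reuse the per-thread generator instead of allocating a new Random on every pass.
            action = java.util.concurrent.ThreadLocalRandom.current().nextInt(upperBound);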

            if(R[currentState][action] > -1){

                choiceIsValid = true;

            }

        }

        return action;

    }

    private static void initialize()

    {

        for(int i = 0; i < Q_SIZE; i++)

        {

            for(int j = 0; j < Q_SIZE; j++)

            {

                q[i][j] = 0;

            } // j

        } // i

        return;

    }

    private static int maximum(final int State, final boolean ReturnIndexOnly)

    {

        // If ReturnIndexOnly = True, the Q matrix index is returned.

        // If ReturnIndexOnly = False, the Q matrix value is returned.

        int winner = 0;

        boolean foundNewWinner = false;

        boolean done = false;

        while(!done)

        {

            foundNewWinner = false;

            for(int i = 0; i < Q_SIZE; i++)

            {

                if(i != winner){             // Avoid self-comparison.

                    if(q[State][i] > q[State][winner]){

                        winner = i;

                        foundNewWinner = true;

                    }

                }

            }

            if(foundNewWinner == false){

                done = true;

            }

        }

        if(ReturnIndexOnly == true){

            return winner;

        }else{

            return q[State][winner];

        }

    }

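    private static int reward(final int Action)

    {

        // Q(state, action) = R(state, action) + GAMMA * max(Q(next state, all actions)).
        // The chosen action is also the next state; the result is truncated to int.
        return (int)(R[currentState][Action] + (GAMMA * maximum(Action, false)));

    }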

    public static void main(String[] args)

    {

        train();

        test();

        return;

    }

}
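
The update performed by `reward` can be traced by hand on two steps. The following is a minimal sketch, not part of the original listing: the class `QUpdateSketch` and the helpers `maxQ` and `update` are hypothetical names introduced for illustration, and the sketch assumes the same `R` matrix, the same `GAMMA = 0.8`, and the same truncation to `int` as the listing above.

```java
public class QUpdateSketch {
    // Discount factor, same value as GAMMA in the listing (assumed here).
    static final double GAMMA = 0.8;

    // Largest Q value in the row of the given state.
    public static int maxQ(int[][] q, int state) {
        int best = q[state][0];
        for (int a = 1; a < q[state].length; a++) {
            best = Math.max(best, q[state][a]);
        }
        return best;
    }

    // One update: Q(s, a) = R(s, a) + GAMMA * max Q(a, *).
    // The action index doubles as the next state, and the result
    // is truncated to int, mirroring the listing.
    public static int update(int[][] r, int[][] q, int state, int action) {
        q[state][action] = (int) (r[state][action] + GAMMA * maxQ(q, action));
        return q[state][action];
    }

    public static void main(String[] args) {
        int[][] r = {
            {-1, -1, -1, -1,  0,  -1},
            {-1, -1, -1,  0, -1, 100},
            {-1, -1, -1,  0, -1,  -1},
            {-1,  0,  0, -1,  0,  -1},
            { 0, -1, -1,  0, -1, 100},
            {-1,  0, -1, -1,  0, 100}
        };
        int[][] q = new int[6][6]; // all zeros, as after initialize()

        // From state 1 to the goal state 5: 100 + 0.8 * 0
        System.out.println(update(r, q, 1, 5)); // prints 100
        // From state 3 to state 1: 0 + 0.8 * max(Q[1][*]) = 0.8 * 100
        System.out.println(update(r, q, 3, 1)); // prints 80
    }
}
```

The second call shows how the reward at the goal propagates backward: state 3 has no immediate reward for moving to state 1, but it inherits a discounted share of the value already learned for state 1.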
