Deep Reinforcement Learning Based Trading Application at JP Morgan Chase

https://medium.com/@ranko.mosic/reinforcement-learning-based-trading-application-at-jp-morgan-chase-f829b8ec54f2

FT released a story today about the new application that will optimize JP Morgan Chase trade execution ( Business Insider article on the same topic for readers that do not have FT subscription ). The intent is to reduce market impact and provide best trade execution results for large orders.

It is a complex application with many moving parts:

 

Its core is an RL algorithm that learns to perform the best action ( choose optimal price, duration and order size ) based on market conditions. It is not clear if it is Sarsa ( On-Policy TD Control) or Q-learning (Off-Policy Temporal Difference Control Algorithm ) as both algorithms are present in JP Morgan slides:

 

Sarsa

 

Q-learning

State consists of price series, expected spread cost, fill probability, size placed, as well as elapsed time, %progress, etc. Rewards are immediate rewards ( price spread ) and terminal ( end of episode ) rewards like completion, order duration and market penalties ( obviously those are negative rewards that punish the agent along these dimensions ).

 

Actions are memorized as weights of a Deep Neural Network — function approximation via NN is used since state, action space is too big to be handled in tabular form. We assume stochastic gradient descent is used for both feed forward and backprop operation operation ( hence Deep designation ):

 

JP Morgan is convinced this is the very first real time trading AI/ML application on Wall Street. We are assuming this is not true i.e. there are surely other players operating in this space as RL implementation to order execution is known for quite a while now ( Kearns and Nevmyvaka 2006 ).

The latest LOXM developmentswill be presented at QuantMinds Conference in Lisbon (May of 2018).

Instinet is also using Q-learning, probably for the same purpose ( market impact reduction ).

[转]Deep Reinforcement Learning Based Trading Application at JP Morgan Chase的更多相关文章

  1. 【资料总结】| Deep Reinforcement Learning 深度强化学习

    在机器学习中,我们经常会分类为有监督学习和无监督学习,但是尝尝会忽略一个重要的分支,强化学习.有监督学习和无监督学习非常好去区分,学习的目标,有无标签等都是区分标准.如果说监督学习的目标是预测,那么强 ...

  2. (转) Deep Reinforcement Learning: Playing a Racing Game

    Byte Tank Posts Archive Deep Reinforcement Learning: Playing a Racing Game OCT 6TH, 2016 Agent playi ...

  3. (转) Deep Reinforcement Learning: Pong from Pixels

    Andrej Karpathy blog About Hacker's guide to Neural Networks Deep Reinforcement Learning: Pong from ...

  4. (转) Playing FPS games with deep reinforcement learning

    Playing FPS games with deep reinforcement learning 博文转自:https://blog.acolyer.org/2016/11/23/playing- ...

  5. (zhuan) Deep Reinforcement Learning Papers

    Deep Reinforcement Learning Papers A list of recent papers regarding deep reinforcement learning. Th ...

  6. 论文笔记之:Asynchronous Methods for Deep Reinforcement Learning

    Asynchronous Methods for Deep Reinforcement Learning ICML 2016 深度强化学习最近被人发现貌似不太稳定,有人提出很多改善的方法,这些方法有很 ...

  7. [DQN] What is Deep Reinforcement Learning

    已经成为DL中专门的一派,高大上的样子 Intro: MIT 6.S191 Lecture 6: Deep Reinforcement Learning Course: CS 294: Deep Re ...

  8. 论文笔记:Learning how to Active Learn: A Deep Reinforcement Learning Approach

    Learning how to Active Learn: A Deep Reinforcement Learning Approach 2018-03-11 12:56:04 1. Introduc ...

  9. 18 Issues in Current Deep Reinforcement Learning from ZhiHu

    深度强化学习的18个关键问题 from: https://zhuanlan.zhihu.com/p/32153603 85 人赞了该文章 深度强化学习的问题在哪里?未来怎么走?哪些方面可以突破? 这两 ...

随机推荐

  1. MIR7预制发票扣除已经预制的数量(每月多次预制,未即时过账)

    业务场景见抬头,有没有标准的解决方案就不说了,也没去考虑... 这个增强还是SAP老表提供的,感谢,省了不少时间. INCLUDE:LMR1MF6S 最后的位置 ENHANCEMENT ZMIR7_0 ...

  2. GetMapping 和 PostMapping最大的差别(转)

    原文地址:GetMapping 和 PostMapping  Spring4.3中引进了{@GetMapping.@PostMapping.@PutMapping.@DeleteMapping.@Pa ...

  3. ORA-04068 / ORA-04065 / ORA-06508 详细说明

    关于在运行ORACLE 包发生ORA-04068 / ORA-04065 / ORA-06508 代码异常的原因只有一个,那就是包含了全局变量/常量的包,在会话保留期间被执行了编译. 对于此类错误,我 ...

  4. Spring注解之@Retention

    作用是定义被它所注解的注解保留多久,一共有三种策略,定义在RetentionPolicy枚举中: package java.lang.annotation; /** * Annotation rete ...

  5. HTML相关知识点总结

    1.表格<table>常用属性 cellspacing:两个单元格之间的距离 注:属性值为数字,效果图如下(左边cellspacing="0",右边cellspacin ...

  6. [codechef July Challenge 2017] Chef and Sign Sequences

    CHEFSIGN: 大厨与符号序列题目描述大厨昨天捡到了一个奇怪的字符串 s,这是一个仅包含‘<’.‘=’和‘>’三种比较符号的字符串.记字符串长度为 N,大厨想要在字符串的开头.结尾,和 ...

  7. [codechef July Challenge 2017] Pishty and tree

    PSHTTR: Pishty 和城堡题目描述Pishty 是生活在胡斯特市的一个小男孩.胡斯特是胡克兰境内的一个古城,以其中世纪风格的古堡和非常聪明的熊闻名全国.胡斯特的镇城之宝是就是这么一座古堡,历 ...

  8. 常用Linux源小记

    常用国内镜像站: 阿里云:http://mirrors.aliyun.com/ 中科大:http://mirrors.ustc.edu.cn/ 清华:https://mirrors.tuna.tsin ...

  9. [转]JVM内存模型

    最近排查一个线上java服务常驻内存异常高的问题,大概现象是:java堆Xmx配置了8G,但运行一段时间后常驻内存RES从5G逐渐增长到13G #补图#,导致机器开始swap从而服务整体变慢.由于Xm ...

  10. jcmd

    1.jps 2.jcmd 1761[pid] PerfCounter.print 查看指定进程的性能统计信息 概述 在JDK1.7以后,新增了一个命令行工具 jcmd.他是一个多功能的工具,可以用它来 ...