[Repost] Deep Reinforcement Learning Based Trading Application at JP Morgan Chase
https://medium.com/@ranko.mosic/reinforcement-learning-based-trading-application-at-jp-morgan-chase-f829b8ec54f2
The FT ran a story today about a new application that will optimize JP Morgan Chase's trade execution (there is a Business Insider article on the same topic for readers without an FT subscription). The intent is to reduce market impact and provide the best execution results for large orders.
It is a complex application with many moving parts:

Its core is an RL algorithm that learns to perform the best action (choose the optimal price, duration, and order size) based on market conditions. It is not clear whether it is Sarsa (on-policy TD control) or Q-learning (off-policy TD control), as both algorithms appear in JP Morgan's slides:

[Slide: Sarsa update rule]

[Slide: Q-learning update rule]
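For reference, the standard one-step updates of the two algorithms (as given in Sutton and Barto) are:

$$Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha \left[ R_{t+1} + \gamma\, Q(S_{t+1}, A_{t+1}) - Q(S_t, A_t) \right] \quad \text{(Sarsa)}$$

$$Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha \left[ R_{t+1} + \gamma \max_a Q(S_{t+1}, a) - Q(S_t, A_t) \right] \quad \text{(Q-learning)}$$

The only difference is the bootstrap target: Sarsa evaluates the action the current policy actually takes next (on-policy), while Q-learning evaluates the greedy action (off-policy).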
The state consists of the price series, expected spread cost, fill probability, size placed, as well as elapsed time, percentage progress, etc. Rewards are immediate rewards (price spread) and terminal (end-of-episode) rewards such as completion, order duration, and market penalties (the penalties are negative rewards that punish the agent along these dimensions).
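To make that state and reward structure concrete, here is a minimal sketch in Python. All field names, signatures, and constants are hypothetical assumptions for illustration, not JP Morgan's actual schema.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ExecutionState:
    price_series: List[float]    # recent market prices
    expected_spread_cost: float  # estimated cost of crossing the spread
    fill_probability: float      # estimated chance a resting order fills
    size_placed: int             # child-order size already placed
    elapsed_time: float          # time since the parent order started
    progress: float              # fraction of the parent order completed

def immediate_reward(fill_price: float, mid_price: float, side: str) -> float:
    """Per-step reward: spread captured relative to the mid price."""
    return mid_price - fill_price if side == "buy" else fill_price - mid_price

def terminal_reward(completed: bool, duration: float,
                    market_penalty: float) -> float:
    """End-of-episode reward: completion bonus minus duration and
    market-impact penalties (the penalties enter as negative rewards)."""
    completion_bonus = 1.0 if completed else 0.0
    return completion_bonus - 0.001 * duration - market_penalty
```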

What the agent learns is memorized as the weights of a deep neural network; function approximation via a NN is used because the state-action space is too big to be handled in tabular form. We assume stochastic gradient descent is used to train the network through its forward and backpropagation passes (hence the "Deep" designation).
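As an illustration of what "Q-learning with a neural-network function approximator trained by SGD" could look like, here is a minimal PyTorch sketch. The network sizes, action space, and hyperparameters are assumptions, not the actual LOXM design.

```python
import torch
import torch.nn as nn

STATE_DIM = 8   # e.g. price features, spread cost, fill probability, %progress
N_ACTIONS = 5   # e.g. a discretized grid of (price level, child-order size)
GAMMA = 0.99    # discount factor

# Q-network: maps a state vector to one Q-value per action.
q_net = nn.Sequential(
    nn.Linear(STATE_DIM, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, N_ACTIONS),
)
optimizer = torch.optim.SGD(q_net.parameters(), lr=1e-3)

def td_update(state, action, reward, next_state, done):
    """One stochastic-gradient step on the squared TD error (Q-learning target)."""
    q_pred = q_net(state)[action]  # Q(s, a)
    with torch.no_grad():
        # Q-learning bootstraps from the greedy next action; zero at episode end.
        q_next = torch.zeros(()) if done else q_net(next_state).max()
        target = reward + GAMMA * q_next
    loss = (q_pred - target) ** 2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Example transition with random (placeholder) state vectors:
s, s_next = torch.randn(STATE_DIM), torch.randn(STATE_DIM)
td_update(s, action=2, reward=-0.01, next_state=s_next, done=False)
```

A production system would add experience replay and a target network (as in DQN); the point here is only the TD update applied to the network weights.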

JP Morgan is convinced this is the very first real-time trading AI/ML application on Wall Street. We assume this is not true, i.e., there are surely other players operating in this space, as applying RL to order execution has been known for quite a while now (Kearns and Nevmyvaka 2006).
The latest LOXM developments will be presented at the QuantMinds conference in Lisbon in May 2018.
Instinet is also using Q-learning, probably for the same purpose (market impact reduction).