Reinforcement Learning for Profit

July 17, 2016

Is RL being used in revenue-generating systems today?

 

Recently, one of my Facebook friends, Cosmin Paduraru, an alumnus of the University of Alberta (with a PhD in Computing Science), posed a question:

Where is Reinforcement Learning used in revenue generating systems today?

I have been thinking about this a lot over the last month, as I attended two international conferences on Artificial Intelligence and Machine Learning (ICML and IJCAI) in New York City. It is important to explore future prospects both inside and outside academia. In case you need to catch up, I am currently at the University of Alberta working on a PhD in Computing Science with a focus on Reinforcement Learning and Artificial Intelligence.

With the success of modern AI systems (out of the winter and into the spring), many companies have invested and continue to invest heavily in modern AI systems, backed by teams of leading researchers in the field (e.g. Facebook, Google, Microsoft, IBM, Twitter, etc.).

With that said, maybe Cosmin is right: Reinforcement Learning (Sutton and Barto 1998, and this killer intro by the fantastically talented Andrej Karpathy) seems publicly underrepresented in currently deployed systems making money in the real world. Or is it?

Adapted from Sutton and Barto 1998 and WALL-E

Luckily, I was at the International Joint Conference on Artificial Intelligence, attending a panel discussion on The Business of AI. The panel was composed of all the speakers from the industry day: a good venue to solicit a wide variety of opinions from thought leaders in the field.

So I posed the question to them. Their responses went as follows:

Peter Norvig (Director of Research at Google): “well… AlphaGo made a million bucks and then gave it away”… a recent tweet from Demis Hassabis (Google DeepMind) confirms:

Pleased to confirm the recipients of the #AlphaGo $1m prize! @UNICEF_uk, @CodeClub, and the American, European and Korean Go associations

— Demis Hassabis (@demishassabis) June 6, 2016

Peter Stone (Founder and President, Cogitai; Professor, University of Texas at Austin) gave lots of great examples of recent applications:

He said, “We are on the cusp of moving from the academic lab to the industry for RL, adaptation, and lifelong learning… We are at the cusp, and that is the main motivation from Cogitai.”

He also referenced work by Thomas G. Dietterich on invasive species management and wildfire suppression, by Joelle Pineau on applying RL in healthcare, and by Andrew Ng and Drew Bagnell on helicopter control. All of these could serve as practical demonstrations of specific, developing industrial applications.

Hiroaki Kitano (President & CEO, SONY Computer Science Laboratories) said that this is a current research area for Sony, and that we should expect profitability using these and advancing RL algorithms in 2-5 years. Almost 10 years after Sony’s last robotic venture, the Aibo, Sony CEO Kazuo Hirai has just recently (late June 2016) said “the robots we are developing can have emotional bonds with customers, giving them joy and becoming the objects of love”.

Guruduth Banavar (Chief Science Officer, Cognitive Computing, IBM Research) predicted that this is going to happen sooner rather than later, in the domain of conversational systems, dialog systems, and understanding the larger context of conversations. He also mentioned that the illustrious Gerald Tesauro (the man behind TD-Gammon) is working on these problems. Interesting that he did not mention Watson.

Some interesting answers from industry leaders. But I was surprised that no one mentioned recommender systems (like those on Amazon, Netflix, and Yelp, nicely formalized as an RL problem in 2005 by Shani et al.). Are these systems all collaborative filtering? Surely not.

No one mentioned the Google Reinforcement Learning Architecture (here is a quick summary), which I can only imagine could be behind some of the personal recommendations and rankings that Google does behind the scenes on Search, YouTube, and maybe … Maps?

No one mentioned contextual bandits, sometimes called associative RL (as discussed by Li et al. 2010 for news recommendation), for serving ads and news stories. These systems are surely deployed by publishers on large-scale news sites to maximize click-through rates and create a personalized experience. Microsoft recently announced the Multiworld Testing Decision Service for making context-based decisions… I guess there were no Microsoft representatives on the panel to toot this horn (thanks for the catch, Pardis).
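For readers who have not met contextual bandits before, here is a minimal sketch of the idea, written in the spirit of the per-arm LinUCB approach that Li et al. 2010 describe for news recommendation. It is not their implementation, and it is not how any deployed system is known to work. The class and function names (`LinUCBArm`, `choose`, `simulate`), the two made-up user types, and the click probabilities in `CLICK_PROB` are all assumptions invented for illustration.

```python
import math
import random

# Hypothetical click probabilities, CLICK_PROB[story][user_type] (made-up numbers).
CLICK_PROB = [[0.8, 0.2], [0.2, 0.8]]


class LinUCBArm:
    """One arm (e.g. one news story) with its own ridge-regression click model."""

    def __init__(self, dim):
        # A_inv is the inverse of A = I + sum(x x^T); b = sum(reward * x).
        self.A_inv = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim

    def _A_inv_times(self, v):
        return [sum(a * vi for a, vi in zip(row, v)) for row in self.A_inv]

    def ucb(self, x, alpha):
        """Estimated click rate for context x plus an exploration bonus."""
        theta = self._A_inv_times(self.b)  # ridge-regression weights
        mean = sum(t * xi for t, xi in zip(theta, x))
        var = sum(xi * v for xi, v in zip(x, self._A_inv_times(x)))
        return mean + alpha * math.sqrt(max(var, 0.0))

    def update(self, x, reward):
        # Sherman-Morrison rank-one update of A_inv after A <- A + x x^T.
        u = self._A_inv_times(x)
        denom = 1.0 + sum(xi * ui for xi, ui in zip(x, u))
        for i in range(len(x)):
            for j in range(len(x)):
                self.A_inv[i][j] -= u[i] * u[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]


def choose(arms, x, alpha):
    """Pick the arm with the highest upper confidence bound for context x."""
    scores = [arm.ucb(x, alpha) for arm in arms]
    return scores.index(max(scores))


def simulate(rounds=2000, alpha=1.0, seed=0):
    """Serve one story per simulated visit and learn from the observed clicks."""
    rng = random.Random(seed)
    arms = [LinUCBArm(dim=2) for _ in CLICK_PROB]
    clicks = 0
    for _ in range(rounds):
        user_type = rng.randrange(2)
        x = [1.0, 0.0] if user_type == 0 else [0.0, 1.0]  # one-hot context
        story = choose(arms, x, alpha)
        reward = 1.0 if rng.random() < CLICK_PROB[story][user_type] else 0.0
        arms[story].update(x, reward)  # only the served story gets feedback
        clicks += int(reward)
    return clicks, arms


if __name__ == "__main__":
    total_clicks, _ = simulate()
    print("clicks over 2000 simulated visits:", total_clicks)
```

The point of the sketch is that the system only ever sees feedback for the story it chose to show, so it has to balance showing what currently looks best against trying stories it is still uncertain about. That is the RL flavour of the problem, even though there is no long-term state.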

With so much potentially out there, why was there no mention of these use cases for reinforcement learning? Where else could RL be hiding in the money-making wild? RL seems like an ideal candidate for personalization systems that face large-scale, sequential decision-making problems… so what am I missing?
