Why The Golden Age Of Machine Learning Is Just Beginning

Even though the buzz around neural networks, artificial intelligence, and machine learning has been relatively recent, as many know, there is nothing new about any of these methods. If so many of the core algorithms and approaches have been around for decades, why is it just now that they are getting their day in the sun?

To answer that question, we can take a look at what has happened over the last five years or so with the attention and tooling around data. And we can also point to the dramatic increase in scalable compute power, or to be more specific about it, performance per watt and bit. These two factors combined have fed the development fury, growing data analysis well beyond the standard database and calculation approaches that have themselves been around for decades. The point is, we are at peak “data hype”—there was a rush to develop a host of new tools and frameworks (Hadoop, as but one example) to support larger, more complex datasets, then a secondary effort to push the performance of the data analysis on new or enhanced frameworks.

So could it be that machine learning in particular is the next natural step for all the companies and end users who have climbed aboard the data express? Indeed, the attention around large-scale, complex analytics and the systems and frameworks to support them spurred some of that evolution. But ultimately, one could make the argument that for some analytical workloads, in both the research and enterprise spaces, those advances have hit their own peak. All of the new methods and approaches that grew from the fertile “big data” soils have been sown and tested. And there is, again, for a narrow (but growing) set of workloads, room for another way of thinking about complex problem solving.

This is not to say that there hasn’t been ongoing research and development around new machine learning approaches that can leverage ultra-scalable hardware. But there is a bigger story, explains Patrick Hall, who holds the unusual position of senior machine learning scientist at statistics software giant SAS. His title is noteworthy because he is finding solutions to problems that don’t fit well into classical statistical modeling approaches (which is what his company specializes in), with the goal of eventually integrating those methods into existing enterprise products.

Hall’s assertion is that while all of the aforementioned trends are pushing machine learning to the forefront, what is different now is that there is finally a large volume of data that does not work well for traditional statistical analysis. That, coupled with the new developments in machine learning algorithms, means that the golden age of machine learning is finally arriving.

“This is data that can be found in many places; it’s wider than it is long—it has more columns than rows, more variables than observations. All of that is a bad fit for traditional statistics. There is now more data with correlated variables (for instance, pixels that are related in image data) or even in text mining.” Equally, Hall says, there is a wealth of new data from a range of sources that is defined by missing or sparse data, where 1 percent or less of an entire dataset contains actual values.
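To make the "wider than it is long" point concrete, here is a minimal sketch in plain Python. The numbers are made up for illustration and do not come from Hall or SAS. With two observations and three variables, a linear model has no unique answer: two quite different sets of coefficients reproduce the observed targets exactly, which is one reason classical regression struggles on this kind of data.

```python
# Toy dataset (hypothetical values): 2 rows (observations), 3 columns (variables).
X = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
y = [14.0, 32.0]


def predict(rows, beta):
    """Return the linear prediction sum_j x_j * beta_j for each row."""
    return [sum(x * b for x, b in zip(row, beta)) for row in rows]


# One coefficient vector that fits the data exactly.
beta_a = [1.0, 2.0, 3.0]

# With more columns than rows there is a direction that X maps to zero:
# 1*1 - 2*2 + 1*3 = 0 and 1*4 - 2*5 + 1*6 = 0.
null_direction = [1.0, -2.0, 1.0]

# Moving any distance along that direction changes the coefficients
# but not the predictions, so the data cannot tell the two apart.
beta_b = [b + 5.0 * d for b, d in zip(beta_a, null_direction)]

print(beta_a, predict(X, beta_a))
print(beta_b, predict(X, beta_b))
```

Both coefficient vectors fit perfectly, so the data alone cannot say which is right. Methods that add a constraint or penalty, or that make no parametric assumptions at all, are the usual ways around this.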

And for businesses that want to invest in analyzing this data where traditional statistics don’t fit, there is a huge opportunity. It is feeding a new wealth of startups, as well as new initiatives from established analytics companies that seem to have gotten the message that calling a product “machine learning” is all the rage, even if it’s just a slightly upped version of analytics. That causes a problem of definition, and there are, without naming names, some serious examples of analytics and BI companies taking the same old software and slapping a “machine learning” label on it simply because it sounds more robust or complex than data analytics. This is one of the growing pains for any new technology area, especially when the hype machine revs its mighty engines. Hall says users need to understand their data and their problem; once that happens, it will be clear whether a standard statistics and database solution will suit or whether something more versatile (and likely more complex) is needed.

This isn’t to say that every traditional statistics and database company is changing its product messaging around machine learning instead of its technology. SAS introduced its first data mining product (Enterprise Miner) in the late 1990s, and at the time it had many of the machine learning models that are garnering all the hype lately (neural networks, decision trees, k-means clustering, etc.). There were, even then, Hall says, some emerging use cases where data was coming from the enterprise data warehouse to fit against models that lacked any parametric assumptions. So it’s not new, but the scope and number of those problems are growing, even in places where one might not expect it.

Among the enterprise arenas ripe for a machine learning boom are banking, insurance, and the credit card industries. Interestingly, all three of these are examples of regulated markets where having a black box approach to a problem is problematic for regulators. “There is always a tradeoff with machine learning. You trade interpretability for what you’re hoping is more accuracy and this is a tough tradeoff for regulated industries, but the fact is, they are seeing an opportunity finally and this tradeoff is one they are increasingly comfortable with.”

Hall and his company are well aware that they will have to keep innovating on both the language and product level to keep pace with the wave of machine learning startups that are being funded one by one. “There is indeed a lot of competition for attention right now,” he agrees. “We are trying to adapt our technology to these problems with concurrency and scalability for machine learning, but this is SAS, which means we are confined to a language syntax that, admittedly, looks old.” He says that even though the technology is just as robust as ever, SAS is “stuck” because changing the core syntax means that the mainframes at American Express and Bank of America will come crashing down. “What we can do is change what runs behind that syntax, and that is what we are working on now.”

It is hard to say at this point how large-scale enterprises will think about all of that data in their warehouses that doesn’t fit the standard regression modeling bill. But to be fair, doing more complex things with familiar frameworks and approaches is going to have its value, especially for regulated industries that are looking to beef up their analysis using machine learning methods, since at least there is a root level of formality and familiarity. This is where SAS is hoping to succeed with its foray into machine learning for large enterprise, and where some of the emerging startups will have a tough time moving past consumer-focused image and facial recognition, speech recognition, or other areas.

It also might be too soon to say that machine learning is seeing the dawn of its golden age, but there is something on the horizon, glinting in the distance. Given the wealth of new investment and attention around machine learning as the next great partner for the big data tools and approaches, this does not seem like a stretch.
