Reposted from: https://www.technologyreview.com/s/608921/ai-algorithms-are-starting-to-teach-ai-algorithms/#

You Could Become an AI Master Before You Know It. Here’s How.

Automating machine learning will make the technology more accessible to non–AI experts.

 

At first blush, Scot Barton might not seem like an AI pioneer. He isn’t building self-driving cars or teaching computers to thrash humans at computer games. But within his role at Farmers Insurance, he is blazing a trail for the technology.

Barton leads a team that analyzes data to answer questions about customer behavior and the design of different policies. His group is now using all sorts of cutting-edge machine-learning techniques, from deep neural networks to decision trees. But Barton did not hire an army of AI wizards to make this possible. His team uses a platform called DataRobot, which automates a lot of difficult work involved in applying such techniques.

The insurance company’s work with DataRobot hints at how artificial intelligence might have to evolve in the next few years if it is to realize its enormous potential. Beyond spectacular demonstrations like DeepMind’s game-playing software AlphaGo, AI does have the power to revolutionize entire industries and make all sorts of businesses more efficient and productive. This, in turn, could help rejuvenate the economy by increasing overall productivity. But in order for this to happen, the technology will need to become a whole lot easier to use.

The problem is that many of the steps involved in using existing AI techniques currently require significant expertise. And it isn’t as simple as building a more user-friendly interface on top of things, because engineers often have to apply judgment and know-how when crafting and tweaking their code.

But AI researchers and companies are now trying to address this by essentially turning the technology on itself, using machine learning to automate the trickier aspects of developing AI algorithms. Some experts are even building the equivalent of AI-powered operating systems designed to make applications of the technology as accessible as Microsoft Excel is today.

DataRobot is a step in that direction. You feed in raw data, and the platform automatically cleans and reformats it. Then it runs dozens of different algorithms at once against it, ranking their performance. Barton first tried using the platform by inputting a bunch of insurance data to see if it could predict a specific dollar value. Compared with a standard, hand-built statistical approach, the model selected had a 20 percent lower error rate. “Out of the box, with the push of one button; that’s pretty impressive,” he says.
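
The "run many algorithms and rank them" step described above can be illustrated with a minimal sketch. It is not DataRobot's implementation: the synthetic data, the three candidate models, and the helper names (`make_data`, `cross_val_mae`, `leaderboard`) are all assumptions chosen for illustration, and the scoring uses plain k-fold cross-validation with mean absolute error.

```python
import random
import statistics

def make_data(n=200, seed=0):
    """Synthetic stand-in for real data: y is linear in x plus Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [3.0 * x + 5.0 + rng.gauss(0, 1.0) for x in xs]
    return xs, ys

# Every candidate has the same shape: fit(train_x, train_y) -> predict(x).
def fit_mean(train_x, train_y):
    mean_y = statistics.fmean(train_y)
    return lambda x: mean_y

def fit_linear(train_x, train_y):
    # Ordinary least squares for a single feature.
    mx, my = statistics.fmean(train_x), statistics.fmean(train_y)
    sxx = sum((x - mx) ** 2 for x in train_x)
    sxy = sum((x - mx) * (y - my) for x, y in zip(train_x, train_y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: slope * x + intercept

def fit_knn(train_x, train_y, k=5):
    pairs = list(zip(train_x, train_y))
    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return statistics.fmean(y for _, y in nearest)
    return predict

def cross_val_mae(fit, xs, ys, folds=5):
    """Mean absolute error over all held-out points of a k-fold split."""
    errors = []
    for f in range(folds):
        train_x = [x for i, x in enumerate(xs) if i % folds != f]
        train_y = [y for i, y in enumerate(ys) if i % folds != f]
        predict = fit(train_x, train_y)
        errors.extend(abs(predict(x) - y)
                      for i, (x, y) in enumerate(zip(xs, ys)) if i % folds == f)
    return statistics.fmean(errors)

def leaderboard(candidates, xs, ys):
    """Return (name, error) pairs sorted from lowest to highest error."""
    scores = [(name, cross_val_mae(fit, xs, ys)) for name, fit in candidates.items()]
    return sorted(scores, key=lambda item: item[1])

CANDIDATES = {
    "mean baseline": fit_mean,
    "linear regression": fit_linear,
    "5-nearest neighbours": fit_knn,
}

if __name__ == "__main__":
    xs, ys = make_data()
    for name, err in leaderboard(CANDIDATES, xs, ys):
        print(f"{name:22s} MAE = {err:.3f}")
```

Because every candidate is scored on the same held-out folds, the ranking compares like with like, which is the property that makes an automated leaderboard meaningful.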

AI Skills Gap

The reality of applying AI was laid bare in a report published by the consulting company McKinsey in June of this year. This report concludes that artificial intelligence, especially machine learning, may overhaul big industries, including manufacturing, finance, and health care, potentially adding up to $126 billion to the U.S. economy by 2025. But the report has one big caveat: a critical talent shortage.

There is certainly a big push to train as many people as possible to use AI (see “Andrew Ng’s Next Trick: Training a Million AI Experts”). But that will take time, and not everyone can become an AI master. The best way to maximize the impact of any technology is to make it as accessible as possible. Only then will AI begin to creep into ordinary offices and workplaces. DataRobot is already being used in some of those settings.

 
JAY DANIEL WRIGHT

Late one afternoon, DataRobot’s office in Boston’s financial district is deserted apart from a handful of engineers milling around a large display. The company’s solution certainly seems impressive when Jonathan Dahlberg, one of the consultants, gives me a demo. He loads up a public data set of loan applications and payments, and then he has the system develop a bunch of models to see if there are any patterns in why people default.

In a few seconds, dozens of competing algorithms appear on the screen; at the top is a relatively unsexy but widely used gradient-boosting technique called XGBoost. This quickly shows that applicants’ income is especially important, but so is the reason they give for wanting a loan. It turns out that people who mention “starting a business” in their application are an especially bad bet.
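
To make the idea of gradient boosting and its feature ranking concrete, here is a small sketch under loud assumptions. It is not XGBoost: it boosts one-split decision stumps on a squared-error loss, with none of XGBoost's regularization or second-order terms. The loan data is invented (including a deliberately irrelevant `noise` column), and "importance" here simply means the total reduction in squared error credited to each feature.

```python
import random

FEATURES = ("income", "business_purpose", "noise")

def make_loans(n=200, seed=1):
    """Made-up loans: (income, business_purpose, noise) -> defaulted (0.0 or 1.0)."""
    rng = random.Random(seed)
    rows, targets = [], []
    for _ in range(n):
        income = rng.uniform(20, 120)   # thousands of dollars, invented scale
        business = rng.choice([0, 1])   # 1 if the stated purpose is starting a business
        noise = rng.random()            # irrelevant feature, for contrast
        p_default = 0.05 + (0.4 if income < 50 else 0.0) + (0.3 if business else 0.0)
        rows.append((income, business, noise))
        targets.append(1.0 if rng.random() < p_default else 0.0)
    return rows, targets

def fit_stump(rows, residuals):
    """Find the single (feature, threshold) split with the lowest squared error."""
    best = None
    for j in range(len(rows[0])):
        values = sorted(set(row[j] for row in rows))
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [r for row, r in zip(rows, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(rows, residuals) if row[j] > threshold]
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            sse = (sum((r - left_mean) ** 2 for r in left)
                   + sum((r - right_mean) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, threshold, left_mean, right_mean)
    return best

def boost(rows, targets, rounds=15, learning_rate=0.3):
    """Each round fits a stump to the current residuals and adds a shrunken copy."""
    base = sum(targets) / len(targets)
    preds = [base] * len(rows)
    stumps, importance = [], [0.0] * len(rows[0])
    for _ in range(rounds):
        residuals = [t - p for t, p in zip(targets, preds)]
        mean_r = sum(residuals) / len(residuals)
        sse_before = sum((r - mean_r) ** 2 for r in residuals)
        sse_after, j, threshold, left_mean, right_mean = fit_stump(rows, residuals)
        importance[j] += sse_before - sse_after   # gain credited to feature j
        left, right = learning_rate * left_mean, learning_rate * right_mean
        stumps.append((j, threshold, left, right))
        preds = [p + (left if row[j] <= threshold else right)
                 for p, row in zip(preds, rows)]
    return base, stumps, importance

def predict(model, row):
    base, stumps, _ = model
    return base + sum(left if row[j] <= t else right for j, t, left, right in stumps)

def mse(model, rows, targets):
    return sum((predict(model, r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)

if __name__ == "__main__":
    rows, targets = make_loans()
    model = boost(rows, targets)
    for name, gain in sorted(zip(FEATURES, model[2]), key=lambda p: -p[1]):
        print(f"{name:18s} total gain = {gain:.2f}")
```

Ranking features by accumulated gain is one common way such tools surface which inputs matter; production libraries offer several importance measures that can disagree with one another.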

DataRobot might not match the expertise or skill of a really good data scientist, Dahlberg says, but it can offer a broader perspective. A person might rely too heavily on a certain technique, and DataRobot could automatically reveal a fundamentally better approach. It is also still possible for a user to manually modify the underlying algorithm using the programming languages Python or R. Without a close examination, it’s hard to know how well the system automates some of the trickier aspects of data science, like data cleaning and feature engineering, but it seems to take care of a surprising amount.

The company’s CEO, Jeremy Achin, was inspired to start a company after watching The Social Network, as he admits a little sheepishly when we meet for coffee near MIT. But he got the idea for DataRobot while taking part in data-science competitions on the crowdsourcing platform Kaggle, which was acquired by Google earlier this year. Kaggle offers prizes for the algorithm that performs best at making a specific prediction from a large data set. This task typically involves developing a machine-learning algorithm that feeds on the data. As one of the best early Kaggle contestants, Achin realized he was already automating a lot of the steps involved in each competition. “I thought that if we collected enough data sets, enough problems, and ran enough experiments, we could do machine learning on machine learning. That was the original idea,” he says.

The idea clearly resonated with investors. DataRobot, started in 2012, has raised more than $100 million, including $54 million this March, around the same time that Kaggle was acquired. The company says it has more than 100 customers already. Achin says the concept is a lot less popular with many data scientists, who either feel that their skills cannot be automated or worry that they will be. But he believes that most businesses will have no other option if they want to make use of AI. “I don’t care how many people change their title to ‘data scientist’ on LinkedIn,” he says. “You’re not going to move the needle.”

Self-Learning Systems

The shortage of data scientists is inspiring many others to work on automating machine learning. A growing number of research papers are popping up on using its techniques to automate more and more aspects of AI.

One of the world’s biggest players in AI, Google, is also turning its attention to the idea. Google has invested enormous sums in developing powerful AI algorithms and deploying them across its services. But the company is also keen to add more AI to its cloud services. And going beyond simple tools for image or text classification will mean automating more of the work involved in training machine-learning models.

 
“The goal is to make this technology more accessible,” says John Giannandrea, a Scottish computer engineer who leads Google’s AI efforts. “So anybody could say ‘Build me a predictive model’ and it goes off and does it.”

Earlier this year, the company announced some significant progress toward this goal, demonstrating an experimental way to automate the process of tuning deep-learning neural networks (see “AI Software Learns to Make AI Software”). These are perhaps the most powerful machine-learning algorithms around, and they have significantly improved the state of the art in image and voice recognition. But they are also notoriously difficult to engineer. Giannandrea says this work is now producing some very promising results, in some cases matching the performance of systems developed by hand. And he expects Google to release more results in coming months.
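
The article does not describe how Google's system works, so the sketch below uses random search, arguably the simplest form of automated tuning, purely as a stand-in. Everything in it is an assumption for illustration: the model is a one-variable linear fit trained by gradient descent rather than a deep network, and the search ranges, data, and function names are hypothetical.

```python
import random

def make_split(n=120, seed=0):
    """Synthetic data y = 2x + 1 + noise, split into train and validation sets."""
    rng = random.Random(seed)
    points = [(x, 2.0 * x + 1.0 + rng.gauss(0, 0.1))
              for x in (rng.random() for _ in range(n))]
    return points[: n * 2 // 3], points[n * 2 // 3:]

def train_and_validate(learning_rate, epochs, train, valid):
    """Fit y = w*x + b by batch gradient descent; return validation MSE."""
    w, b = 0.0, 0.0
    n = len(train)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in train) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in train) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return sum((w * x + b - y) ** 2 for x, y in valid) / len(valid)

def random_search(train, valid, trials=30, seed=0):
    """Sample hyperparameter settings at random and keep the best one."""
    rng = random.Random(seed)
    best = None
    for _ in range(trials):
        config = {
            # Log-uniform sample, since useful learning rates span orders of magnitude.
            "learning_rate": 10 ** rng.uniform(-4, -0.3),
            "epochs": rng.randint(10, 200),
        }
        loss = train_and_validate(config["learning_rate"], config["epochs"],
                                  train, valid)
        if best is None or loss < best[0]:
            best = (loss, config)
    return best

if __name__ == "__main__":
    train, valid = make_split()
    loss, config = random_search(train, valid)
    print(f"best validation MSE = {loss:.4f} with {config}")
```

The key design point is that the search only ever sees the validation score, so the tuning loop itself needs no knowledge of the model; that separation is what lets more sophisticated search strategies be swapped in.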

Others have even grander designs. Eric Xing, a professor at Carnegie Mellon University, for instance, is developing what amounts to an operating system built from different machine-learning components. This OS uses virtualization and machine learning to abstract away much of the complexity in designing and training AI. It even features a graphical user interface that can be used to train a machine-learning model on a particular data set.

Xing was educated in China and studied at UC Berkeley alongside Andrew Ng, now a well-known figure in the world of AI. He is very polite, and surprisingly casual about wanting to reinvent the way people use computers. Xing envisions his AI OS becoming as easy to use as something like Microsoft’s spreadsheet package, Excel. “This is a core issue across the whole of AI,” he says. “The barrier to entry is just too high.”

Xing has created a company, Petuum, to develop the OS, and it has already created a series of tools aimed at bringing machine learning to medicine. “Doctors want an interface and medical records, images—each requires a different machine-learning approach,” he says. Petuum is also gearing up to release its platform.

Petuum’s OS, and other tools for automating AI, will face some unique challenges. There are already concerns about machine-learning algorithms inadvertently absorbing biases from training data, and some models are simply too opaque to examine carefully (see “The Dark Secret at the Heart of AI”). If AI becomes much easier to use, it’s possible these issues could become more widespread and more entrenched.

“To do machine learning really well, you need a PhD and about five years of experience,” says Rich Caruana, a senior researcher at Microsoft who has been doing data science for about 20 years. “There are many pitfalls. Does your algorithm expire after six months, and is it interpretable?”

Caruana believes it should be possible to automate some of the steps a data scientist needs to take in order to guard against such problems—something similar to a pilot’s pre-flight checklist. But he cautions against trusting too much in systems that promise to automate everything. “I know,” he says, “because I’ve stubbed my toe along the way.”
