Reposted from: https://www.technologyreview.com/s/608921/ai-algorithms-are-starting-to-teach-ai-algorithms/#

You Could Become an AI Master Before You Know It. Here’s How.

Automating machine learning will make the technology more accessible to non–AI experts.

by Will Knight
October 17, 2017

At first blush, Scot Barton might not seem like an AI pioneer. He isn’t building self-driving cars or teaching computers to thrash humans at computer games. But within his role at Farmers Insurance, he is blazing a trail for the technology.

Barton leads a team that analyzes data to answer questions about customer behavior and the design of different policies. His group is now using all sorts of cutting-edge machine-learning techniques, from deep neural networks to decision trees. But Barton did not hire an army of AI wizards to make this possible. His team uses a platform called DataRobot, which automates a lot of difficult work involved in applying such techniques.

The insurance company’s work with DataRobot hints at how artificial intelligence might have to evolve in the next few years if it is to realize its enormous potential. Beyond spectacular demonstrations like DeepMind’s game-playing software AlphaGo, AI does have the power to revolutionize entire industries and make all sorts of businesses more efficient and productive. This, in turn, could help rejuvenate the economy by increasing overall productivity. But in order for this to happen, the technology will need to become a whole lot easier to use.

The problem is that many of the steps involved in using existing AI techniques currently require significant expertise. And it isn’t as simple as building a more user-friendly interface on top of things, because engineers often have to apply judgment and know-how when crafting and tweaking their code.

But AI researchers and companies are now trying to address this by essentially turning the technology on itself, using machine learning to automate the trickier aspects of developing AI algorithms. Some experts are even building the equivalent of AI-powered operating systems designed to make applications of the technology as accessible as Microsoft Excel is today.

DataRobot is a step in that direction. You feed in raw data, and the platform automatically cleans and reformats it. Then it runs dozens of different algorithms at once against it, ranking their performance. Barton first tried using the platform by inputting a bunch of insurance data to see if it could predict a specific dollar value. Compared with a standard, hand-built statistical approach, the model selected had a 20 percent lower error rate. “Out of the box, with the push of one button; that’s pretty impressive,” he says.
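
The "run many algorithms and rank them" step can be illustrated with a minimal sketch. This is not DataRobot's implementation, which the article does not describe; the synthetic data, the three candidate models, and every function name below are assumptions chosen purely for illustration.

```python
import random
import statistics

def make_data(n=200, seed=0):
    """Synthetic data (an assumption for illustration): y = 3x + 5 plus noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [3 * x + 5 + rng.gauss(0, 1) for x in xs]
    return xs, ys

def fit_mean(xs, ys):
    """Baseline: always predict the mean of the training targets."""
    mean_y = statistics.fmean(ys)
    return lambda x: mean_y

def fit_linear(xs, ys):
    """Ordinary least squares with a single feature."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return lambda x: slope * x + intercept

def fit_knn(xs, ys, k=5):
    """k-nearest neighbours: average the targets of the k closest points."""
    pairs = list(zip(xs, ys))
    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return statistics.fmean(y for _, y in nearest)
    return predict

def mean_abs_error(model, xs, ys):
    """Average absolute prediction error on the given points."""
    return statistics.fmean(abs(model(x) - y) for x, y in zip(xs, ys))

def rank_models(candidates, train, test):
    """Fit each candidate on train, score it on test, sort best first."""
    scores = [(name, mean_abs_error(fit(*train), *test))
              for name, fit in candidates.items()]
    return sorted(scores, key=lambda s: s[1])

xs, ys = make_data()
train = (xs[:150], ys[:150])   # data used for fitting
test = (xs[150:], ys[150:])    # held-out data used only for scoring
candidates = {
    "mean baseline": fit_mean,
    "linear regression": fit_linear,
    "5-nearest neighbours": fit_knn,
}
leaderboard = rank_models(candidates, train, test)
for name, err in leaderboard:
    print(f"{name}: {err:.3f}")
```

Scoring on held-out data rather than on the training set is what makes the ranking meaningful: a model that merely memorizes its training examples would otherwise look better than it is.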

AI Skills Gap

The reality of applying AI was laid bare in a report published by the consulting company McKinsey in June of this year. This report concludes that artificial intelligence, especially machine learning, may overhaul big industries, including manufacturing, finance, and health care, potentially adding up to $126 billion to the U.S. economy by 2025. But the report has one big caveat: a critical talent shortage.

There is certainly a big push to train as many people as possible to use AI (see “Andrew Ng’s Next Trick: Training a Million AI Experts”). But that will take time, and not everyone can become an AI master. The best way to maximize the impact of any technology is to make it as accessible as possible. Only then will AI begin to creep into ordinary offices and workplaces. DataRobot is already being used in some of those settings.

JAY DANIEL WRIGHT

Late one afternoon, DataRobot’s office in Boston’s financial district is deserted apart from a handful of engineers milling around a large display. The company’s solution certainly seems impressive when Jonathan Dahlberg, one of the consultants, gives me a demo. He loads up a public data set of loan applications and payments, and then he has the system develop a bunch of models to see if there are any patterns in why people default.

In a few seconds, dozens of competing algorithms appear on the screen; at the top is a relatively unsexy but widely used gradient-boosting technique called XGBoost. This quickly shows that applicants’ income is especially important, but so is the reason they give for wanting a loan. It turns out that people who mention “starting a business” in their application are an especially bad bet.

DataRobot might not match the expertise or skill of a really good data scientist, Dahlberg says, but it can offer a broader perspective. A person might rely too heavily on a certain technique, and DataRobot could automatically reveal a fundamentally better approach. It is also still possible for a user to manually modify the underlying algorithm using the programming languages Python or R. Without a close examination, it’s hard to know how well the system automates some of the trickier aspects of data science, like data cleaning and feature engineering, but it seems to take care of a surprising amount.

The company’s CEO, Jeremy Achin, was inspired to start a company after watching The Social Network, as he admits a little sheepishly when we meet for coffee near MIT. But he got the idea for DataRobot while taking part in data-science competitions on the crowdsourcing platform Kaggle, which was acquired by Google earlier this year. Kaggle offers prizes for the algorithm that performs best at making a specific prediction from a large data set. This task typically involves developing a machine-learning algorithm that feeds on the data. As one of the best early Kaggle contestants, Achin realized he was already automating a lot of the steps involved in each competition. “I thought that if we collected enough data sets, enough problems, and ran enough experiments, we could do machine learning on machine learning. That was the original idea,” he says.

The idea clearly resonated with investors. DataRobot, started in 2012, has raised more than $100 million, including $54 million this March, around the same time that Kaggle was acquired. The company says it has more than 100 customers already. Achin says the concept is a lot less popular with many data scientists, who either feel that their skills cannot be automated or worry that they will be. But he believes that most businesses will have no other option if they want to make use of AI. “I don’t care how many people change their title to ‘data scientist’ on LinkedIn,” he says. “You’re not going to move the needle.”

Self-Learning Systems

The shortage of data scientists is inspiring many others to work on automating machine learning. A growing number of research papers are popping up on using its techniques to automate more and more aspects of AI.

One of the world’s biggest players in AI, Google, is also turning its attention to the idea. Google has invested enormous sums in developing powerful AI algorithms and deploying them across its services. But the company is also keen to add more AI to its cloud services. And going beyond simple tools for image or text classification will mean automating more of the work involved in training machine-learning models.

“The goal is to make this technology more accessible,” says John Giannandrea, a Scottish computer engineer who leads Google’s AI efforts. “So anybody could say ‘Build me a predictive model’ and it goes off and does it.”

Earlier this year, the company announced some significant progress toward this goal, demonstrating an experimental way to automate the process of tuning deep-learning neural networks (see “AI Software Learns to Make AI Software”). These are perhaps the most powerful machine-learning algorithms around, and they have significantly improved the state of the art in image and voice recognition. But they are also notoriously difficult to engineer. Giannandrea says this work is now producing some very promising results, in some cases matching the performance of systems developed by hand. And he expects Google to release more results in coming months.
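
To give a feel for what automated tuning means in practice, here is a minimal sketch of random search, one of the simplest ways to automate hyperparameter selection. It is not Google's method, which the article does not detail; the toy one-weight model, the synthetic data, the search ranges, and all names below are assumptions for illustration only.

```python
import random

def train_and_score(lr, epochs, train, valid):
    """Fit y ~ w * x by gradient descent; return validation mean squared error."""
    w = 0.0
    for _ in range(epochs):
        grad = sum(2 * x * (w * x - y) for x, y in train) / len(train)
        w -= lr * grad
    return sum((w * x - y) ** 2 for x, y in valid) / len(valid)

def random_search(train, valid, trials=30, seed=0):
    """Sample hyperparameters at random and keep the best-scoring configuration."""
    rng = random.Random(seed)
    best = None
    history = []
    for _ in range(trials):
        config = {
            "lr": 10 ** rng.uniform(-3, 0),   # log-uniform in [0.001, 1]
            "epochs": rng.randint(1, 50),     # integer in [1, 50]
        }
        score = train_and_score(config["lr"], config["epochs"], train, valid)
        history.append((score, config))
        if best is None or score < best[0]:
            best = (score, config)
    return best, history

# Synthetic data (an assumption for illustration): y = 2x plus small noise.
data_rng = random.Random(42)
xs = [data_rng.random() for _ in range(120)]
points = [(x, 2.0 * x + data_rng.gauss(0, 0.05)) for x in xs]
train, valid = points[:80], points[80:]

(best_score, best_config), history = random_search(train, valid)
print(best_config, round(best_score, 4))
```

The loop stands in for the trial and error an engineer would otherwise do by hand: propose settings, train, measure on validation data, and keep the best. More sophisticated systems replace the random proposals with a learned strategy.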

Others have even grander designs. Eric Xing, a professor at Carnegie Mellon University, for instance, is developing what amounts to an operating system built from different machine-learning components. This OS uses virtualization and machine learning to abstract away much of the complexity in designing and training AI. It even features a graphical user interface that can be used to train a machine-learning model on a particular data set.

Xing was educated in China and studied at UC Berkeley alongside Andrew Ng, now a well-known figure in the world of AI. He is very polite, and surprisingly casual about wanting to reinvent the way people use computers. Xing envisions his AI OS becoming as easy to use as something like Microsoft’s spreadsheet package, Excel. “This is a core issue across the whole of AI,” he says. “The barrier to entry is just too high.”

Xing has created a company, Petuum, to develop the OS, and it has already created a series of tools aimed at bringing machine learning to medicine. “Doctors want an interface and medical records, images—each requires a different machine-learning approach,” he says. Petuum is also gearing up to release its platform.

Petuum’s OS, and other tools for automating AI, will face some unique challenges. There are already concerns about machine-learning algorithms inadvertently absorbing biases from training data, and some models are simply too opaque to examine carefully (see “The Dark Secret at the Heart of AI”). If AI becomes much easier to use, it’s possible these issues could become more widespread and more entrenched.

“To do machine learning really well, you need a PhD and about five years of experience,” says Rich Caruana, a senior researcher at Microsoft who has been doing data science for about 20 years. “There are many pitfalls. Does your algorithm expire after six months, and is it interpretable?”

Caruana believes it should be possible to automate some of the steps a data scientist needs to take in order to guard against such problems—something similar to a pilot’s pre-flight checklist. But he cautions against trusting too much in systems that promise to automate everything. “I know,” he says, “because I’ve stubbed my toe along the way.”
