Seven Steps to Success

Machine Learning in Practice

Project failures in IT are all too common. The risks are higher if you are adopting a new technology that is unfamiliar to your organisation. Machine learning has been around for a long time in academia, but awareness and development of the technology have only recently reached a point at which its benefits are becoming attractive to business. There is huge potential to reduce costs and find new revenue by applying this technology correctly, but there are also pitfalls.

This guide will help you apply machine learning effectively to solve practical problems within your organisation. I’ll talk about issues that I’ve encountered applying machine learning in industry. My experience is in applying machine learning to the analysis of text; however, I believe the lessons I have learnt are generally applicable. I have been able to deliver significant and measurable benefits through applying machine learning, and I hope that I can enable you to do the same.

I will assume that you know the basics of machine learning, and that you have a real-world problem that you want to apply it to. This is not an introduction to machine learning (there are already plenty of those); however, I don’t assume that you’re a machine learning expert. A lot of the advice is non-technical and would be just as useful to a product manager wanting to understand the technology as to a software developer creating a solution.

Clearly understand the business need

Understanding the business need is important for any project, but it is easy to get blinded by technological possibilities. Is machine learning really going to benefit the company, or is it possible to achieve the same goals (or most of them) with some simple rules? The goal is to build a solution, not to do machine learning for the sake of it.

Try to identify all the metrics that are important to the business. The metrics we are optimising for have a profound effect on the solution we choose, so it is important to identify these early on. They also affect what alternatives to machine learning are available.

In the case of classification problems, potential metrics to consider are:

  • accuracy: the proportion of all instances classified correctly. Note that this can be very misleading if the data is biased towards one class (if 90% of the data is from class 1, we can get 90% accuracy by simply classifying everything as being from that class). Real data is normally biased in some way. For this reason, you may want to consider an average of the accuracy on each class, or some other measure.

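  • precision: the proportion of instances assigned to a class that really belong to it. High precision is needed when the results need to look good, for example if they are being presented to customers without any manual filtering after the machine learning phase.

  • recall: the proportion of instances of a class that the system finds. High recall is important when combining machine learning with manual analysis to produce a combined system with high overall accuracy.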

  • F1 score, or more generally Fβ score: a weighted harmonic mean of precision and recall. It is useful when a trade-off between precision and recall is needed; β can be adjusted to prefer one over the other (β > 1 favours recall, β < 1 favours precision).
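To make these metrics concrete, here is a minimal sketch in plain Python that computes them from a list of true labels and a list of predicted labels. The function names and the toy data are my own, chosen for illustration; in a real project you would more likely use an existing library. The first example reproduces the 90% situation described above: a classifier that always answers "class 1" looks accurate, but the per-class average exposes it.

```python
def accuracy(y_true, y_pred):
    """Proportion of all instances classified correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Average of the accuracy on each class (each class counts equally)."""
    per_class = []
    for c in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == c)
        correct = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        per_class.append(correct / total)
    return sum(per_class) / len(per_class)


def precision_recall_fbeta(y_true, y_pred, positive, beta=1.0):
    """Precision, recall and F-beta for the class named `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Conventions for empty denominators are a choice; here we return 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    b2 = beta * beta
    fbeta = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, fbeta


# Toy data (hypothetical): 90 instances of class 1, 10 of class 2,
# and a "classifier" that always predicts class 1.
y_true = [1] * 90 + [2] * 10
y_pred = [1] * 100
print(accuracy(y_true, y_pred))           # 0.9
print(balanced_accuracy(y_true, y_pred))  # 0.5

# A cautious classifier: finds 2 of 4 positives, with no false alarms.
t = [1, 1, 1, 1, 0, 0, 0, 0]
p = [1, 1, 0, 0, 0, 0, 0, 0]
for beta in (0.5, 1.0, 2.0):
    print(beta, precision_recall_fbeta(t, p, positive=1, beta=beta))
```

In the second example precision is 1.0 and recall is 0.5. The Fβ score falls as β grows (roughly 0.83, 0.67 and 0.56 for β of 0.5, 1 and 2), because a larger β puts more weight on the poor recall.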

Customer Service at Direct Electric

Direct Electric are a large electricity company based in the south of England. Dave, the head of customer service, is concerned about response times for upset customers who contact the company online. He wants to ensure that if a customer sends an angry email, a representative will get back to them quickly.

“At the moment, it takes about two days to respond, and I’d like to get that down to half a day,” he explains to Samantha, the resident machine learning expert on the software development team. Dave has heard about automated sentiment analysis, and wonders if that could be used to quickly identify the emails of interest, so that they can be prioritised by the customer service team.

“What we could do,” suggests Samantha, “is try and identify the emails that are likely to carry negative sentiment automatically, and send those to your team to look at first.”

“That sounds good!”

“The thing is,” says Samantha, “a machine-learning-based system isn’t going to get everything right. Would it matter if we missed some of the negative sentiment emails?” Samantha thinks a high-precision system may be what they are looking for. In that case, they will most likely have to sacrifice recall, and miss some of the emails of interest.

“Well, not really,” says Dave, “it’s only really useful to us if it finds them all.”

“Well, if you want to guarantee you find all of them,” says Samantha, “the only way to do that is to examine them manually.” Dave looks crestfallen. “But,” she continues, “we could probably get nearly all of them. Would it matter if we accidentally prioritised some emails that aren’t really negative?” She is thinking of trying to build a system with high recall, which will probably mean lower precision.

“That would be fine,” says Dave. “After all, at the moment, we’re reading them all.”
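The trade-off Samantha has in mind can be sketched in a few lines of Python. This is only an illustration under assumptions of my own: the classifier is imagined to output a score between 0 and 1 for "negative sentiment", and the scores, labels and thresholds below are made up. Lowering the threshold at which an email is prioritised raises recall and lowers precision.

```python
def precision_recall_at(scores, labels, threshold):
    """Precision and recall if we prioritise every email scoring >= threshold."""
    flagged = [s >= threshold for s in scores]
    tp = sum(1 for f, is_neg in zip(flagged, labels) if f and is_neg)
    fp = sum(1 for f, is_neg in zip(flagged, labels) if f and not is_neg)
    fn = sum(1 for f, is_neg in zip(flagged, labels) if not f and is_neg)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical classifier scores for ten emails, and whether each one
# really carries negative sentiment (True) or not (False).
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.35, 0.20, 0.10, 0.05]
labels = [True, True, False, True, False, True, False, False, False, False]

for threshold in (0.85, 0.30):
    precision, recall = precision_recall_at(scores, labels, threshold)
    n_flagged = sum(s >= threshold for s in scores)
    print(f"threshold={threshold}: flagged={n_flagged}, "
          f"precision={precision:.2f}, recall={recall:.2f}")
```

With the strict threshold, every prioritised email is genuinely negative but half of the angry customers are missed. With the lenient threshold, all four are found, at the cost of three false alarms. For Dave this is still a gain: his team reads seven emails first instead of working through all ten.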


The full guide has seven chapters:

  1. Clearly understand the business need
  2. Know what’s possible
  3. Know the data
  4. Plan for change
  5. Avoid premature optimisation
  6. Mitigate risks
  7. Use common sense
