Reject Inference: Your Data is Deceiving You
Keyword: Reject Inference
Suppose a bank provides a dataset with several attributes, including employment, credit history, and property. Each customer in the sample is classified by whether they paid off their loan on time: those who did are labeled “good customers”, and those who did not are labeled “bad customers”.
If Rick, an employee of the bank, analyzes this dataset directly, what will happen?
Take one of these attributes as an example. It has three values:
1 : unemployed
2 : skilled employee
3 : management / highly qualified employee / officer
Which of these three groups should, intuitively, have the best credit? Most people would pick the second or the third category. However, the data gives a different answer.
As the data shows, the first group of customers is “better than” the third group. After looking at the data, Rick might conclude that lending to unemployed people is safer than lending to highly qualified employees, officers, or managers. Is that correct? Let’s think about it for a moment.
Let’s review the process of collecting data:
1. A customer applies to Rick’s bank for a personal loan.
2. If the application is approved, go to step 3. Otherwise, the customer is not counted as a data point in Rick’s dataset.
3. If the customer pays off the loan on time, they are labeled a “Good Customer”. Otherwise, they are labeled a “Bad Customer”.
The crucial step comes before any label is recorded: step 2. In other words, the customers in Rick’s dataset have already been selected by the bank. Those who applied for a personal loan but were not approved do not appear in the dataset at all.
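The collection process can be sketched in a few lines of Python. This is only an illustration: the `Application` class, the `build_dataset` function, and the four sample applicants are hypothetical and do not come from the bank's data.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Application:
    customer: str
    approved: bool
    # None when no loan was granted, so there is no repayment to observe
    repaid_on_time: Optional[bool]


def build_dataset(applications: List[Application]) -> List[Tuple[str, str]]:
    """Mimic the three steps: apply, get approved, get labeled."""
    dataset = []
    for app in applications:
        if not app.approved:
            # Step 2: a rejected applicant never becomes a data point
            continue
        # Step 3: the label is based on repayment behavior
        label = "Good Customer" if app.repaid_on_time else "Bad Customer"
        dataset.append((app.customer, label))
    return dataset


# Hypothetical applicants, made up for illustration
applications = [
    Application("A", True, True),
    Application("B", False, None),
    Application("C", True, False),
    Application("D", False, None),
]

dataset = build_dataset(applications)
print(dataset)
```

Four customers applied, but only the two approved ones end up in the dataset. Whatever B and D would have done with a loan is simply unknown.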
Here I would like to ask you a question: which carries the greater risk, jumping from the 4th floor or from the 70th floor? (Please do not try it; it is just an example.) You may reply immediately: “The 70th floor, of course!”
You are wrong. I am not asking about the probability of death. I am asking about risk. Suppose someone offers you 10 billion if you can jump from the 70th floor without dying. You probably won’t take the bet. However, suppose someone offers you 10 billion if you can jump from the 4th floor without dying. Then you might want to give it a shot, because you know you might survive.
Customers who make the bank feel like it is jumping from the 70th floor are most likely rejected from the very beginning. The hard decisions for the bank are the applications from customers who make it feel like it is jumping from the 4th floor.
“70th floor” customers are likely to be found in the first group. So if the bank approved some of their applications, it must have had reasons to believe those particular customers would pay off their loans. If the bank approved every first-group application, the data might look quite different.
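A small simulation can make this concrete. Everything in it is an assumption made for illustration: the hidden `strength` score (think of collateral or a guarantor), the default probabilities, and the approval thresholds are invented numbers, not the bank's real policy or data.

```python
import random

# Hypothetical numbers: base default tendency and approval threshold per group
# 1 = unemployed, 2 = skilled employee, 3 = management / highly qualified
BASE_BAD = {1: 0.60, 2: 0.30, 3: 0.25}
THRESHOLD = {1: 0.90, 2: 0.40, 3: 0.30}


def simulate(n=20000, seed=0):
    """Return a list of (job, approved, bad) tuples for n applicants."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        job = rng.choice([1, 2, 3])
        # Hidden strength of the application, seen by the bank but not in the dataset
        strength = rng.random()
        # Assumed: stronger applications are less likely to go bad
        p_bad = BASE_BAD[job] * (1 - strength)
        # Assumed: the bank only approves unemployed applicants with very strong files
        approved = strength >= THRESHOLD[job]
        bad = rng.random() < p_bad
        rows.append((job, approved, bad))
    return rows


def bad_rate(rows, job, approved_only):
    """Share of bad customers in a job group, optionally among approved only."""
    outcomes = [bad for j, a, bad in rows if j == job and (a or not approved_only)]
    return sum(outcomes) / len(outcomes)


rows = simulate()
for job in (1, 2, 3):
    print(job,
          "approved only:", round(bad_rate(rows, job, True), 3),
          "all applicants:", round(bad_rate(rows, job, False), 3))
```

Under these assumptions, the approved unemployed customers look better than the approved managers, even though the unemployed group as a whole is riskier. Rick only ever sees the "approved only" column.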
Analyzing data before you really understand what it means may leave you deceived by your data.
Many factors should be taken into consideration in a real evaluation, but I have had to simplify the explanation here. If there are any mistakes, or anything that makes you uncomfortable, please let me know so that I can fix it.