信用评分卡 (part 2of 7)
python信用评分卡(附代码,博主录制)

统计和数据挖掘中分类问题
Classification Problem in Statistics & Data Mining
I must say I was shocked when Amishi, a girl little over three years old, announced that going forward she is only friends with my wife and not me. Her reason for the breakup was that I am a boy and girls can only be friends with girls. She has learned this social norm from her friends at the preschool. I still remember the way she modeled for me in her swimsuit and umbrella just a few months ago. She was aware of the boy-girl difference even then, it is just she has learned this weird social norm now. The point over here is that toddlers can distinguish genders without much effort. Nature has given us a built-in equation to classify gender through a mere glance with a high degree of precision. Imagine a similar mechanism to distinguish between good and bad borrowers. You are talking about every banker’s dream. However, evolution has trained us to mate not to lend.
我必须说,当三十岁的女孩Amishi宣布前进时,她只是与我的妻子而不是我的朋友,我感到震惊。 分手的原因是我是男孩,女孩只能是女孩的朋友。 她从幼儿园的朋友那里学到了这种社会规范。 几个月前,我还记得她在泳衣和雨伞中为我塑造的方式。 即便如此,她也意识到了男女之间的差异,现在只是她已经学会了这种奇怪的社会规范。 这里的重点是,幼儿可以毫不费力地区分性别。 大自然给了我们一个内置的方程式,通过高度精确的一瞥来对性别进行分类。 想象一下类似的机制来区分好的和坏的借款人。 你在谈论每个银行家的梦想。 然而,进化训练我们交配不放贷。

Predictive Analytics: Classification Problem – by Roopam
As I have mentioned in the previous article, scorecards have their roots in the classification problem in statistics and data mining. The idea with most classification problems is to create a mathematical equation to distinguish dichotomous variables. These variables can only take two values such as
• Male/ Female
• Good / Bad
• Yes / No
• God / Devil
• Happy / Sad
• Sales / No Sales
The list can go on until eternity. The reason why most business problems try to model dichotomies is that it is easy to comprehend for us humans. We must appreciate that dichotomies are never absolute and have degrees attached to them. For example, I am 80% good and 20% bad – at least I would like to believe this. I shall keep Pareto’s 80-20 principle away from this i.e. my 20% bad is responsible for my 80% of behavior.
正如我在上一篇文章中提到的,记分卡的根源在于统计和数据挖掘中的分类问题。 大多数分类问题的想法是创建一个数学方程来区分二分变量。 这些变量只能采用两个值,例如
•男/女
• 好坏
•是/否
•上帝/魔鬼
•快乐/悲伤
•销售/无销售
这份清单可以持续到永恒。 大多数商业问题试图模拟二分法的原因是它很容易理解我们人类。 我们必须明白,二分法从来都不是绝对的,是有度的。 例如,我80%好,20%坏 - 至少我想相信这一点。 我将保持帕累托的80-20原则远离这一点,即我的20%不好对我80%的行为负责。
Credit Scorecards Development – Problem Statement & Sampling(坏客户定义是灵活的)
In the case of credit scorecards, the problem statement is to distinguish analytically between the good and bad borrowers. Hence, the first task is to define a good and a bad borrower. For most loan products, good and bad credit is defined in the following way
1. Good loan: never or once missed on the EMI payment
2. Bad loan: ever missed 3 consecutive EMIs in a row (i.e. 90 days-past-due)
Additionally, for tagging someone good or bad, you need to observe his or her behavior for a significant length of time. This length of time varies from product to product based on the tenor of the loan. For home loans, with a tenor of 20 years, 2-3 years is a reasonable observation period.
However, there is nothing sacrosanct about the above definition and can be modified at the discretion of the analyst. Roll-rate analysis and vintage analysis are the two analytical tools you may want to consider while constructing the above definition.
信用记分卡开发 - 问题陈述和抽样
在信用记分卡的情况下,问题陈述是在好的和坏的借款人之间进行分析。因此,第一个任务是定义一个好的和坏的借款人。对于大多数贷款产品,信用良好和不良以下列方式定义
1.良好的贷款:永远或曾一次逾期
2.不良贷款:连续3次错过EMI(即90天过期)
此外,为了标记好人或坏人,你需要在很长一段时间内观察他或她的行为。根据贷款期限,这段时间因产品而异。对于房屋贷款,期限为20年,2 - 3年是合理的观察期。
但是,对于上述定义没有什么神圣不可侵犯的,可以由分析师自行决定修改。滚动率分析和复古分析是您在构建上述定义时可能需要考虑的两种分析工具。
Sampling Strategy for Credit Scorecards
A few years ago, I did a daylong workshop on Statistical Inference for a large German shipping & cargo company in Mumbai. At the time of Q&A session the Vice President of operations asked a tricky question, what is a good sample size to achieve good precision? He was looking for a one-size-fits-all answer and I wish it were that simple. The sample size depends on the degree of similarity or homogeneity of the population in question. For example, what do you think is a good sample size to answer the following two questions?
1. What is the salinity of the Pacific Ocean?
2. Is there another planet with intelligent life in the Universe?
In terms of population size, a number of drops in the ocean and planets in the Universe is similar. A couple of drops of water are enough to answer the first question since the salinity of oceans is fairly constant. On the other hand, the second question is a black swan problem. You may need to visit every single planet to rule our possibility of an intelligent form of life.
For credit scorecard development, the accepted rule of thumb for sample size is at least 1000 records of both good and bad loans. There is no reason why you cannot build a scorecard with a smaller sample size (say 500 records). However, the analyst needs to be cautious in doing so because a higher degree of randomness creeps in a small data sample. Additionally, it is also advisable to keep the sample window as short as possible i.e. a financial quarter or two while scorecard development. Further, the sample is divided into two pieces – usually, 70 % for development and remaining for validation sample. We discuss the development and validation sample in detail in the subsequent sections of this series.
信用记分卡的抽样策略
几年前,我为孟买的一家大型德国航运和货运公司举办了为期一天的统计推断研讨会。在问答环节时,运营副总裁提出了一个棘手的问题,即获得良好精度的样本量是多少?他正在寻找一个通用的答案,我希望它很简单。样本量取决于所讨论的群体的相似程度或同质性。例如,您认为回答以下两个问题的样本量是多少?
1.太平洋的盐度是多少?
2.宇宙中还有另一个拥有智慧生命的星球吗?
就人口规模而言,宇宙中海洋和行星的数量下降是相似的。由于海洋的盐度相当稳定,几滴水足以回答第一个问题。另一方面,第二个问题是黑天鹅问题。您可能需要访问每个星球来统治我们生活的智能生活的可能性。
对于信用记分卡开发,样本大小的公认经验法则是至少1000个好的和坏的贷款记录。没有理由不能建立样本量较小的记分卡(比如500条记录)。但是,分析师需要谨慎行事,因为较小程度的随机性会在小数据样本中蔓延。此外,还建议尽可能缩短样本窗口,即在记分卡开发时用一个或两个季度数据。此外,样品分为两部分 - 通常70%用于显影,剩余用于验证样品。我们将在本系列的后续章节中详细讨论开发和验证示例。

Credit Scorecard Development: Sampling Strategy – by Roopam
Sign-off Note
In the next article, we will discuss an important topic of variables classing and coarse classing for credit scorecards. See you soon

信用评分卡 (part 2of 7)的更多相关文章
- 信用评分卡(A卡/B卡/C卡)的模型简介及开发流程|干货
https://blog.csdn.net/varyall/article/details/81173326 如今在银行.消费金融公司等各种贷款业务机构,普遍使用信用评分,对客户实行打分制,以期对客户 ...
- 信用评分卡 (part 7 of 7)
python信用评分卡(附代码,博主录制) https://study.163.com/course/introduction.htm?courseId=1005214003&utm_camp ...
- 信用评分卡 (part 6 of 7)
python信用评分卡(附代码,博主录制) https://study.163.com/course/introduction.htm?courseId=1005214003&utm_camp ...
- 信用评分卡 (part 5 of 7)
python信用评分卡(附代码,博主录制) https://study.163.com/course/introduction.htm?courseId=1005214003&utm_camp ...
- 信用评分卡 (part 4 of 7)
python信用评分卡(附代码,博主录制) https://study.163.com/course/introduction.htm?courseId=1005214003&utm_camp ...
- 信用评分卡 (part 3of 7)
python信用评分卡(附代码,博主录制) https://study.163.com/course/introduction.htm?courseId=1005214003&utm_camp ...
- 信用评分卡 (part 1 of 7)
python信用评分卡(附代码,博主录制) https://study.163.com/course/introduction.htm?courseId=1005214003&utm_camp ...
- 信用评分卡Credit Scorecards (1-7)
欢迎关注博主主页,学习python视频资源,还有大量免费python经典文章 python风控评分卡建模和风控常识 https://study.163.com/course/introductio ...
- python德国信用评分卡建模(附代码AAA推荐)
欢迎关注博主主页,学习python视频资源,还有大量免费python经典文章 python信用评分卡建模视频系列教程(附代码) 博主录制 https://study.163.com/course/i ...
随机推荐
- Luogu1137 旅行计划(拓扑排序)
题目传送门 拓扑排序板子题,模拟即可. 代码 #include<cstdio> #include<iostream> #include<cmath> #includ ...
- pysphere VMware控制模块的一些函数的说明
对于虚拟机的操作获得虚拟机对象 当你正常连接了服务器后,你就可以使用以下两种方式来得到虚拟机对象. get_vm_by_path get_vm_by_name 虚拟机路径可以从虚拟机右键信息中的”Ed ...
- 数据库 -- mysql支持的数据类型
mysql支持的数据类型 数值类型 MySQL支持所有标准SQL数值数据类型. 这些类型包括严格数值数据类型(INTEGER.SMALLINT.DECIMAL和NUMERIC),以及近似数值数据类型( ...
- Codeforces1101F Trucks and Cities 【滑动窗口】【区间DP】
题目分析: 2500的题目为什么我想了这么久... 考虑答案是什么.对于一辆从$s$到$t$的车,它有$k$次加油的机会.可以发现实际上是将$s$到$t$的路径以城市为端点最多划分为最大长度最小的$k ...
- P2521 [HAOI2011]防线修建
题目链接:P2521 [HAOI2011]防线修建 题意:给定点集 每次有两种操作: 1. 删除一个点 (除开(0, 0), (n, 0), 与指定首都(x, y)) 2. 询问上凸包长度 至于为什么 ...
- 自学Python4.3-装饰器固定格式
自学Python之路-Python基础+模块+面向对象自学Python之路-Python网络编程自学Python之路-Python并发编程+数据库+前端自学Python之路-django 自学Pyth ...
- mac上安装ta-lib
Now I am ready to start installing TA-Lib. Generally I followed the steps listed in here. 1. Install ...
- SP8791 DYNALCA - Dynamic LCA 解题报告
SP8791 DYNALCA - Dynamic LCA 有一个森林最初由 \(n (1 \le n \le 100000)\) 个互不相连的点构成 你需要处理以下操作: link A B:添加从顶点 ...
- bandwagon host
104.20.6.63 bandwagonhost.com 104.20.6.63 bwh1.net
- Keepalived+Nginx搭建主从高可用并带nginx检测
应用环境:部分时候,WEB访问量一般,或者测试使用,利用Keepalived给Nginx做高可用即可满足要求. 测试环境: 搭建步骤: 1. 安装软件 在Nginx-A和Nginx-B上: ~]# ...