Credit Scorecards (Part 2 of 7)

Classification Problem in Statistics & Data Mining
I must say I was shocked when Amishi, a girl a little over three years old, announced that from now on she is friends only with my wife and not with me. Her reason for the breakup was that I am a boy, and girls can only be friends with girls. She learned this social norm from her friends at preschool. I still remember the way she modeled for me in her swimsuit and umbrella just a few months ago. She was aware of the boy-girl difference even then; it is just that she has learned this weird social norm now. The point here is that toddlers can distinguish genders without much effort. Nature has given us a built-in equation to classify gender at a mere glance with a high degree of precision. Imagine a similar mechanism to distinguish between good and bad borrowers. You are talking about every banker’s dream. However, evolution has trained us to mate, not to lend.

Predictive Analytics: Classification Problem – by Roopam
As I mentioned in the previous article, scorecards have their roots in the classification problem in statistics and data mining. The idea behind most classification problems is to create a mathematical equation that distinguishes between the two values of a dichotomous variable. Such variables can take only two values, such as
• Male / Female
• Good / Bad
• Yes / No
• God / Devil
• Happy / Sad
• Sales / No Sales
The list can go on until eternity. Most business problems model dichotomies because they are easy for us humans to comprehend. We must appreciate, though, that dichotomies are never absolute and have degrees attached to them. For example, I am 80% good and 20% bad, or at least I would like to believe so. I shall keep Pareto’s 80-20 principle out of this, i.e. the possibility that my 20% bad is responsible for 80% of my behavior.
Credit Scorecards Development – Problem Statement & Sampling (the definition of a bad customer is flexible)
In the case of credit scorecards, the problem statement is to distinguish analytically between good and bad borrowers. Hence, the first task is to define a good and a bad borrower. For most loan products, good and bad credit is defined in the following way:
1. Good loan: never missed an EMI (equated monthly installment) payment, or missed only once
2. Bad loan: ever missed 3 consecutive EMIs (i.e. 90 days past due)
Additionally, to tag someone as good or bad, you need to observe his or her behavior for a significant length of time. This length of time varies from product to product based on the tenor of the loan. For home loans, with a tenor of 20 years, 2-3 years is a reasonable observation period.
However, there is nothing sacrosanct about the above definition, and it can be modified at the discretion of the analyst. Roll-rate analysis and vintage analysis are the two analytical tools you may want to consider while constructing the definition.
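The good/bad definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the author's implementation: the function names, the list-of-booleans payment history, and the "indeterminate" label for loans that fit neither rule are all hypothetical choices made for this example.

```python
def max_consecutive_misses(missed):
    """Return the length of the longest run of consecutive missed EMIs."""
    longest = current = 0
    for m in missed:
        current = current + 1 if m else 0
        longest = max(longest, current)
    return longest


def label_loan(missed, bad_run=3, good_max_misses=1):
    """Tag a loan from its EMI history over the observation period.

    missed: one boolean per EMI due, True if that payment was missed.
    bad_run and good_max_misses are adjustable, since the definition
    is left to the analyst's discretion.
    """
    if max_consecutive_misses(missed) >= bad_run:
        return "bad"            # e.g. 3 consecutive misses = 90 days past due
    if sum(missed) <= good_max_misses:
        return "good"           # never missed, or missed only once
    return "indeterminate"      # assumption: neither rule applies


# Hypothetical payment histories (one entry per monthly EMI)
history_a = [False] * 24
history_b = [False, True, True, True, False, False]
history_c = [True, False, True, False, False, False]

print(label_loan(history_a))
print(label_loan(history_b))
print(label_loan(history_c))
```

Keeping the thresholds as parameters reflects the point that the definition is flexible; loans in the middle category are often set aside rather than forced into either class, though that handling is an assumption here.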
Sampling Strategy for Credit Scorecards
A few years ago, I ran a daylong workshop on statistical inference for a large German shipping and cargo company in Mumbai. During the Q&A session, the Vice President of Operations asked a tricky question: what is a good sample size to achieve good precision? He was looking for a one-size-fits-all answer, and I wish it were that simple. The sample size depends on the degree of similarity, or homogeneity, of the population in question. For example, what do you think is a good sample size to answer the following two questions?
1. What is the salinity of the Pacific Ocean?
2. Is there another planet with intelligent life in the Universe?
In terms of population size, the number of drops in the ocean and the number of planets in the Universe are similar. A couple of drops of water are enough to answer the first question, since the salinity of the oceans is fairly constant. On the other hand, the second question is a black swan problem. You may need to visit every single planet to rule out the possibility of an intelligent form of life.
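The link between homogeneity and sample size can be made concrete with the textbook formula for estimating a mean, n = (z * sigma / E)^2, where sigma is the population standard deviation and E is the acceptable margin of error. The formula is not from the article, and the standard deviations below are made-up numbers chosen only to contrast a homogeneous population with a heterogeneous one.

```python
import math


def required_sample_size(std_dev, margin, z=1.96):
    """Sample size needed to estimate a mean within +/- margin.

    z=1.96 corresponds to roughly 95% confidence under a normal
    approximation; std_dev measures how heterogeneous the population is.
    """
    return math.ceil((z * std_dev / margin) ** 2)


# Same precision target, very different populations (illustrative values)
homogeneous = required_sample_size(std_dev=0.5, margin=1.0)
heterogeneous = required_sample_size(std_dev=20.0, margin=1.0)

print(homogeneous)    # 1
print(heterogeneous)  # 1537
```

Population size does not appear in the formula at all; only the spread of the population and the desired precision do, which is why no one-size-fits-all answer exists.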
For credit scorecard development, the accepted rule of thumb for sample size is at least 1000 records of both good and bad loans. There is no reason why you cannot build a scorecard with a smaller sample (say 500 records). However, the analyst needs to be cautious in doing so, because a higher degree of randomness creeps into a small data sample. It is also advisable to keep the sample window as short as possible, i.e. a financial quarter or two, during scorecard development. Further, the sample is divided into two pieces: usually 70% for development and the remaining 30% for validation. We discuss the development and validation samples in detail in subsequent parts of this series.
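A 70/30 development/validation split might look like the following sketch. It uses only the standard library; the record layout (dicts with a `label` key), the function name, and the choice to stratify by good/bad so that both samples keep the same mix are assumptions for illustration, not something the article prescribes.

```python
import random


def split_development_validation(records, dev_fraction=0.7, seed=42):
    """Split records into development and validation samples.

    The split is done separately within each label so that the
    good/bad proportion is preserved in both samples.
    """
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    by_label = {}
    for rec in records:
        by_label.setdefault(rec["label"], []).append(rec)

    development, validation = [], []
    for group in by_label.values():
        shuffled = group[:]
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * dev_fraction)
        development.extend(shuffled[:cut])
        validation.extend(shuffled[cut:])
    return development, validation


# Hypothetical sample: 1000 good and 1000 bad loans
records = ([{"loan_id": i, "label": "good"} for i in range(1000)]
           + [{"loan_id": 1000 + i, "label": "bad"} for i in range(1000)])

development, validation = split_development_validation(records)
print(len(development), len(validation))
```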

Credit Scorecard Development: Sampling Strategy – by Roopam
Sign-off Note
In the next article, we will discuss the important topic of variable classing and coarse classing for credit scorecards. See you soon.
