Logistic Regression: A Bank Loan Approval Example

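Problem Definition

This is a loan approval problem. Suppose you are a loan officer at a bank. Customers are applying for loans of various amounts and have filled in their personal information (provided in datas.txt). Based on this information, you need to build a classification model that decides whether each applicant should be granted a loan.

Using the information provided, build a classification model, evaluate it, briefly describe how the model was built, and give a short explanation of each feature.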

Dataset

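Each record contains the following fields: user ID, age, gender, application amount, occupation type, education level, marital status, property type, household registration (hukou) type, loan purpose, company type, salary, and the loan label (0 = loan denied, 1 = loan approved).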

Data preprocessing

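The user ID carries no predictive information, so it is not used in modeling. Of the fields that describe an applicant, age, application amount, and salary are continuous; the rest are discrete (categorical).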
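The distinction matters because a linear model treats every numeric column as an ordered quantity. Below is a minimal sketch of dropping the ID and one-hot encoding the discrete columns with pandas. It assumes the discrete fields are stored as integer codes; the `UserId` column name and all toy values are hypothetical, and only two discrete columns are shown.

```python
import pandas as pd

# Toy rows; the values and the "UserId" column name are made up for illustration.
df = pd.DataFrame({
    "UserId": [1, 2, 3, 4],
    "Age": [25, 40, 33, 51],
    "AppAmount": [5000, 20000, 8000, 15000],
    "Salary": [3000, 9000, 4500, 7000],
    "Gender": [0, 1, 1, 0],
    "Education": [1, 3, 2, 3],
    "Label": [0, 1, 0, 1],
})

discrete = ["Gender", "Education"]

# The ID identifies a row but carries no information about the applicant.
X = df.drop(columns=["UserId", "Label"])

# Replace each discrete column with one 0/1 indicator column per category.
X = pd.get_dummies(X, columns=discrete, dtype=int)

print(X.columns.tolist())
```

Whether one-hot encoding helps depends on how the categories are coded in the real file, so treat this as one option rather than a required step.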

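Inspection shows that some records have missing values. These must be handled before modeling, for example by dropping the affected rows or by imputing the missing values.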
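The two options can be sketched as follows. This is an illustration on made-up values, not the original data; median imputation for continuous columns and most-frequent-value imputation for discrete columns are assumptions chosen for the example.

```python
import numpy as np
import pandas as pd

# Toy data with gaps; the values are invented for illustration.
df = pd.DataFrame({
    "Age": [25, np.nan, 33, 51],
    "Salary": [3000, 9000, np.nan, 7000],
    "Education": [1, 3, np.nan, 3],
})

# Option 1: drop every row that has at least one missing value.
dropped = df.dropna()

# Option 2: impute. Median for continuous columns,
# most frequent value for discrete columns.
filled = df.copy()
for col in ["Age", "Salary"]:
    filled[col] = filled[col].fillna(filled[col].median())
filled["Education"] = filled["Education"].fillna(filled["Education"].mode()[0])

print(len(dropped), len(filled))
```

Dropping is simpler but discards rows; imputing keeps every row at the cost of introducing estimated values.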

The Logit Function
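For reference, the standard definition is given here from the general theory rather than from the original post: the logit of a probability p is the natural logarithm of its odds, and it maps the interval (0, 1) onto the whole real line.

```latex
\operatorname{logit}(p) = \ln\frac{p}{1-p}, \qquad 0 < p < 1
```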

The Logistic Regression
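In its standard form, logistic regression assumes that the log-odds of the positive class are a linear function of the features. The symbols w (weight vector), b (intercept), and x (feature vector) are notation introduced here for illustration. Solving for the probability gives the sigmoid function:

```latex
\ln\frac{P(y=1\mid x)}{1-P(y=1\mid x)} = w^{\top}x + b
\quad\Longleftrightarrow\quad
P(y=1\mid x) = \frac{1}{1+e^{-(w^{\top}x+b)}}
```

In this example, y = 1 corresponds to approving the loan.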

Model Data

 # Logistic regression model:
 # classify bank customers by whether their loan should be approved
 import pandas
 import numpy
 import matplotlib.pyplot as plt
 from sklearn.linear_model import LogisticRegression
 from sklearn.metrics import roc_curve, roc_auc_score

 # Feature columns used by the model (the user ID is excluded)
 features = ['Age', 'Gender', 'AppAmount', 'Occupation',
             'Education', 'Marital', 'Property', 'Residence',
             'LoanUse', 'Company', 'Salary']

 data = pandas.read_csv("datas.csv")
 # Drop rows that contain missing values
 data = data.dropna()

 # Randomly shuffle the rows before splitting into training and test sets
 shuffled = data.loc[numpy.random.permutation(data.index)]

 # Train on the first 14968 shuffled rows and test on the remainder
 num_train = 14968
 data_train = shuffled[:num_train]
 data_test = shuffled[num_train:]

 # Fit logistic regression on the training set
 logistic_model = LogisticRegression()
 logistic_model.fit(data_train[features], data_train['Label'])

 # Print the model's coefficients
 print(logistic_model.coef_)

 # .predict() uses a probability threshold of 0.50 by default
 predicted = logistic_model.predict(data_train[features])
 # The mean of the boolean match array is the accuracy
 accuracy_train = (predicted == data_train['Label']).mean()
 print("Accuracy in Training Set = {s}".format(s=accuracy_train))

 # Predict labels for the test set and compute the test accuracy
 predicted = logistic_model.predict(data_test[features])
 accuracy_test = (predicted == data_test['Label']).mean()
 print("Accuracy in Test Set = {s}".format(s=accuracy_test))

 # Predicted probability of label 1 (loan approved) for each row
 train_probs = logistic_model.predict_proba(data_train[features])[:, 1]
 test_probs = logistic_model.predict_proba(data_test[features])[:, 1]

 # AUC on the training and test sets, and the gap between them
 auc_train = roc_auc_score(data_train["Label"], train_probs)
 auc_test = roc_auc_score(data_test["Label"], test_probs)
 auc_diff = auc_train - auc_test
 print("AUC train = {a}, test = {b}, difference = {d}".format(
     a=auc_train, b=auc_test, d=auc_diff))

 # roc_curve returns (false positive rates, true positive rates, thresholds)
 roc_train = roc_curve(data_train["Label"], train_probs)
 roc_test = roc_curve(data_test["Label"], test_probs)

 # Plot true positive rate against false positive rate
 plt.plot(roc_train[0], roc_train[1], label="train")
 plt.plot(roc_test[0], roc_test[1], label="test")
 plt.xlabel("False positive rate")
 plt.ylabel("True positive rate")
 plt.legend()
 plt.show()
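To see what the printed coefficients mean, the sketch below fits a model on synthetic data and checks that the fitted `coef_` and `intercept_` reproduce `predict_proba` through the sigmoid function. The data, the random seed, and the three unnamed features are assumptions; nothing here comes from the loan dataset.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic data: three numeric features and a noisy linear decision rule.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
noise = rng.normal(scale=0.5, size=200)
y = (X[:, 0] + 0.5 * X[:, 1] + noise > 0).astype(int)

model = LogisticRegression()
model.fit(X, y)

# Linear score w.x + b computed from the fitted parameters.
scores = X @ model.coef_.ravel() + model.intercept_[0]

# The sigmoid of the score is the predicted probability of class 1.
manual_probs = 1.0 / (1.0 + np.exp(-scores))
library_probs = model.predict_proba(X)[:, 1]
print(np.allclose(manual_probs, library_probs))

# .predict() corresponds to thresholding that probability at 0.5.
agreement = (model.predict(X) == (library_probs > 0.5).astype(int)).mean()
print(agreement)
```

A positive coefficient raises the predicted probability of label 1 as the feature grows, and a negative one lowers it. Comparing magnitudes across features is only meaningful when the features are on comparable scales.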
