Python 建模步骤
#%%
#载入数据 、查看相关信息
import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder print('第一步:加载、查看数据') file_path = r'D:\train\201905data\liwang.csv' band_data = pd.read_csv(file_path,encoding='UTF-8') band_data.info() band_data.shape #%%
#
print('第二步:清洗、处理数据,某些数据可以使用数据库处理数据代替') #数据清洗:缺失值处理:丢去、
#查看缺失值
band_data.isnull().sum band_data = band_data.dropna()
#band_data = band_data.drop(['state'],axis=1)
# 去除空格
band_data['voice_mail_plan'] = band_data['voice_mail_plan'].map(lambda x: x.strip())
band_data['intl_plan'] = band_data['intl_plan'].map(lambda x: x.strip())
band_data['churned'] = band_data['churned'].map(lambda x: x.strip())
band_data['voice_mail_plan'] = band_data['voice_mail_plan'].map({'no':0, 'yes':1})
band_data.intl_plan = band_data.intl_plan.map({'no':0, 'yes':1}) for column in band_data.columns:
if band_data[column].dtype == type(object):
le = LabelEncoder()
band_data[column] = le.fit_transform(band_data[column]) #band_data = band_data.drop(['phone_number'],axis=1)
#band_data['churned'] = band_data['churned'].replace([' True.',' False.'],[1,0])
#band_data['intl_plan'] = band_data['intl_plan'].replace([' yes',' no'],[1,0])
#band_data['voice_mail_plan'] = band_data['voice_mail_plan'].replace([' yes',' no'],[1,0]) #%%
# 模型 [重复、调优]
print('第三步:选择、训练模型') x = band_data.drop(['churned'],axis=1)
y = band_data['churned'] from sklearn import model_selection
train,test,t_train,t_test = model_selection.train_test_split(x,y,test_size=0.3,random_state=1) from sklearn import tree
model = tree.DecisionTreeClassifier(max_depth=2)
model.fit(train,t_train) fea_res = pd.DataFrame(x.columns,columns=['features'])
fea_res['importance'] = model.feature_importances_ t_name= band_data['churned'].value_counts()
t_name.index import graphviz import os
os.environ["PATH"] += os.pathsep + r'D:\software\developmentEnvironment\graphviz-2.38\release\bin' dot_data= tree.export_graphviz(model,out_file=None,feature_names=x.columns,max_depth=2,
class_names=t_name.index.astype(str),
filled=True, rounded=True,
special_characters=False)
graph = graphviz.Source(dot_data)
#graph
graph.render("dtr") #%%
print('第四步:查看、分析模型') #结果预测
res = model.predict(test) #混淆矩阵
from sklearn.metrics import confusion_matrix
confmat = confusion_matrix(t_test,res)
print(confmat) #分类指标 https://blog.csdn.net/akadiao/article/details/78788864
from sklearn.metrics import classification_report
print(classification_report(t_test,res)) #%%
print('第五步:保存模型') from sklearn.externals import joblib
joblib.dump(model,r'D:\train\201905data\mymodel.model') #%%
print('第六步:加载新数据、使用模型')
file_path_do = r'D:\train\201905data\do_liwang.csv' deal_data = pd.read_csv(file_path_do,encoding='UTF-8') #数据清洗:缺失值处理 deal_data = deal_data.dropna()
deal_data['voice_mail_plan'] = deal_data['voice_mail_plan'].map(lambda x: x.strip())
deal_data['intl_plan'] = deal_data['intl_plan'].map(lambda x: x.strip())
deal_data['churned'] = deal_data['churned'].map(lambda x: x.strip())
deal_data['voice_mail_plan'] = deal_data['voice_mail_plan'].map({'no':0, 'yes':1})
deal_data.intl_plan = deal_data.intl_plan.map({'no':0, 'yes':1}) for column in deal_data.columns:
if deal_data[column].dtype == type(object):
le = LabelEncoder()
deal_data[column] = le.fit_transform(deal_data[column])
#数据清洗 #加载模型
model_file_path = r'D:\train\201905data\mymodel.model'
deal_model = joblib.load(model_file_path)
#预测
res = deal_model.predict(deal_data.drop(['churned'],axis=1)) #%%
print('第七步:执行模型,提供数据')
result_file_path = r'D:\train\201905data\result_liwang.csv' deal_data.insert(1,'pre_result',res)
deal_data[['state','pre_result']].to_csv(result_file_path,sep=',',index=True,encoding='UTF-8')
Python 建模步骤的更多相关文章
- Python学习步骤如何安排?
一.清楚学习目标 无论是学习什么知识,都要有一个对学习目标的清楚认识. 只有这样才能朝着目标持续前进,少走弯路,从学习中得到不断的提升,享受python学习计划的过程. 二.基本python 知识学习 ...
- Linux系统下升级Python版本步骤(suse系统)
Linux系统下升级Python版本步骤(suse系统) http://blog.csdn.net/lifengling1234/article/details/53536493
- 决策树python建模中的坑 :ValueError: Expected 2D array, got 1D array instead:
决策树python建模中的坑 代码 #coding=utf-8 from sklearn.feature_extraction import DictVectorizerimport csvfrom ...
- odoo 14 python 单元测试步骤
# odoo 14 python 单元测试步骤 # 一.在模块根目录创建tests目录 # 二.在tests目录下创建__init__.py文件 # 三.继承TransactionCase(Singl ...
- 逻辑回归--美国挑战者号飞船事故_同盾分数与多头借贷Python建模实战
python信用评分卡(附代码,博主录制) https://study.163.com/course/introduction.htm?courseId=1005214003&utm_camp ...
- Python机器学习步骤
推荐学习顺序 学习机器学习得有个步骤, 下面大家就能按照自己所需, 来探索这个网站. 图中请找到 "Start", 然后依次沿着箭头, 看看有没有不了解/没学过的地方, 接着, 就 ...
- 正态分布-python建模
sklearn实战-乳腺癌细胞数据挖掘 https://study.163.com/course/introduction.htm?courseId=1005269003&utm_campai ...
- T分布在医药领域应用-python建模
sklearn实战-乳腺癌细胞数据挖掘 https://study.163.com/course/introduction.htm?courseId=1005269003&utm_campai ...
- 下载及安装Python详细步骤
安装python分三个步骤: *下载python *安装python *检查是否安装成功 1.下载Python (1)python下载地址https://www.python.org/download ...
随机推荐
- Avito Cool Challenge 2018-B. Farewell Party(思维)
time limit per test 1 second memory limit per test 256 megabytes input standard input output standar ...
- hdu2466-Shell Pyramid
先预处理一下层和行所对应的数,然后二分三个答案,注意细节 #include<cstdio> #define inf 0x3f3f3f3f ; typedef __int64 LL; u ...
- CodeForces 731C C - Socks 并查集
Description Arseniy is already grown-up and independent. His mother decided to leave him alone for m ...
- Kaggle八门神器(一):竞赛神器之XGBoost介绍
Xgboost为一个十分有效的机器学习模型,在各种竞赛中均可以看到它的身影,同时Xgboost在工业届也有着广泛的应用,本文以Titanic数据集为研究对象,简单地探究Xgboost模型建模过程,同时 ...
- 《从0到1学习Flink》—— 如何自定义 Data Source ?
前言 在 <从0到1学习Flink>-- Data Source 介绍 文章中,我给大家介绍了 Flink Data Source 以及简短的介绍了一下自定义 Data Source,这篇 ...
- HTML5利用FormData对象实现显示进度条的文件上传
摘自:https://blog.csdn.net/q1056843325/article/details/53759963 自己做是按这个实现的,兼容性还不错 完整简约的解决方案 下面的代码清单是包括 ...
- IQueryable和IEnumerable的区别
- 【踩坑】Nginx上配置ssl证书实现https访问
昨天开始为域名挂上ssl证书,使得可以以https去访问服务器.按照网上所介绍的配置Nginx,然而一直访问不了网站. 第二天排查了一早上,发现不单要配置Nginx,阿里云上安全组要开启443端口,并 ...
- 【Design Patterns】监听器模式
监听器模式 监听器模式其实是观察者模式中的一种,两者都有关于回调的设计. 与观察者模式不同的是,观察者模式中存在的角色为观察者(Observer)和被观察者(Observable) 而监听器模式中存在 ...
- equals()方法详解
Java语言中equals()方法的使用可以说比较的频繁,但是如果轻视equals()方法,一些意想不到的错误就会产生.哈哈,说的有点严重了~ 先谈谈equals()方法的出身.equals()方法在 ...