Python Modeling Steps

#%%

# Load the data and inspect it

import pandas as pd

import numpy as np

from sklearn.preprocessing import LabelEncoder

print('Step 1: load and inspect the data')

file_path = r'D:\train\201905data\liwang.csv'

band_data = pd.read_csv(file_path,encoding='UTF-8')

band_data.info()

band_data.shape
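
The commented-out `replace` calls later in this script suggest that the text fields in the CSV carry a leading space (for example ' yes'). As a minimal sketch, assuming that is the case, `read_csv` can drop such spaces at load time with `skipinitialspace=True`. The two-row sample below is hypothetical and only borrows the column names used in this post.

```python
import io

import pandas as pd

# Hypothetical two-row sample; the real liwang.csv is not reproduced here.
sample_csv = io.StringIO(
    "state,intl_plan,voice_mail_plan,churned\n"
    "KS, no, yes, False.\n"
    "OH, yes, no, True.\n"
)

# skipinitialspace=True skips the spaces that follow each delimiter.
sample = pd.read_csv(sample_csv, skipinitialspace=True)

print(sample.dtypes)
print(sample['intl_plan'].tolist())
```

Trailing spaces are not removed by this option, so the explicit `strip()` calls are still a reasonable safeguard.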

#%%

#

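print('Step 2: clean and process the data; some of this work can be done in the database instead')

# Data cleaning: handle missing values by dropping the affected rows

# Count the missing values in each column

band_data.isnull().sum()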

band_data = band_data.dropna()

#band_data = band_data.drop(['state'],axis=1)

# Strip leading and trailing whitespace

band_data['voice_mail_plan'] = band_data['voice_mail_plan'].map(lambda x: x.strip())

band_data['intl_plan'] = band_data['intl_plan'].map(lambda x: x.strip())

band_data['churned'] = band_data['churned'].map(lambda x: x.strip())

band_data['voice_mail_plan'] = band_data['voice_mail_plan'].map({'no':0, 'yes':1})

band_data.intl_plan = band_data.intl_plan.map({'no':0, 'yes':1})

# Label-encode every remaining non-numeric (text) column

for column in band_data.columns:

    if not pd.api.types.is_numeric_dtype(band_data[column]):

        le = LabelEncoder()

        band_data[column] = le.fit_transform(band_data[column])

#band_data = band_data.drop(['phone_number'],axis=1)

#band_data['churned'] = band_data['churned'].replace([' True.',' False.'],[1,0])

#band_data['intl_plan'] = band_data['intl_plan'].replace([' yes',' no'],[1,0])

#band_data['voice_mail_plan'] = band_data['voice_mail_plan'].replace([' yes',' no'],[1,0])
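
One thing worth checking after the cleaning steps above: `Series.map` with a dict returns NaN for any value that is not a key, so an unexpected spelling would silently become a missing value. The sketch below uses a hypothetical column with one unexpected entry ('maybe') to show how such values could be surfaced.

```python
import pandas as pd

# Hypothetical column with stray spaces and one unexpected value.
plan = pd.Series([' no', 'yes ', 'maybe'])

cleaned = plan.str.strip()
encoded = cleaned.map({'no': 0, 'yes': 1})

# Values that had no entry in the mapping come back as NaN.
unmapped = cleaned[encoded.isna()].unique()

print(encoded.tolist())
print(unmapped)
```

If `unmapped` is not empty, the mapping dict probably needs another entry, or the rows need separate handling before training.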

#%%

# Model [iterate and tune]

print('Step 3: choose and train the model')

x = band_data.drop(['churned'],axis=1)

y = band_data['churned']

from sklearn import model_selection

train,test,t_train,t_test = model_selection.train_test_split(x,y,test_size=0.3,random_state=1)

from sklearn import tree

model = tree.DecisionTreeClassifier(max_depth=2)

model.fit(train,t_train)

fea_res = pd.DataFrame(x.columns,columns=['features'])

fea_res['importance'] = model.feature_importances_

t_name= band_data['churned'].value_counts()

t_name.index

import graphviz

import os

os.environ["PATH"] += os.pathsep + r'D:\software\developmentEnvironment\graphviz-2.38\release\bin'

dot_data = tree.export_graphviz(model,out_file=None,feature_names=list(x.columns),max_depth=2,

                         # model.classes_ lists the labels in the order the tree uses them
                         class_names=model.classes_.astype(str),

                         filled=True, rounded=True,

                         special_characters=False)

graph = graphviz.Source(dot_data)

#graph

graph.render("dtr")
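
Rendering with Graphviz requires the external Graphviz binaries on PATH. Where those are not available, a plain-text view of the same tree can be produced with `sklearn.tree.export_text`. The sketch below uses the bundled iris dataset as a stand-in, since the churn data is not reproduced here; `demo_model` is a hypothetical name.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Stand-in data: the iris dataset that ships with scikit-learn.
iris = load_iris()

demo_model = DecisionTreeClassifier(max_depth=2, random_state=1)
demo_model.fit(iris.data, iris.target)

# export_text returns the decision rules as an indented string.
rules = export_text(demo_model, feature_names=list(iris.feature_names))
print(rules)
```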

#%%

print('Step 4: inspect and analyze the model')

# Predict on the test set

res = model.predict(test)

# Confusion matrix

from sklearn.metrics import confusion_matrix

confmat = confusion_matrix(t_test,res)

print(confmat)

# Classification metrics: https://blog.csdn.net/akadiao/article/details/78788864

from sklearn.metrics import classification_report

print(classification_report(t_test,res))
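
To make the link between the confusion matrix and the classification report concrete, the sketch below derives precision, recall and F1 for the positive class by hand. The labels are hypothetical toy values, not results from the churn model.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels: 1 = churned, 0 = stayed.
y_true = [0, 0, 0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 0, 1, 0, 1, 0, 1, 0, 1, 0]

# For binary labels, ravel() yields the cells in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

precision = tp / (tp + fp)  # share of predicted churners that really churned
recall = tp / (tp + fn)     # share of real churners that were caught
f1 = 2 * precision * recall / (precision + recall)

print(tn, fp, fn, tp)
print(precision, recall, f1)
```

These are the same quantities that `classification_report` lists in the row for class 1.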

#%%

print('Step 5: save the model')

# sklearn.externals.joblib was removed from scikit-learn; use the standalone joblib package
import joblib

joblib.dump(model,r'D:\train\201905data\mymodel.model')
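
A quick way to confirm that a saved model is usable is to load it back and compare predictions. The sketch below does this round trip in a temporary directory, assuming the standalone `joblib` package is installed; the iris data and the names `demo_model` and `restored` are stand-ins for illustration.

```python
import os
import tempfile

import joblib
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
demo_model = DecisionTreeClassifier(max_depth=2, random_state=1).fit(X, y)

# Save to a temporary file and load it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, 'demo.model')
    joblib.dump(demo_model, model_path)
    restored = joblib.load(model_path)

# The restored model should predict exactly what the original predicts.
same = bool((restored.predict(X) == demo_model.predict(X)).all())
print(same)
```

Model files are generally best loaded with the same scikit-learn version that wrote them.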

#%%

print('Step 6: load new data and apply the model')

file_path_do = r'D:\train\201905data\do_liwang.csv'

deal_data = pd.read_csv(file_path_do,encoding='UTF-8')

# Data cleaning: handle missing values

deal_data = deal_data.dropna()

deal_data['voice_mail_plan'] = deal_data['voice_mail_plan'].map(lambda x: x.strip())

deal_data['intl_plan'] = deal_data['intl_plan'].map(lambda x: x.strip())

deal_data['churned'] = deal_data['churned'].map(lambda x: x.strip())

deal_data['voice_mail_plan'] = deal_data['voice_mail_plan'].map({'no':0, 'yes':1})

deal_data.intl_plan = deal_data.intl_plan.map({'no':0, 'yes':1})

# Label-encode every remaining non-numeric (text) column

for column in deal_data.columns:

    if not pd.api.types.is_numeric_dtype(deal_data[column]):

        le = LabelEncoder()

        deal_data[column] = le.fit_transform(deal_data[column])

# Data cleaning done

# Load the model

model_file_path = r'D:\train\201905data\mymodel.model'

deal_model = joblib.load(model_file_path)

# Predict

res = deal_model.predict(deal_data.drop(['churned'],axis=1))
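
A caveat about the step above: fitting a fresh `LabelEncoder` on the new data can assign different integers than the encoder fitted on the training data, because the codes depend on which values are present. One possible approach, sketched here with hypothetical toy frames, is to keep the fitted encoders and reuse them. They can also turn the codes back into the original text before the results are written out.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical training and new data; only the column name is from the post.
train_df = pd.DataFrame({'state': ['KS', 'OH', 'NJ', 'OH']})
new_df = pd.DataFrame({'state': ['OH', 'KS']})

# Fit one encoder per text column on the training data and keep it.
encoders = {}
for column in train_df.columns:
    if not pd.api.types.is_numeric_dtype(train_df[column]):
        encoders[column] = LabelEncoder().fit(train_df[column])
        train_df[column] = encoders[column].transform(train_df[column])

# Reuse the same encoders on the new data so the codes match.
for column, le in encoders.items():
    new_df[column] = le.transform(new_df[column])

codes = new_df['state'].tolist()
names = encoders['state'].inverse_transform(new_df['state']).tolist()
print(codes)
print(names)
```

The `encoders` dict could be saved with `joblib.dump` next to the model. Note that `transform` raises a `ValueError` for a value that was not seen during fitting, so unseen categories need a separate decision.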

#%%

print('Step 7: run the model and deliver the results')

result_file_path = r'D:\train\201905data\result_liwang.csv'

deal_data.insert(1,'pre_result',res)

deal_data[['state','pre_result']].to_csv(result_file_path,sep=',',index=True,encoding='UTF-8')
