R语言决策树分类模型
rm(list=ls())
gc() memory.limit(4000)
library(corrplot)
library(rpart)
data_health<-read.csv("D:/smart_data0608/smart_data_section_good_15.txt",header=FALSE,sep="\t",na.strings="None")#读健康数据
data_fault<-read.csv("D:/smart_data0608/smart_data_section_failTrainSet_last24h.txt",header=FALSE,sep="\t",na.strings="None")#读故障数据-训练数据
data_fault_test<-read.csv("D:/smart_data0608/smart_data_section_failTestSet_last24h.txt",header=FALSE,sep="\t",na.strings="None")#读故障数据—测试数据 colnames(data_health) <- c("id","serial_number","update_time","smart_health_status","current_drive_temperature","drive_trip_temperature","elements_in_grown_defect_list","manufactured_time","cycle_count","load_unload_count","load_unload_count","load_unload_cycles","blocks_sent_to_initiator","blocks_received_from_initiator","blocks_read_from_cache","num_commands_size_not_larger_than_segment_size ","num_commands_size_larger_than_segment_size","num_hours_powered_up","num_minutes_next_test","read_corrected_ecc_fast","read_corrected_ecc_delayed","read_corrected_re","read_total_errors_corrected","read_correction_algo_invocations","read_gigabytes_processed","read_total_uncorrected_errors","write_corrected_ecc_fast","write_corrected_ecc_delayed","write_corrected_re","write_total_errors_corrected","write_correction_algo_invocations","write_gigabytes_processed","write_total_uncorrected_errors","verify_corrected_ecc_fast","verify_corrected_ecc_delayed","verify_corrected_re","verify_total_errors_corrected","verify_correction_algo_invocations","verify_gigabytes_processed","verify_total_uncorrected_errors","non_medium_error_count") #列改名 colnames(data_fault) <- c("id","serial_number","update_time","smart_health_status","current_drive_temperature","drive_trip_temperature","elements_in_grown_defect_list","manufactured_time","cycle_count","load_unload_count","load_unload_count","load_unload_cycles","blocks_sent_to_initiator","blocks_received_from_initiator","blocks_read_from_cache","num_commands_size_not_larger_than_segment_size ","num_commands_size_larger_than_segment_size","num_hours_powered_up","num_minutes_next_test","read_corrected_ecc_fast","read_corrected_ecc_delayed","read_corrected_re","read_total_errors_corrected","read_correction_algo_invocations","read_gigabytes_processed","read_total_uncorrected_errors","write_corrected_ecc_fast","write_corrected_ecc_delayed","write_corrected_re","write_total_errors_corrected","write_correction_algo_invocations","write_gigabytes_processed","write_total_uncorrected_errors","verify_corrected_ecc_fast","verify_corrected_ecc_delayed","verify_corrected_re","verify_total_errors_corrected","verify_correction_algo_invocations","verify_gigabytes_processed","verify_total_uncorrected_errors","non_medium_error_count") #列改名 colnames(data_fault_test) <- c("id","serial_number","update_time","smart_health_status","current_drive_temperature","drive_trip_temperature","elements_in_grown_defect_list","manufactured_time","cycle_count","load_unload_count","load_unload_count","load_unload_cycles","blocks_sent_to_initiator","blocks_received_from_initiator","blocks_read_from_cache","num_commands_size_not_larger_than_segment_size ","num_commands_size_larger_than_segment_size","num_hours_powered_up","num_minutes_next_test","read_corrected_ecc_fast","read_corrected_ecc_delayed","read_corrected_re","read_total_errors_corrected","read_correction_algo_invocations","read_gigabytes_processed","read_total_uncorrected_errors","write_corrected_ecc_fast","write_corrected_ecc_delayed","write_corrected_re","write_total_errors_corrected","write_correction_algo_invocations","write_gigabytes_processed","write_total_uncorrected_errors","verify_corrected_ecc_fast","verify_corrected_ecc_delayed","verify_corrected_re","verify_total_errors_corrected","verify_correction_algo_invocations","verify_gigabytes_processed","verify_total_uncorrected_errors","non_medium_error_count") #列改名 data_health$label <- 0
data_fault$label <- 1
data_fault_test$label <- 1 #决策树
n <- nrow(data_fault)
dataNewTraining<-rbind(data_fault,data_health[sample(1:(nrow(data_health[1:(nrow(data_health)*0.7),])),n*20),])
dataNewTest<-rbind(data_fault_test,data_health[-(1:(nrow(data_health)*0.7)),]) pdf(file='D:/smart_data0608/smartDT_last24h.pdf',family="GB1")
dt <- rpart(label~ current_drive_temperature + elements_in_grown_defect_list + read_corrected_ecc_fast + read_corrected_ecc_delayed + read_corrected_re + read_total_errors_corrected + read_correction_algo_invocations + read_gigabytes_processed + read_total_uncorrected_errors + write_corrected_ecc_fast + write_corrected_ecc_delayed + write_corrected_re + write_total_errors_corrected + write_correction_algo_invocations + write_gigabytes_processed + write_total_uncorrected_errors,data = dataNewTraining, method = "class")
plot(dt,main="smartDT");text(dt)
dev.off() rawPredictScore = predict(dt,dataNewTest)
predictScore <- data.frame(rawPredictScore)
predictScore$label <- 2
predictScore[predictScore$X0 > predictScore$X1,][,"label"]=0
predictScore[predictScore$X0 <= predictScore$X1,][,"label"]=1 write.table(data.frame(predictScore$label,dataNewTest$label,dataNewTest$update_time,dataNewTest$serial_number), file="D:/smart_data0608/smartTestSetWithSerNO_last24h.txt",row.names= F ,col.names= F ,sep="\t")
分类结果:
//smartTestSetWithSerNO_last24h
健康样本数/健康判为故障样本数:583670/978
健康磁盘数/健康判为故障磁盘数:4150/12
健康样本预测率为:0.9983243956345195
健康盘预测率为:0.9971084337349397
--------------------------------
故障样本数/故障判为故障样本数:170/169
故障磁盘数/故障判为故障磁盘数:11/11
故障样本预测率为:0.9941176470588236
故障盘预测率为:1.0
R语言决策树分类模型的更多相关文章
- R语言︱LDA主题模型——最优主题...
R语言︱LDA主题模型——最优主题...:https://blog.csdn.net/sinat_26917383/article/details/51547298#comments
- 基于R语言的ARIMA模型
A IMA模型是一种著名的时间序列预测方法,主要是指将非平稳时间序列转化为平稳时间序列,然后将因变量仅对它的滞后值以及随机误差项的现值和滞后值进行回归所建立的模型.ARIMA模型根据原序列是否平稳以及 ...
- R语言︱决策树族——随机森林算法
每每以为攀得众山小,可.每每又切实来到起点,大牛们,缓缓脚步来俺笔记葩分享一下吧,please~ --------------------------- 笔者寄语:有一篇<有监督学习选择深度学习 ...
- R语言与分类算法的绩效评估(转)
关于分类算法我们之前也讨论过了KNN.决策树.naivebayes.SVM.ANN.logistic回归.关于这么多的分类算法,我们自然需要考虑谁的表现更加的优秀. 既然要对分类算法进行评价,那么我们 ...
- R语言︱LDA主题模型——最优主题数选取(topicmodels)+LDAvis可视化(lda+LDAvis)
每每以为攀得众山小,可.每每又切实来到起点,大牛们,缓缓脚步来俺笔记葩分享一下吧,please~ --------------------------- 笔者寄语:在自己学LDA主题模型时候,发现该模 ...
- Spark 决策树--分类模型
package Spark_MLlib import org.apache.spark.ml.Pipeline import org.apache.spark.ml.classification.{D ...
- R语言的ARIMA模型预测
R通过RODBC连接数据库 stats包中的st函数建立时间序列 funitRoot包中的unitrootTest函数检验单位根 forecast包中的函数进行预测 差分用timeSeries包中di ...
- Redhat 5.8系统安装R语言作Arima模型预测
请见Github博客:http://wuxichen.github.io/Myblog/timeseries/2014/09/02/RJavaonLinux.html
- 不知道怎么改的尴尬R语言的ARIMA模型预测
数据还有很多没弄好,程序还没弄完全好. > read.xlsx("H:/ProjectPaper/论文/1.xlsx","Sheet1") > it ...
随机推荐
- MVC HtmlHelper
HTML扩展类的所有方法都有2个参数: 以textbox为例子 public static string TextBox( this HtmlHelper htmlHelper, string nam ...
- jq 判断输入数字
jq 判断输入数字 <input id="N_source" name="N_source" type="text" valu ...
- iframe子页面与父页面通信
同域下父子页面的通信 父页面: <!DOCTYPE html> <html> <head lang="en"> <meta charset ...
- Operating System Concepts with java 项目: Shell Unix 和历史特点
线程间通信,fork(),waitpid(),signal,捕捉信号,用c执行shell命令,共享内存,mmap 实验要求: 1.简单shell: 通过c实现基本的命令行shell操作,实现两个函数, ...
- HDU 5439 Aggregated Counting
题目大意: 由1开始不断往数组中添加数 就是按照当前所在位置所在的数表示的个数添加这个数目的数 1 2 2 3 3 后面因为要填4,而4号位置为3,说明之后要填3个4 问题就是给定一个n,找到n出现的 ...
- java基础之类与继承 详解
Java:类与继承 对于面向对象的程序设计语言来说,类毫无疑问是其最重要的基础.抽象.封装.继承.多态这四大特性都离不开类,只有存在类,才能体现面向对象编程的特点,今天我们就来了解一些类与继承的相关知 ...
- c#读取文本文档实践2-计算商品价格
商品 数量 单价英语 66 100语文 66 80数学 66 100化学 66 40物理 66 60 上面是文本文档中读入的数据. using System; using System.Collect ...
- iOS 端的 UI 聊天组件ChatKit及代码实现
ChatKit 是一个免费且开源的 UI 聊天组件,自带云服务器,自带推送,支持消息漫游,消息永久存储.底层聊天服务基于LeanCloud(原名 AVOS ) 的 IM 实时通信服务「LeanMess ...
- SharePoint 2010 BCS - 简单实例(二)外部列表创建
博客地址 http://blog.csdn.net/foxdave 接上篇 由于图片稍多篇幅过长影响阅读,所以分段来写. 添加完数据源之后,我们需要为我们要放到SharePoint上的数据表定义操作, ...
- Visual Studio 2013 如何关闭调试而不关闭IIS Express
在VS主面板打开:工具->选项->调试->编辑继续 取消选中[启用"编辑并继续"] 就OK了 (英文版的请对应相应的操作) 不过这是针对所有的调试,如果你想针 ...