Cross-validation experiments with Weka
Generating cross-validation folds (Java approach)
Reference:
http://weka.wikispaces.com/Generating+cross-validation+folds+%28Java+approach%29
This article describes how to generate train/test splits for cross-validation using
the Weka API directly.
The following variables are given:
Instances data = ...;   // contains the full dataset we want to create train/test sets from
int seed = ...;         // the seed for randomizing the data
int folds = ...;        // the number of folds to generate, >=2
Randomize the data
First, randomize your data:
Random rand = new Random(seed);             // create seeded number generator
Instances randData = new Instances(data);   // create copy of original data
randData.randomize(rand);                   // randomize data with number generator
If your data has a nominal class and you want to perform stratified cross-validation, also call:
randData.stratify(folds);
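To make the effect of stratification more concrete, here is a minimal plain-Java sketch that does not use Weka at all. It assumes that stratifying amounts to grouping the instances by class and then picking every folds-th element, so that each contiguous block of the result keeps roughly the class proportions of the whole dataset. The class StratifySketch and its method are hypothetical names for illustration only; this is not Weka's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration class; not part of the Weka API.
final class StratifySketch {

    // Reorders class labels so that contiguous folds are class-balanced.
    static List<String> stratify(List<String> labels, int folds) {
        // Step 1: group equal class labels together.
        List<String> grouped = new ArrayList<>(labels);
        grouped.sort(Comparator.naturalOrder());

        // Step 2: take every folds-th element, starting at 0, 1, ..., folds-1.
        List<String> result = new ArrayList<>(grouped.size());
        for (int start = 0; start < folds; start++) {
            for (int j = start; j < grouped.size(); j += folds) {
                result.add(grouped.get(j));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // 6 "red" and 4 "white" labels in arbitrary order.
        List<String> labels = Arrays.asList(
            "red", "white", "red", "white", "red",
            "red", "white", "red", "white", "red");
        List<String> ordered = stratify(labels, 2);
        // Each block of 5 now holds 3 "red" and 2 "white" labels.
        System.out.println("fold 0: " + ordered.subList(0, 5));
        System.out.println("fold 1: " + ordered.subList(5, 10));
    }
}
```

With two folds, each half of the reordered list has the same 3:2 class ratio as the full list, which is the property stratified cross-validation aims for.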
Generate the folds
Single run
Next, create the train and test sets for each fold:
for (int n = 0; n < folds; n++) {
  Instances train = randData.trainCV(folds, n);
  Instances test = randData.testCV(folds, n);
  // further processing, classification, etc.
  ...
}
Note:
- the above code is used by the weka.filters.supervised.instance.StratifiedRemoveFolds filter
- the weka.classifiers.Evaluation class and the Explorer/Experimenter would use this method for obtaining the train set:
Instances train = randData.trainCV(folds, n, rand);
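The following plain-Java sketch illustrates how a dataset of n rows can be cut into folds. It assumes that test fold number k is a contiguous block of the already shuffled data, that the training set is everything else, and that leftover rows go to the first folds. FoldSketch and its methods are hypothetical helpers working on row indices only; they are not Weka classes and may differ in detail from what trainCV and testCV do internally.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of the Weka API.
final class FoldSketch {

    // Number of rows in the given fold; the first (n % folds) folds get one extra row.
    static int foldSize(int n, int folds, int fold) {
        return n / folds + (fold < n % folds ? 1 : 0);
    }

    // Index of the first row that belongs to the given fold.
    static int foldStart(int n, int folds, int fold) {
        return fold * (n / folds) + Math.min(fold, n % folds);
    }

    // Row indices that form the test set of the given fold.
    static List<Integer> testIndices(int n, int folds, int fold) {
        List<Integer> test = new ArrayList<>();
        int start = foldStart(n, folds, fold);
        for (int i = start; i < start + foldSize(n, folds, fold); i++) {
            test.add(i);
        }
        return test;
    }

    // Row indices that form the training set: all rows outside the test block.
    static List<Integer> trainIndices(int n, int folds, int fold) {
        List<Integer> train = new ArrayList<>();
        int start = foldStart(n, folds, fold);
        int end = start + foldSize(n, folds, fold);
        for (int i = 0; i < n; i++) {
            if (i < start || i >= end) {
                train.add(i);
            }
        }
        return train;
    }

    public static void main(String[] args) {
        int n = 10;
        int folds = 3;
        for (int k = 0; k < folds; k++) {
            System.out.println("fold " + k
                + " test=" + testIndices(n, folds, k)
                + " train=" + trainIndices(n, folds, k));
        }
    }
}
```

Every row appears in exactly one test set, and each training set is the complement of its test set, which is why the data must be randomized (and optionally stratified) before the folds are cut.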
Multiple runs
The example above performs only one run of cross-validation. To perform 10 runs of 10-fold cross-validation, use the following loop:
Instances data = ...;   // our dataset again, obtained from somewhere
int runs = 10;
for (int i = 0; i < runs; i++) {
  seed = i+1;   // every run gets a new, but defined seed value
  // see: randomize the data
  ...
  // see: generate the folds
  ...
}
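As a rough illustration of why each run gets its own defined seed, the sketch below shuffles row indices with java.util.Random instead of a Weka dataset. The same seed always yields the same ordering, so every run can be reproduced. The class RepeatedRunsSketch, its methods, and the mean helper for averaging per-run scores are assumptions made for this example, not part of the original code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical illustration class; not part of the Weka API.
final class RepeatedRunsSketch {

    // Returns the indices 0..n-1 shuffled with a generator seeded by `seed`.
    static List<Integer> shuffledIndices(int n, long seed) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            indices.add(i);
        }
        Collections.shuffle(indices, new Random(seed));
        return indices;
    }

    // Arithmetic mean, e.g. for averaging one accuracy value per run.
    static double mean(double[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("values must not be empty");
        }
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    public static void main(String[] args) {
        int runs = 3;
        for (int i = 0; i < runs; i++) {
            long seed = i + 1;   // a new, but defined seed for every run
            System.out.println("run " + (i + 1) + " seed " + seed
                + " order " + shuffledIndices(8, seed));
        }
    }
}
```

Running the program twice prints identical orderings, since each ordering depends only on its seed.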
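A simple experiment:
We continue classifying the red and white wines from the previous section. The classifier is unchanged; only the repeated-runs procedure has been added.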
package assignment2;

import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.core.Utils;
import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;
import java.io.FileReader;
import java.util.Random;

public class cv_rw {

  // Loads an ARFF file and removes its last attribute.
  public static Instances getFileInstances(String filename) throws Exception {
    FileReader frData = new FileReader(filename);
    Instances data = new Instances(frData);
    int length = data.numAttributes();
    String[] options = new String[2];
    options[0] = "-R";                        // attribute range to remove
    options[1] = Integer.toString(length);    // 1-based index of the last attribute
    Remove remove = new Remove();
    remove.setOptions(options);
    remove.setInputFormat(data);
    Instances newData = Filter.useFilter(data, remove);
    return newData;
  }

  public static void main(String[] args) throws Exception {
    // load data and set class index to the last remaining attribute
    Instances data = getFileInstances("D://Weka_tutorial//WineQuality//RedWhiteWine.arff");
    // System.out.println(instances);
    data.setClassIndex(data.numAttributes() - 1);

    // classifier
    // String[] tmpOptions;
    // String classname;
    // tmpOptions = Utils.splitOptions(Utils.getOption("W", args));
    // classname = tmpOptions[0];
    // tmpOptions[0] = "";
    // Classifier cls = (Classifier) Utils.forName(Classifier.class, classname, tmpOptions);
    //
    // // other options
    // int runs = Integer.parseInt(Utils.getOption("r", args)); // number of repeated runs
    // int folds = Integer.parseInt(Utils.getOption("x", args));
    int runs = 1;
    int folds = 10;
    J48 j48 = new J48();
    // j48.buildClassifier(instances);

    // perform cross-validation
    for (int i = 0; i < runs; i++) {
      // randomize data
      int seed = i + 1;
      Random rand = new Random(seed);
      Instances randData = new Instances(data);
      randData.randomize(rand);
      // Author's note: I did not understand what the two lines below mean;
      // a reply from someone more experienced would be much appreciated.
      // if (randData.classAttribute().isNominal())
      //   randData.stratify(folds);

      Evaluation eval = new Evaluation(randData);
      for (int n = 0; n < folds; n++) {
        Instances train = randData.trainCV(folds, n);
        Instances test = randData.testCV(folds, n);
        // the above code is used by the StratifiedRemoveFolds filter, the
        // code below by the Explorer/Experimenter:
        // Instances train = randData.trainCV(folds, n, rand);

        // build and evaluate classifier on a fresh copy for every fold
        // (Classifier.makeCopy is the Weka 3.6 API; in Weka 3.7 and later
        // the equivalent call is AbstractClassifier.makeCopy)
        Classifier j48Copy = Classifier.makeCopy(j48);
        j48Copy.buildClassifier(train);
        eval.evaluateModel(j48Copy, test);
      }

      // output evaluation
      System.out.println();
      System.out.println("=== Setup run " + (i+1) + " ===");
      System.out.println("Classifier: " + j48.getClass().getName());
      System.out.println("Dataset: " + data.relationName());
      System.out.println("Folds: " + folds);
      System.out.println("Seed: " + seed);
      System.out.println();
      System.out.println(eval.toSummaryString("=== " + folds + "-fold Cross-validation run " + (i+1) + "===", false));
    }
  }
}
Running the program produces the following results:
=== Setup run 1 ===
Classifier: weka.classifiers.trees.J48
Dataset: RedWhiteWine-weka.filters.unsupervised.instance.Randomize-S42-weka.filters.unsupervised.instance.Randomize-S42-weka.filters.unsupervised.attribute.Remove-R13
Folds: 10
Seed: 1
=== 10-fold Cross-validation run 1===
Correctly Classified Instances        6415               98.7379 %
Incorrectly Classified Instances        82                1.2621 %
Kappa statistic                          0.9658
Mean absolute error                      0.0159
Root mean squared error                  0.1109
Relative absolute error                  4.2898 %
Root relative squared error             25.7448 %
Total Number of Instances             6497
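The two percentages in the summary follow directly from the instance counts, since 6415 + 82 = 6497. The small sketch below recomputes them as a sanity check; AccuracySketch and percent are hypothetical names used only for this example.

```java
import java.util.Locale;

// Hypothetical illustration class; not part of the Weka API.
final class AccuracySketch {

    // Formats count/total as a percentage with four decimal places.
    static String percent(int count, int total) {
        return String.format(Locale.ROOT, "%.4f", 100.0 * count / total);
    }

    public static void main(String[] args) {
        int correct = 6415;
        int incorrect = 82;
        int total = correct + incorrect;   // 6497 instances over all 10 test folds
        System.out.println("Correct:   " + percent(correct, total) + " %");
        System.out.println("Incorrect: " + percent(incorrect, total) + " %");
    }
}
```

Because every instance lands in exactly one test fold, the total equals the size of the full dataset rather than the size of a single fold.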