Simulating an "Active Learning" Algorithm with Weka

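Active learning:

Active learning requires the classifier to interact with a labeling expert. A typical process:

(1) Build a model from a small number of labeled samples.

(2) Select the most informative samples from the unlabeled pool and hand them to the expert for labeling.

(3) Merge these samples with the previously labeled ones and rebuild the model.

(4) Repeat steps (2) and (3) until the stopping criterion is met (no unlabeled samples remain, or some other condition holds).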
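The four steps can be sketched without any library as a toy loop on 1-D points. Everything here is an illustrative assumption rather than part of the Weka program: the class name `ActiveLearningLoopSketch`, the hidden rule inside `oracle` (standing in for the human expert), the threshold "model", and the use of distance to the decision boundary as the measure of informativeness.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy pool-based active learning loop on 1-D points (illustrative only). */
class ActiveLearningLoopSketch {

    /** Stand-in for the labeling expert; the hidden rule is an assumption. */
    static boolean oracle(double x) {
        return x >= 0.62;
    }

    /** Steps (1) and (3): put the threshold halfway between the largest negative and the smallest positive. */
    static double fitThreshold(List<Double> xs, List<Boolean> ys) {
        double maxNeg = 0.0;
        double minPos = 1.0;
        for (int i = 0; i < xs.size(); i++) {
            if (ys.get(i)) {
                minPos = Math.min(minPos, xs.get(i));
            } else {
                maxNeg = Math.max(maxNeg, xs.get(i));
            }
        }
        return (maxNeg + minPos) / 2.0;
    }

    /** Runs the loop with a labeling budget and returns the final threshold. */
    static double run(List<Double> pool, int budget) {
        // Small labeled seed set: 0.0 is negative, 1.0 is positive.
        List<Double> xs = new ArrayList<>(Arrays.asList(0.0, 1.0));
        List<Boolean> ys = new ArrayList<>(Arrays.asList(false, true));
        List<Double> unlabeled = new ArrayList<>(pool);
        double threshold = fitThreshold(xs, ys);
        // Step (4): stop when the pool is empty or the budget is spent.
        for (int q = 0; q < budget && !unlabeled.isEmpty(); q++) {
            // Step (2): the most informative point is the one closest to the boundary.
            int best = 0;
            for (int i = 1; i < unlabeled.size(); i++) {
                if (Math.abs(unlabeled.get(i) - threshold)
                        < Math.abs(unlabeled.get(best) - threshold)) {
                    best = i;
                }
            }
            double x = unlabeled.remove(best);
            xs.add(x);
            ys.add(oracle(x)); // ask the "expert" for the label
            threshold = fitThreshold(xs, ys);
        }
        return threshold;
    }

    public static void main(String[] args) {
        List<Double> pool = new ArrayList<>();
        for (int i = 1; i < 20; i++) {
            pool.add(i / 20.0);
        }
        System.out.println("threshold after 5 queries: " + run(pool, 5));
    }
}
```

With a budget of 0 the model is built from the seed set alone; raising the budget moves the threshold toward the hidden boundary while labeling only the points the model is least sure about.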

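Simulation approach:

1. Split the data into a labeled set and an unlabeled set.

2. Split the unlabeled set into groups of 100. For each group, compute the entropy of the model's prediction for every sample, sort by entropy in descending order, and add the top 5 samples to the labeled set.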
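The grouping and ranking in step 2 can be sketched on plain arrays before bringing in Weka. For a two-class problem with predicted positive-class probability p, the entropy is H(p) = -p log2(p) - (1-p) log2(1-p), which peaks at p = 0.5. The class name `BatchEntropySelector`, its methods, and the sample probabilities are hypothetical and only illustrate the idea:

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of the batching scheme: rank each group by entropy and keep the top k. */
class BatchEntropySelector {

    /** Binary entropy (base 2) of a positive-class probability p. */
    static double binaryEntropy(double p) {
        if (p <= 0.0 || p >= 1.0) {
            return 0.0; // a certain prediction carries no uncertainty
        }
        return -(p * Math.log(p) + (1 - p) * Math.log(1 - p)) / Math.log(2.0);
    }

    /**
     * Splits the indices 0..probs.length-1 into consecutive groups of groupSize and
     * returns, group by group, the indices of the (at most) k highest-entropy entries.
     */
    static List<Integer> select(double[] probs, int groupSize, int k) {
        List<Integer> chosen = new ArrayList<>();
        for (int start = 0; start < probs.length; start += groupSize) {
            int end = Math.min(start + groupSize, probs.length);
            List<Integer> group = new ArrayList<>();
            for (int i = start; i < end; i++) {
                group.add(i);
            }
            // Sort descending by entropy so the most uncertain index comes first.
            group.sort((a, b) -> Double.compare(binaryEntropy(probs[b]), binaryEntropy(probs[a])));
            chosen.addAll(group.subList(0, Math.min(k, group.size())));
        }
        return chosen;
    }

    public static void main(String[] args) {
        double[] probs = {0.99, 0.5, 0.1, 0.45, 0.8, 0.52};
        // Groups of 4, keep the 2 most uncertain indices per group.
        System.out.println(select(probs, 4, 2));
    }
}
```

Taking `Math.min(k, group.size())` keeps the last, possibly shorter group from running past its end.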

package demo;

import java.io.FileReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

// Pairs an instance with its entropy so that instances can be sorted by entropy.
class InstanceSort implements Comparable<InstanceSort> {
    public Instance instance;
    public double entropy;

    public InstanceSort(Instance instance, double entropy) {
        this.instance = instance;
        this.entropy = entropy;
    }

    // Descending order: the instance with the highest entropy comes first.
    @Override
    public int compareTo(InstanceSort o) {
        return Double.compare(o.entropy, this.entropy);
    }
}

public class ActiveLearning {

    // Loads an ARFF file and uses the last attribute as the class.
    public static Instances getInstances(String fileName) throws Exception {
        try (FileReader reader = new FileReader(fileName)) {
            Instances data = new Instances(reader);
            data.setClassIndex(data.numAttributes() - 1);
            return data;
        }
    }

    // Binary entropy (base 2) of the predicted probability of one class.
    public static double computeEntropy(double predictValue) {
        // A probability of (almost) 0 or 1 means a certain prediction; this also avoids log(0).
        if (1 - predictValue < 0.000000001d || predictValue < 0.000000001d) {
            return 0;
        }
        return -predictValue * (Math.log(predictValue) / Math.log(2.0d))
                - (1 - predictValue) * (Math.log(1 - predictValue) / Math.log(2.0d));
    }

    public static void classify(Instances train, Instances test) throws Exception {
        NaiveBayes classifier = new NaiveBayes();
        // Train the model
        classifier.buildClassifier(train);
        // Evaluate the model
        Evaluation eval = new Evaluation(test);
        eval.evaluateModel(classifier, test);
        System.out.println(eval.toClassDetailsString());
    }

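    // Uncertainty sampling over the instances unlabeled[start, end)
    public static Instances uncertaintySample(Instances labeled, Instances unlabeled, int start, int end) throws Exception {
        // Train a model on the labeled data first
        NaiveBayes classifier = new NaiveBayes();
        classifier.buildClassifier(labeled);

        // Compute the prediction entropy of each instance in the group
        ArrayList<InstanceSort> l = new ArrayList<InstanceSort>();
        for (int i = start; i < end; i++) {
            // classifyInstance() returns only the predicted class index (0.0 or 1.0), whose
            // entropy is always 0, so use the predicted class probabilities instead.
            // Taking distribution[0] assumes a binary class, as in pc1.arff.
            double[] distribution = classifier.distributionForInstance(unlabeled.instance(i));
            double entropy = computeEntropy(distribution[0]);
            InstanceSort is = new InstanceSort(unlabeled.instance(i), entropy);
            l.add(is);
        }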

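        // Sort by entropy in descending order
        Collections.sort(l);

        // Empty dataset with the same header as the unlabeled data (no need to reload the file)
        Instances chosenInstances = new Instances(unlabeled, 0);

        // From each group of 100, pick the 5 instances with the highest entropy
        // (fewer if the group contains fewer than 5 instances)
        for (int i = 0; i < Math.min(5, l.size()); i++) {
            chosenInstances.add(l.get(i).instance);
        }
        return chosenInstances;
    }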

    // Sampling: simulates the active learning loop
    public static void sample(Instances instances, Instances test) throws Exception {
        Random rand = new Random(1023);
        instances.randomize(rand);
        instances.stratify(10);
        // Nine folds act as the unlabeled pool; the remaining fold is the initial labeled set
        Instances unlabeled = instances.trainCV(10, 0);
        Instances labeled = instances.testCV(10, 0);

        // Process the pool in groups of 100; the last group may be smaller
        for (int start = 0; start < unlabeled.numInstances(); start += 100) {
            int end = Math.min(start + 100, unlabeled.numInstances());
            // From each group, pick the 5 instances with the highest entropy
            Instances resultInstances = uncertaintySample(labeled, unlabeled, start, end);
            for (int j = 0; j < resultInstances.numInstances(); j++) {
                labeled.add(resultInstances.instance(j));
            }
            // Retrain on the enlarged labeled set and report performance on the test set
            classify(labeled, test);
        }
    }

    public static void main(String[] args) throws Exception {
        Instances instances = getInstances("NASA//pc1.arff");

        // Use the first fold of a stratified 10-fold split as the test set
        Random rand = new Random(1023);
        instances.randomize(rand);
        instances.stratify(10);
        Instances train = instances.trainCV(10, 0);
        Instances test = instances.testCV(10, 0);

        // System.out.println(train.numInstances());
        // System.out.println(test.numInstances());

        sample(train, test);
    }
}
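
`computeEntropy` above handles the two-class case. For datasets with more than two classes, the entropy can be computed over the full class distribution, such as the array returned by Weka's `distributionForInstance`. A minimal sketch, assuming the input is a probability vector that sums to 1; the class and method names are hypothetical and not part of the original program:

```java
/** Shannon entropy (base 2) of a full class distribution; illustrative generalization. */
class DistributionEntropy {

    static double entropy(double[] dist) {
        double h = 0.0;
        for (double p : dist) {
            // Terms with p == 0 contribute nothing; skipping them avoids log(0).
            if (p > 0.0) {
                h -= p * Math.log(p) / Math.log(2.0);
            }
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(entropy(new double[] {0.5, 0.5}));
        System.out.println(entropy(new double[] {0.25, 0.25, 0.25, 0.25}));
    }
}
```

For two classes this agrees with the binary formula, so it could replace the single-probability version if the data had more classes.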
