ICIC Express Letters
ICIC International ⓒ2010 ISSN 1881-803X
Volume 4, Number 5, October 2010, pp. 1–6

A Novel Multi-label Classification Based on PCA and ML-KNN

Di Wu, Dapeng Zhang, Fengqin Yang, Xu Zhou and Tieli Sun*

School of Computer Science and Information Technology

Northeast Normal University

Changchun, 130117, P. R. China

suntl@nenu.edu.cn

Received December 2010; accepted February 2011

Abstract. Multi-label classification problems are omnipresent. ML-KNN is a multi-label lazy learning approach. ML-KNN does not account for the high dimensionality and redundancy of the dataset, which makes its classification results hard to improve further. Principal Component Analysis (PCA) is a popular and powerful technique for feature extraction and dimensionality reduction. In this paper, a novel multi-label classification algorithm based on PCA and ML-KNN (named PCA-ML-KNN) is proposed. Experiments on two benchmark datasets for multi-label learning show that PCA processes the dataset in an optimized manner, eliminating the need for a huge dataset for ML-KNN, and that PCA-ML-KNN achieves better performance than ML-KNN.

Keywords: Multi-label classification, ML-KNN, Dimension reduction, Feature extraction, Principal Component Analysis (PCA)

1. Introduction. Multi-label classification is attracting increasing attention and is required by applications in a wide range of fields, such as protein function classification, music categorization and semantic scene classification. During the past decade, several multi-label learning algorithms have been proposed, such as the multi-label decision tree based learning algorithm [1,2], the support vector machine based multi-label learning algorithm [3], and the ML-KNN algorithm [4,5]. ML-KNN is derived from the traditional K-nearest neighbor (KNN) algorithm and was presented by Zhang and others. Several empirical studies have demonstrated that datasets for multi-label classification are bulky and are characterized by high dimensionality and redundancy. These characteristics pose a serious obstacle to any attempt to extract pertinent information, and thus make it difficult to improve multi-label classification algorithms.

PCA is a data analysis technique [6]. It is a mathematical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of uncorrelated variables called principal components. The most important application of PCA is to simplify the original data. PCA can effectively identify the most important elements in the dataset and eliminate noise and redundancy. Another advantage of PCA is that it has no parameter restrictions and can be applied in various fields.
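
The orthogonal transformation described above can be illustrated with a minimal NumPy sketch. It is not taken from the paper: the function name `pca_fit_transform`, the toy data and the choice of computing PCA through a singular value decomposition of the centered data are all assumptions made for illustration.

```python
import numpy as np

def pca_fit_transform(X, n_components):
    """Project the rows of X onto the top principal components (illustrative helper)."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean                      # center each variable
    # SVD of the centered data: the rows of Vt are orthonormal principal directions
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    # fraction of the total variance kept by the retained components
    ratio = (S[:n_components] ** 2).sum() / (S ** 2).sum()
    return Xc @ components.T, components, mean, ratio

# Toy data (assumed): the third variable is almost a copy of the first,
# so the dataset is redundant and two components should suffice.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
X = np.column_stack([a, b, a + 0.01 * rng.normal(size=200)])

Z, components, mean, ratio = pca_fit_transform(X, n_components=2)
cov = np.cov(Z, rowvar=False)          # covariance of the projected data
print("retained variance ratio:", ratio)
print("off-diagonal covariance:", cov[0, 1])
```

In this sketch the projected variables are uncorrelated (the off-diagonal covariance is zero up to rounding), and the redundant third variable adds almost nothing to the retained variance, which is the sense in which PCA simplifies the original data.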

In this paper, a novel multi-label classification algorithm based on PCA and ML-KNN is proposed to improve classification performance. PCA is first adopted to reduce the dimensionality and noise of the dataset. The ML-KNN method is then used for the remaining processing. To verify the effectiveness of PCA-ML-KNN, two datasets, Scene and Enron, are used, and the experiments report excellent performance.
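
The two-stage procedure (PCA first, then ML-KNN) might be sketched as follows. This excerpt does not give the authors' implementation, so everything here is an assumption: the helper names, the synthetic data, the values `k=10` and `s=1.0`, and the simplified ML-KNN, which follows the commonly described formulation (label priors plus neighbor-count likelihoods, combined by a maximum a posteriori rule).

```python
import numpy as np

def pca_fit(X, n_components):
    # Illustrative PCA: mean and top principal directions of the training data
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def knn_indices(A, B, k, exclude_self=False):
    # Indices of the k nearest rows of B for every row of A (squared Euclidean)
    d = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    if exclude_self:
        np.fill_diagonal(d, np.inf)    # a training point is not its own neighbor
    return np.argsort(d, axis=1)[:, :k]

def mlknn_fit(X, Y, k=10, s=1.0):
    m, q = Y.shape
    prior1 = (s + Y.sum(axis=0)) / (2 * s + m)          # smoothed P(label present)
    counts = Y[knn_indices(X, X, k, exclude_self=True)].sum(axis=1)  # shape (m, q)
    c1 = np.zeros((q, k + 1))
    c0 = np.zeros((q, k + 1))
    for l in range(q):
        for j in range(k + 1):
            hit = counts[:, l] == j    # instances with exactly j neighbors carrying label l
            c1[l, j] = np.sum(hit & (Y[:, l] == 1))
            c0[l, j] = np.sum(hit & (Y[:, l] == 0))
    like1 = (s + c1) / (s * (k + 1) + c1.sum(axis=1, keepdims=True))
    like0 = (s + c0) / (s * (k + 1) + c0.sum(axis=1, keepdims=True))
    return {"X": X, "Y": Y, "k": k, "prior1": prior1, "like1": like1, "like0": like0}

def mlknn_predict(model, X_new):
    idx = knn_indices(X_new, model["X"], model["k"])
    counts = model["Y"][idx].sum(axis=1)                # shape (n, q)
    labels = np.arange(model["Y"].shape[1])
    p1 = model["prior1"] * model["like1"][labels, counts]
    p0 = (1 - model["prior1"]) * model["like0"][labels, counts]
    return (p1 > p0).astype(int)                        # MAP decision per label

# Synthetic data (assumed): 20 redundant features generated from 2 latent
# variables; each of the 2 labels depends on the sign of one latent variable.
rng = np.random.default_rng(0)
latent = rng.normal(size=(400, 2))
X = latent @ rng.normal(size=(2, 20)) + 0.05 * rng.normal(size=(400, 20))
Y = (latent > 0).astype(int)
X_tr, X_te, Y_tr, Y_te = X[:300], X[300:], Y[:300], Y[300:]

mean, comps = pca_fit(X_tr, n_components=2)             # step 1: PCA
model = mlknn_fit((X_tr - mean) @ comps.T, Y_tr, k=10)  # step 2: ML-KNN
Y_pred = mlknn_predict(model, (X_te - mean) @ comps.T)
hamming_loss = np.mean(Y_pred != Y_te)
print("Hamming loss on the synthetic test split:", hamming_loss)
```

The PCA mean and directions are estimated on the training split only and then reused for the test split, so the test data do not influence the projection. The pairwise distance computation is quadratic in memory and is only suitable for small examples such as this one.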

......


*Corresponding author

