A Novel Multi-label Classification Based on PCA and ML-KNN
Di Wu, Dapeng Zhang, Fengqin Yang, Xu Zhou and Tieli Sun*
School of Computer Science and Information Technology
Northeast Normal University
Changchun, 130117, P. R. China
suntl@nenu.edu.cn
Received December 2010; accepted February 2011
Abstract. Multi-label classification problems are omnipresent. ML-KNN is a multi-label lazy learning approach. Because ML-KNN does not take into account the high dimensionality and redundancy of the dataset, its classification results are hard to improve further. Principal Component Analysis (PCA) is a popular and powerful technique for feature extraction and dimensionality reduction. In this paper, a novel multi-label classification algorithm based on PCA and ML-KNN (named PCA-ML-KNN) is proposed. Experiments on two benchmark datasets for multi-label learning show that PCA processes the dataset in an optimized manner, eliminating the need for a huge dataset for ML-KNN, and that PCA-ML-KNN achieves better performance than ML-KNN.
Keywords: Multi-label classification, ML-KNN, Dimension reduction, Feature extraction, Principal Component Analysis (PCA)
1. Introduction. Multi-label classification is attracting more and more attention and is increasingly required by applications in a wide range of fields, such as protein function classification, music categorization and semantic scene classification. During the past decade, several multi-label learning algorithms have been proposed, such as the multi-label decision tree based learning algorithm [1,2], the support vector machine based multi-label learning algorithm [3], and the ML-KNN algorithm [4,5]. ML-KNN is derived from the traditional K-nearest neighbor (KNN) algorithm and was presented by Zhang and others. Several empirical studies have demonstrated that datasets for multi-label classification are bulky and are characterized by high dimensionality and redundancy. These characteristics pose a serious obstacle to any attempt to extract pertinent information, and thus make it difficult to improve multi-label classification algorithms.
PCA is a data analysis technique [6]. It is a mathematical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of uncorrelated variables called principal components. The most important application of PCA is to simplify the original data. PCA can effectively identify the most important elements in the dataset and eliminate noise and redundancy. Another advantage of PCA is that it has no parameter restrictions and can be applied in various fields.
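The orthogonal transformation described above can be illustrated with a minimal sketch. It assumes PCA is computed through a singular value decomposition of the mean-centered data; the function names `pca_fit` and `pca_transform` and the synthetic data are hypothetical and are not taken from the paper.

```python
import numpy as np

def pca_fit(X, n_components):
    """Return the column means and the top principal directions of X."""
    mean = X.mean(axis=0)
    # SVD of the centered data: the rows of Vt are orthonormal principal
    # directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def pca_transform(X, mean, components):
    """Project X onto the principal directions (the principal components)."""
    return (X - mean) @ components.T

# Synthetic data (an assumption for illustration): the third feature is
# almost a linear combination of the first two, so it is redundant.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 2))
third = base[:, 0] + base[:, 1] + 0.01 * rng.normal(size=200)
X = np.column_stack([base, third])

mean, components = pca_fit(X, n_components=2)
Z = pca_transform(X, mean, components)   # 200 x 2 uncorrelated scores
cov = np.cov(Z, rowvar=False)            # off-diagonal entries are ~0
X_rec = Z @ components + mean            # reconstruction from 2 components
print(Z.shape, np.linalg.norm(X - X_rec))
```

Two components are enough here because the third feature carries almost no independent information, which is the kind of redundancy PCA is meant to remove.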
In this paper, a novel multi-label classification algorithm based on PCA and ML-KNN is proposed to improve classification performance. PCA is first adopted to reduce the dimensionality and noise of the dataset. The ML-KNN method is then used for the remaining processing. To verify the effectiveness of PCA-ML-KNN, two datasets, namely Scene and Enron, are used, and the experiments report excellent performance.
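The two-stage procedure (PCA first, then ML-KNN) might be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the ML-KNN part follows the commonly described formulation with Laplace smoothing `s` and maximum a posteriori label decisions, and all function names, parameter values and the toy data are hypothetical.

```python
import numpy as np

def pca_project(X_train, X_test, n_components):
    """Fit PCA on the training data and project both sets."""
    mean = X_train.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
    W = Vt[:n_components].T
    return (X_train - mean) @ W, (X_test - mean) @ W

def pairwise_sq_dists(A, B):
    """Squared Euclidean distance between every row of A and every row of B."""
    return ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)

def mlknn_fit(X, Y, k=5, s=1.0):
    """Estimate label priors and neighbor-count likelihoods (assumed ML-KNN form)."""
    Y = np.asarray(Y, dtype=int)
    m, q = Y.shape
    prior1 = (s + Y.sum(axis=0)) / (2 * s + m)   # P(label is present)
    D = pairwise_sq_dists(X, X)
    np.fill_diagonal(D, np.inf)                  # a point is not its own neighbor
    nn = np.argsort(D, axis=1)[:, :k]
    counts = Y[nn].sum(axis=1)                   # (m, q) neighbors carrying each label
    c1 = np.zeros((q, k + 1))
    c0 = np.zeros((q, k + 1))
    for l in range(q):
        has = Y[:, l] == 1
        c1[l] = np.bincount(counts[has, l], minlength=k + 1)
        c0[l] = np.bincount(counts[~has, l], minlength=k + 1)
    like1 = (s + c1) / (s * (k + 1) + c1.sum(axis=1, keepdims=True))
    like0 = (s + c0) / (s * (k + 1) + c0.sum(axis=1, keepdims=True))
    return {"X": X, "Y": Y, "k": k, "prior1": prior1,
            "like1": like1, "like0": like0}

def mlknn_predict(model, X_new):
    """Assign each label whose posterior for 'present' beats 'absent'."""
    D = pairwise_sq_dists(X_new, model["X"])
    nn = np.argsort(D, axis=1)[:, :model["k"]]
    counts = model["Y"][nn].sum(axis=1)          # (n, q)
    labels = np.arange(model["Y"].shape[1])
    p1 = model["prior1"] * model["like1"][labels, counts]
    p0 = (1 - model["prior1"]) * model["like0"][labels, counts]
    return (p1 > p0).astype(int)

# Toy data (an assumption): two well-separated clusters whose two latent
# features are repeated three times, giving six redundant columns.
rng = np.random.default_rng(0)
group = np.repeat([0, 1], 40)
centers = np.array([[0.0, 0.0], [10.0, 10.0]])
latent = centers[group] + rng.normal(size=(80, 2))
X_train = np.hstack([latent] * 3)
# Label 0 marks cluster 0, label 1 marks cluster 1, label 2 is always present.
Y_train = np.column_stack([group == 0, group == 1, np.ones(80, dtype=bool)]).astype(int)
X_test = np.hstack([centers] * 3)

Z_train, Z_test = pca_project(X_train, X_test, n_components=2)
model = mlknn_fit(Z_train, Y_train, k=5)
pred = mlknn_predict(model, Z_test)
print(pred)
```

On this toy data each test point sits at a cluster center, so it should receive its own cluster's label together with the always-present third label. The number of retained components and the value of `k` are arbitrary choices here; the text above does not state the settings used in the experiments.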
*Corresponding author