Using libsvm 3.20 on Windows

I. Official resources

libsvm home page: https://www.csie.ntu.edu.tw/~cjlin/libsvm/index.html

libsvm introduction paper: http://www.csie.ntu.edu.tw/~cjlin/papers/libsvm.pdf

Official practical guide to using libsvm more effectively: http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf (well worth reading)

Data sets: https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/

Binary classification data sets: https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html

Multi-class classification data sets: https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass.html

FAQ: http://www.csie.ntu.edu.tw/~cjlin/libsvm/faq.html (answers many common questions)

List of useful tools: https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/ (the liblinear mentioned in the guide is listed here)

II. Required software

libsvm-3.20: http://www.csie.ntu.edu.tw/~cjlin/libsvm/libsvm-3.20.zip
python-2.7.10: https://www.python.org/ftp/python/2.7.10/python-2.7.10.msi (needed to run the Python tools)
gnuplot 5.0.1: http://jaist.dl.sourceforge.net/project/gnuplot/gnuplot/5.0.1/gp501-win32-mingw.exe (plots the progress of the search for the best parameters)

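III. Training procedure
(The commands below can be saved in a .bat file and run from there.)

1. Extract the features in libsvm's data format: (class label, then feature index:feature value pairs)
1 1:2.111 2:3.567 3:-0.125
...
0 1:2.156 2:3.259 3:0.258
...
Save the training samples in a file named train and the test samples in a file named test (the names only serve to tell the two apart).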
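The format above can be produced and read back with a few lines of Python. This is a minimal sketch: the helper names format_instance and parse_instance are made up for illustration and are not part of libsvm.

```python
def format_instance(label, features):
    """Return one line in libsvm format; zero-valued features are omitted."""
    parts = [str(label)]
    for index, value in enumerate(features, start=1):  # indices start at 1
        if value != 0:
            parts.append("%d:%g" % (index, value))
    return " ".join(parts)


def parse_instance(line):
    """Split a libsvm-format line into (label, {index: value})."""
    tokens = line.split()
    label = float(tokens[0])
    features = {}
    for token in tokens[1:]:
        index, value = token.split(":")
        features[int(index)] = float(value)
    return label, features


print(format_instance(1, [2.111, 3.567, -0.125]))
print(parse_instance("0 1:2.156 2:3.259 3:0.258"))
```

Writing one such line per sample into train and test gives files that the commands in the following steps can read.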

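2. Scale the feature data (improves computational efficiency)
svm-scale -l -1 -u 1 -s range train > train.scale (scales to the range -1 to 1; -l is the lower bound, -u is the upper bound, -s saves the scaling parameters to the file range; train is the raw feature data and train.scale is the scaled data)
svm-scale -r range test > test.scale (-r reads the saved parameters, so test is scaled with the same range as train)
Note: with the RBF kernel and parameter selection, scaling to [0,1] and scaling to [-1,1] give the same results; [0,1] may simply run faster on data with many zero entries, because it keeps the data sparse (see Q4 in the FAQ section below).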
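The linear scaling that svm-scale performs can be sketched in Python as follows. This is an illustration only, not svm-scale's code: fit_range and scale_rows are hypothetical names, and mapping a constant feature to 0 is an assumption of this sketch.

```python
def fit_range(rows):
    """Per-feature (min, max), computed from the training rows only."""
    return [(min(col), max(col)) for col in zip(*rows)]


def scale_rows(rows, ranges, lower=-1.0, upper=1.0):
    """Map each feature linearly so that [min, max] becomes [lower, upper]."""
    scaled = []
    for row in rows:
        new_row = []
        for value, (lo, hi) in zip(row, ranges):
            if hi == lo:
                new_row.append(0.0)  # assumption: constant feature carries no information
            else:
                new_row.append(lower + (upper - lower) * (value - lo) / (hi - lo))
        scaled.append(new_row)
    return scaled


train = [[0.0, 10.0], [4.0, 20.0]]
test = [[2.0, 25.0]]
ranges = fit_range(train)          # plays the role of the saved range file
print(scale_rows(train, ranges))   # training data lands inside [-1, 1]
print(scale_rows(test, ranges))    # test data reuses the training range
```

Because the test set reuses the range learned from the training set, a test value can fall outside [-1, 1], as the second feature of the test row does here.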

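3. Find the best c and g parameters
python grid.py train.scale (when the search finishes it prints the best c and g; for example, the output 2.0 1.0 96.8922 means c = 2.0 and g = 1.0, with a cross-validation accuracy of 96.8922%)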
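Conceptually, the search tries pairs (c, g) on an exponentially spaced grid and keeps the pair with the best cross-validation accuracy. The sketch below assumes grid.py's default ranges (log2c from -5 to 15 in steps of 2, log2g from 3 to -15 in steps of -2). The cv_accuracy callable is a stand-in for running svm-train -v, and the toy scoring function is invented so that the example runs without libsvm.

```python
import math


def exponent_range(begin, end, step):
    """Inclusive range of exponents; works for positive and negative steps."""
    values = []
    current = begin
    while (step > 0 and current <= end) or (step < 0 and current >= end):
        values.append(current)
        current += step
    return values


def grid_search(cv_accuracy, log2c=(-5, 15, 2), log2g=(3, -15, -2)):
    """Return (c, g, rate) for the grid point with the highest accuracy."""
    best = None
    for c_exp in exponent_range(*log2c):
        for g_exp in exponent_range(*log2g):
            c, g = 2.0 ** c_exp, 2.0 ** g_exp
            rate = cv_accuracy(c, g)  # in practice: cross-validation on train.scale
            if best is None or rate > best[2]:
                best = (c, g, rate)
    return best


def toy_accuracy(c, g):
    """Invented score that peaks at c = 2, g = 0.5; not real cross-validation."""
    return 100.0 - abs(math.log2(c) - 1) - abs(math.log2(g) + 1)


print(grid_search(toy_accuracy))
```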

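4. Train with the best parameters
svm-train -c 2 -g 1 train.scale (this produces a file named train.scale.model; its fields are explained in the supplementary notes below. The default RBF kernel is used here, which is generally a good first choice)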

5. Test the trained model
svm-predict test.scale train.scale.model test.predict (writes the predictions to test.predict and prints the accuracy)
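The accuracy reported in this step is the percentage of test samples whose predicted label equals the true label. A minimal sketch of that calculation, using in-memory label lists rather than the actual test and test.predict files:

```python
def accuracy(true_labels, predicted_labels):
    """Percentage of positions where the prediction matches the true label."""
    correct = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return 100.0 * correct / len(true_labels)


true_labels = [1, 0, 1, 1]        # first column of the test file
predicted_labels = [1, 0, 0, 1]   # one label per line of test.predict
print("Accuracy = %g%%" % accuracy(true_labels, predicted_labels))
```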

IV. Supplementary notes

1. Changing the cross-validation settings
svm-scale -l -1 -u 1 train > train.scale

svm-train -v 6 train.scale (6-fold cross-validation, used to find better parameters)

python grid.py train.scale
svm-train -c 2 -g 2 train.scale

2. easy.py and grid.py in /libsvm-3.20/tools/
After installing python and gnuplot, add the three directories E:\Program Files\Python, F:\libsvm-3.20\windows and E:\Program Files\gnuplot\bin to the system PATH, then edit the libsvm and gnuplot paths in the two .py files.
In easy.py: gnuplot_exe = r"e:\Program Files\gnuplot\bin\gnuplot.exe"
In grid.py: #svmtrain_pathname = r'f:\libsvm-3.20\windows\svm-train.exe'
self.gnuplot_pathname = r'e:\Program Files\gnuplot\bin\gnuplot.exe'
You can then follow guide.pdf and use easy.py to run the examples from the guide. The experimental data used in the guide: http://www.csie.ntu.edu.tw/~cjlin/papers/guide/data/

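3. Fields of the model file
svm_type c_svc (svc means the SVM is used as a classifier and svr means it is used for regression; c_svc is classification with a penalty factor C on violating points, i.e. a soft margin)
kernel_type rbf (radial basis function kernel, a good choice in most cases: K(x,y) = exp(-gamma*|x-y|^2))
gamma 0.03125 (kernel parameter)
nr_class 2 (number of classes)
total_sv 287 (total number of support vectors)
rho 102.102 (constant term of the decision function; in the usual notation b = -rho)
label 1 0 (class labels)
nr_sv 144 143 (number of support vectors in each class)
SV (all support vectors are listed below this line)
8192 1:-1 2:-0.688314 3:0.595954 4:0.416735

...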
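The header fields listed above are plain "key value" lines that end at the SV line, so they are easy to read programmatically. The following sketch uses a hypothetical helper, parse_model_header, and the sample values from this section.

```python
def parse_model_header(text):
    """Collect the 'key value...' lines that precede the SV section."""
    header = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line == "SV":  # support vectors start after this marker
            break
        key, _, value = line.partition(" ")
        header[key] = value.split()
    return header


sample = """svm_type c_svc
kernel_type rbf
gamma 0.03125
nr_class 2
total_sv 287
rho 102.102
label 1 0
nr_sv 144 143
SV
8192 1:-1 2:-0.688314 3:0.595954 4:0.416735
"""

header = parse_model_header(sample)
print(header["kernel_type"], header["gamma"], header["nr_sv"])
```

One quick consistency check is that the per-class counts in nr_sv add up to total_sv (144 + 143 = 287 here).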

4. svm-scale.exe options

-l lower : x scaling lower limit (default -1)
-u upper : x scaling upper limit (default +1)
-y y_lower y_upper : y scaling limits (default: no y scaling)
-s save_filename : save scaling parameters to save_filename
-r restore_filename : restore scaling parameters from restore_filename

5. svm-train.exe options

-s svm_type : set type of SVM (default 0)
    0 -- C-SVC (multi-class classification)
    1 -- nu-SVC (multi-class classification)
    2 -- one-class SVM
    3 -- epsilon-SVR (regression)
    4 -- nu-SVR (regression)
-t kernel_type : set type of kernel function (default 2)
    0 -- linear: u'*v
    1 -- polynomial: (gamma*u'*v + coef0)^degree
    2 -- radial basis function: exp(-gamma*|u-v|^2)
    3 -- sigmoid: tanh(gamma*u'*v + coef0)
    4 -- precomputed kernel (kernel values in training_set_file)
-d degree : set degree in kernel function (default 3)
-g gamma : set gamma in kernel function (default 1/num_features)
-r coef0 : set coef0 in kernel function (default 0)
-c cost : set the parameter C of C-SVC, epsilon-SVR, and nu-SVR (default 1)
-n nu : set the parameter nu of nu-SVC, one-class SVM, and nu-SVR (default 0.5)
-p epsilon : set the epsilon in loss function of epsilon-SVR (default 0.1)
-m cachesize : set cache memory size in MB (default 100)
-e epsilon : set tolerance of termination criterion (default 0.001)
-h shrinking : whether to use the shrinking heuristics, 0 or 1 (default 1)
-b probability_estimates : whether to train a SVC or SVR model for probability estimates, 0 or 1 (default 0)
-wi weight : set the parameter C of class i to weight*C, for C-SVC (default 1)
-v n : n-fold cross validation mode
-q : quiet mode (no outputs)

6. Frequently used FAQ entries

Q1: Is there a program to check if my data are in the correct format?
The svm-train program in libsvm conducts only a simple check of the input data. To do a detailed check, after libsvm 2.85, you can use the python script tools/checkdata.py. See tools/README for details.

Q2: The output of training C-SVM is like the following. What do they mean?
optimization finished, #iter = 219
nu = 0.431030
obj = -100.877286, rho = 0.424632
nSV = 132, nBSV = 107
Total nSV = 132
obj is the optimal objective value of the dual SVM problem. rho is the bias term in the decision function sgn(w^Tx - rho). nSV and nBSV are the numbers of support vectors and bounded support vectors (i.e., alpha_i = C). nu-svm is a somewhat equivalent form of C-SVM where C is replaced by nu. nu simply shows the corresponding parameter. More details are in the libsvm document.
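With a kernel, the decision function sgn(w^Tx - rho) is evaluated as a weighted sum of kernel values over the support vectors, minus rho. The sketch below illustrates this for the RBF kernel in the two-class case. The support vectors and coefficients are invented toy values, and the function names are hypothetical, not libsvm's API.

```python
import math


def rbf_kernel(u, v, gamma):
    """exp(-gamma * |u - v|^2) for two dense vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def decision_value(x, support_vectors, coefficients, rho, gamma):
    """Sum of coef_i * K(sv_i, x) over all support vectors, minus rho."""
    total = sum(c * rbf_kernel(sv, x, gamma)
                for sv, c in zip(support_vectors, coefficients))
    return total - rho


def predict_sign(x, support_vectors, coefficients, rho, gamma):
    """+1 if the decision value is positive, otherwise -1."""
    return 1 if decision_value(x, support_vectors, coefficients, rho, gamma) > 0 else -1


# Toy model (made up): one support vector per class.
svs = [[0.0, 0.0], [1.0, 1.0]]
coefs = [1.0, -1.0]
print(predict_sign([0.1, 0.0], svs, coefs, rho=0.0, gamma=1.0))
print(predict_sign([0.9, 1.0], svs, coefs, rho=0.0, gamma=1.0))
```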

Q3: Should I use float or double to store numbers in the cache ?
We have float as the default as you can store more numbers in the cache. In general this is good enough, but for a few difficult cases (e.g. C very very large) where solutions are huge numbers, it might be possible that the numerical precision is not enough using only float.

Q4: Does it make a big difference if I scale each attribute to [0,1] instead of [-1,1]?

For the linear scaling method, if the RBF kernel is used and parameter selection is conducted, there is no difference. Assume Mi and mi are respectively the maximal and minimal values of the ith attribute. Scaling to [0,1] means
x'=(x-mi)/(Mi-mi)
For [-1,1],
x''=2(x-mi)/(Mi-mi)-1.
In the RBF kernel,
x'-y'=(x-y)/(Mi-mi), x''-y''=2(x-y)/(Mi-mi).
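Since the RBF kernel depends on the squared distance, doubling every difference multiplies the exponent by 4. Hence, using (C,g) on the [0,1]-scaled data is the same as (C,g/4) on the [-1,1]-scaled data.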
Though the performance is the same, the computational time may be different. For data with many zero entries, [0,1]-scaling keeps the sparsity of input data and hence may save the time.
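The relationship between the two scalings can be checked numerically. With the kernel exp(-gamma*|u-v|^2), every coordinate difference doubles when moving from [0,1] to [-1,1], so the squared distance grows by a factor of 4 and gamma has to shrink by the same factor. The data points and attribute ranges below are arbitrary values chosen for illustration.

```python
import math


def rbf(u, v, gamma):
    """exp(-gamma * |u - v|^2) for two dense vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def scale(x, mins, maxs, lower, upper):
    """Linear scaling of each attribute from [min, max] to [lower, upper]."""
    return [lower + (upper - lower) * (xi - lo) / (hi - lo)
            for xi, lo, hi in zip(x, mins, maxs)]


mins, maxs = [0.0, 0.0], [10.0, 40.0]   # arbitrary attribute ranges
x, y = [2.0, 30.0], [5.0, 10.0]         # arbitrary data points
g = 0.7

k_unit = rbf(scale(x, mins, maxs, 0.0, 1.0), scale(y, mins, maxs, 0.0, 1.0), g)
k_sym = rbf(scale(x, mins, maxs, -1.0, 1.0), scale(y, mins, maxs, -1.0, 1.0), g / 4)
print(k_unit, k_sym)  # the two kernel values agree up to rounding
```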

Q5: My data are unbalanced. Could libsvm handle such problems?

Yes, there is a -wi option. For example, if you use
> svm-train -s 0 -c 10 -w1 1 -w-1 5 data_file
the penalty for class "-1" is larger. Note that this -w option is for C-SVC only.
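According to the option list, -wi sets the parameter C of class i to weight*C. For the command above this gives the following per-class penalties; effective_penalties is a hypothetical helper used only to show the arithmetic.

```python
def effective_penalties(cost, weights):
    """Per-class penalty: weight * C, following the -wi option description."""
    return {label: cost * weight for label, weight in weights.items()}


# -c 10 -w1 1 -w-1 5
print(effective_penalties(10, {1: 1, -1: 5}))  # {1: 10, -1: 50}
```

Errors on class -1 therefore cost five times as much as errors on class 1.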

Q6: How can I use OpenMP to parallelize LIBSVM on a multicore/shared-memory computer?
It is very easy if you are using GCC 4.2 or after.
In Makefile, add -fopenmp to CFLAGS.
In class SVC_Q of svm.cpp, modify the for loop of get_Q to:
    #pragma omp parallel for private(j)
    for(j=start;j<len;j++)
In the subroutine svm_predict_values of svm.cpp, add one line to the for loop:
    #pragma omp parallel for private(i)
    for(i=0;i<l;i++)
        kvalue[i] = Kernel::k_function(x,model->SV[i],model->param);
For regression, you need to modify class SVR_Q instead. The loop in svm_predict_values is also different because you need a reduction clause for the variable sum:
    #pragma omp parallel for private(i) reduction(+:sum)
    for(i=0;i<model->l;i++)
        sum += sv_coef[i] * Kernel::k_function(x,model->SV[i],model->param);
Then rebuild the package. Kernel evaluations in training/testing will be parallelized. An example of running this modification on an 8-core machine using the data set ijcnn1:
8 cores:
%setenv OMP_NUM_THREADS 8
%time svm-train -c 16 -g 4 -m 400 ijcnn1
27.1sec
1 core:
%setenv OMP_NUM_THREADS 1
%time svm-train -c 16 -g 4 -m 400 ijcnn1
79.8sec
For this data, kernel evaluations take 80% of training time. In the above example, we assume you use csh. For bash, use
export OMP_NUM_THREADS=8
instead.
For Python interface, you need to add the -lgomp link option:
$(CXX) -lgomp -shared -dynamiclib svm.o -o libsvm.so.$(SHVER)
For MS Windows, you need to add /openmp in CFLAGS of Makefile.win
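The timings quoted above can be put in context with Amdahl's law. The FAQ does not mention this law; it is brought in here only as a rough check, under the assumption that the 80% of training time spent on kernel evaluations is the only part that runs in parallel.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Upper bound on speedup when only part of the work is parallel."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)


observed = 79.8 / 27.1            # measured 1-core time / 8-core time
bound = amdahl_speedup(0.8, 8)    # 80% parallel work on 8 cores

print("observed speedup: %.2f" % observed)
print("Amdahl bound:     %.2f" % bound)
```

The observed speedup of roughly 2.9 sits a little below the bound of roughly 3.3, which is consistent with some parallelization overhead.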

Q7: How could I know which training instances are support vectors?

It's very simple. Since version 3.13, you can use the function
void svm_get_sv_indices(const struct svm_model *model, int *sv_indices)
to get indices of support vectors. For example, in svm-train.c, after
model = svm_train(&prob, &param);
you can add
int nr_sv = svm_get_nr_sv(model);
int *sv_indices = Malloc(int, nr_sv);
svm_get_sv_indices(model, sv_indices);
for (int i=0; i<nr_sv; i++)
printf("instance %d is a support vector\n", sv_indices[i]);
If you use matlab interface, you can directly check
model.sv_indices

Q8: After doing cross validation, why there is no model file outputted ?
Cross validation is used for selecting good parameters. After finding them, you want to re-train the whole data without the -v option.

Q9: How do I choose the kernel?
In general we suggest you try the RBF kernel first. A result by Keerthi and Lin shows that if RBF is used with model selection, then there is no need to consider the linear kernel. The kernel matrix using sigmoid may not be positive definite, and in general its accuracy is not better than RBF (see the paper by Lin and Lin). Polynomial kernels are OK, but if a high degree is used, numerical difficulties tend to happen (think of how the dth power of a value below 1 goes to 0 while that of a value above 1 goes to infinity).

Q10: I press the "load" button to load data points but why svm-toy does not draw them ?

The program svm-toy assumes both attributes (i.e. x-axis and y-axis values) are in (0,1). Hence you want to scale your data to between a small positive number and a number less than but very close to 1. Moreover, class labels must be 1, 2, or 3 (not 1.0, 2.0 or anything else).

Q11: Feature selection tool
This is a simple python script (fselect.py) that uses F-score for selecting features. To run it, please put it in the sub-directory "tools" of LIBSVM.
Usage: ./fselect.py training_file [testing_file]
Output files: .fscore shows importance of features, .select gives the running log, and .pred gives testing results.
More information about this implementation can be found in Y.-W. Chen and C.-J. Lin, Combining SVMs with various feature selection strategies. To appear in the book "Feature extraction, foundations and applications." 2005. This implementation is still preliminary. More comments are very welcome.
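For orientation, the F-score of a single feature in a two-class problem can be sketched as below. The formula follows the definition in the Chen and Lin chapter as I understand it: the numerator measures how far the two class means lie from the overall mean, and the denominator adds the sample variances within each class. This is not the code of fselect.py, and the input values are toy numbers.

```python
def f_score(pos_values, neg_values):
    """F-score of one feature, given its values in the positive and negative class."""
    all_values = pos_values + neg_values
    mean_all = sum(all_values) / len(all_values)
    mean_pos = sum(pos_values) / len(pos_values)
    mean_neg = sum(neg_values) / len(neg_values)
    # Sample variances (divide by n - 1); each class needs at least two values.
    var_pos = sum((v - mean_pos) ** 2 for v in pos_values) / (len(pos_values) - 1)
    var_neg = sum((v - mean_neg) ** 2 for v in neg_values) / (len(neg_values) - 1)
    numerator = (mean_pos - mean_all) ** 2 + (mean_neg - mean_all) ** 2
    # Note: the denominator is zero if the feature is constant within both classes.
    return numerator / (var_pos + var_neg)


print(f_score([1.0, 3.0], [5.0, 7.0]))   # well separated classes
print(f_score([1.0, 7.0], [3.0, 5.0]))   # overlapping classes score lower
```

A larger value suggests the feature separates the two classes better, although the score looks at each feature in isolation.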

Q12: Weights for data instances
Users can give a weight to each data instance. For LIBSVM users, please download the zip file (MATLAB and Python interfaces are included). You must store weights in a separate file and specify -W your_weight_file. This setting is different from earlier versions where weights are in the first column of training data.
1) Training/testing sets are the same as those for standard LIBSVM/LIBLINEAR.
2) We do not support weights for test data.
3) All solvers are supported.
4) Matlab/Python interfaces for both LIBSVM/LIBLINEAR are supported.

Q13: Binary-class Cross Validation with Different Criteria

Reference: https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/eval/index.html
