Learn LIBSVM: A Practical Guide to SVM Classification
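I wanted to learn about SVMs, so I found LIBSVM (A Library for Support Vector Machines) and started by reading the guide provided on its website, "A Practical Guide to Support Vector Classification".
Below I write down what I consider its main points.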
SVMs: a technique for data classification.
Goal: to produce a model (based on training data) that predicts the target values of the test data given only the test data attributes.
Kernels: four basic kernels: linear, polynomial, radial basis function (RBF), and sigmoid.
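To make the four kernels concrete, here is a minimal plain-Python sketch of each one. The parameter names (gamma, r, d) follow the guide's notation; the default argument values are arbitrary choices for illustration, not LIBSVM's defaults.

```python
import math
from typing import Sequence

Vector = Sequence[float]

def dot(x: Vector, y: Vector) -> float:
    """Inner product x^T y."""
    return sum(a * b for a, b in zip(x, y))

def linear(x: Vector, y: Vector) -> float:
    # K(x, y) = x^T y
    return dot(x, y)

def polynomial(x: Vector, y: Vector, gamma: float = 1.0, r: float = 0.0, d: int = 3) -> float:
    # K(x, y) = (gamma * x^T y + r)^d, gamma > 0
    return (gamma * dot(x, y) + r) ** d

def rbf(x: Vector, y: Vector, gamma: float = 1.0) -> float:
    # K(x, y) = exp(-gamma * ||x - y||^2), gamma > 0
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def sigmoid(x: Vector, y: Vector, gamma: float = 1.0, r: float = 0.0) -> float:
    # K(x, y) = tanh(gamma * x^T y + r)
    return math.tanh(gamma * dot(x, y) + r)
```

For example, `linear([1, 2], [3, 4])` is 11, and `polynomial([1, 2], [3, 4], gamma=1.0, r=1.0, d=2)` is (11 + 1)^2 = 144.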
Proposed Procedure:
1. Transform data to the format of an SVM package.
First convert categorical attributes into numeric data. We recommend using m numbers to represent an m-category attribute, where exactly one of the m numbers is one and the others are zeros. For example, {red, green, blue} can be represented as (0,0,1), (0,1,0), and (1,0,0).
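A small sketch of this step, assuming the target is LIBSVM's sparse text format (`<label> <index>:<value> ...`, with 1-based indices and zero entries omitted). The helper names `one_hot` and `to_libsvm_line` are hypothetical, written just for this note. Which category gets which position is arbitrary as long as it stays consistent; here red maps to (1,0,0).

```python
from typing import List, Sequence

def one_hot(value: str, categories: Sequence[str]) -> List[int]:
    """Represent an m-category attribute with m numbers, exactly one of which is 1."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == c else 0 for c in categories]

def to_libsvm_line(label: int, features: Sequence[float]) -> str:
    """Format one instance as '<label> <index>:<value> ...'.

    Indices start at 1; zero-valued features are skipped (sparse format).
    """
    parts = [str(label)]
    for i, v in enumerate(features, start=1):
        if v != 0:
            parts.append(f"{i}:{v:g}")
    return " ".join(parts)

# One numeric attribute (0.5) followed by a one-hot encoded colour attribute.
colors = ["red", "green", "blue"]
row = [0.5] + one_hot("green", colors)
print(to_libsvm_line(1, row))
```

This prints `1 1:0.5 3:1`: index 2 and 4 are zero for "green", so they are left out.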
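2. Conduct simple scaling on the data.
Scaling keeps attributes with large numeric ranges from dominating those with small ranges; the guide recommends linearly scaling each attribute to a range such as [-1, +1] or [0, 1].
Note: it is important to use the same scaling factors for the training and testing sets.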
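A minimal sketch of "same scaling factors for training and testing": the per-attribute min and max are computed on the training set only and then reused unchanged for the test set. The function names are hypothetical, and mapping a constant attribute to the lower bound is my own assumption.

```python
from typing import List, Sequence, Tuple

def fit_scaling(train: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
    """Record (min, max) for each attribute, using the TRAINING data only."""
    return [(min(col), max(col)) for col in zip(*train)]

def apply_scaling(rows: Sequence[Sequence[float]],
                  factors: Sequence[Tuple[float, float]],
                  lower: float = -1.0, upper: float = 1.0) -> List[List[float]]:
    """Linearly map each attribute from its training [min, max] to [lower, upper]."""
    scaled = []
    for row in rows:
        new_row = []
        for v, (lo, hi) in zip(row, factors):
            if hi == lo:
                # Constant attribute in training data: map to the lower bound (assumption).
                new_row.append(lower)
            else:
                new_row.append(lower + (upper - lower) * (v - lo) / (hi - lo))
        scaled.append(new_row)
    return scaled

train = [[0, 10], [5, 20], [10, 30]]
test = [[15, 20]]
factors = fit_scaling(train)           # [(0, 10), (10, 30)]
print(apply_scaling(train, factors))   # training rows land in [-1, 1]
print(apply_scaling(test, factors))    # reuses the training factors
```

Note that the test value 15 becomes 2.0, outside [-1, 1]. That is expected: re-fitting the factors on the test set would put it at a different scale than the model was trained on.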
3. Consider the RBF kernel $K(x, y) = e^{-\gamma \|x - y\|^2}$, where $\gamma > 0$.
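To see what γ does, here is a small worked example with two points chosen for illustration, x = (1, 0) and y = (0, 1):

```latex
\|x - y\|^2 = (1 - 0)^2 + (0 - 1)^2 = 2
\gamma = 0.5:\quad K(x, y) = e^{-0.5 \cdot 2} = e^{-1} \approx 0.368
\gamma = 2:\quad K(x, y) = e^{-2 \cdot 2} = e^{-4} \approx 0.018
K(x, x) = e^{0} = 1 \quad \text{for any } \gamma
```

A larger γ makes the kernel value decay faster with distance, so each training point influences a smaller neighbourhood.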
4. Use cross-validation to find the best parameters C and γ.
The cross-validation procedure can prevent the overfitting problem. We recommend a "grid search" on C and γ using cross-validation: various pairs of (C, γ) values are tried, and the one with the best cross-validation accuracy is picked. After a coarse grid identifies a better region, a finer grid search on that region can be conducted.
For very large data sets, a feasible approach is to randomly choose a subset of the data set, conduct a grid search on it, and then do a better-region-only grid search on the complete data set.
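A sketch of the coarse-to-fine procedure, assuming the exponentially growing sequences the guide suggests (C = 2^-5, 2^-3, ..., 2^15 and γ = 2^-15, 2^-13, ..., 2^3, then a finer step of 2^0.25 around the best point). Training itself is left to a hypothetical `fit_predict` hook (for example a wrapper around an SVM trainer); in LIBSVM itself, `svm-train -v n` does n-fold cross-validation and the bundled `tools/grid.py` automates the search. For very large data, `coarse_eval` can run on a `random_subset` and `fine_eval` on the full data.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

# evaluate(log2_C, log2_gamma) -> cross-validation accuracy (supplied by the caller)
Evaluator = Callable[[float, float], float]

def k_fold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle 0..n-1 and deal the indices into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_val_accuracy(fit_predict: Callable, X: Sequence, y: Sequence,
                       k: int = 5, seed: int = 0) -> float:
    """Fraction of instances predicted correctly when each fold is held out once.
    fit_predict(train_X, train_y, test_X) -> predicted labels is a HYPOTHETICAL hook."""
    correct = 0
    for fold in k_fold_indices(len(X), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        preds = fit_predict([X[i] for i in train], [y[i] for i in train],
                            [X[i] for i in fold])
        correct += sum(1 for p, i in zip(preds, fold) if p == y[i])
    return correct / len(X)

def random_subset(X: Sequence, y: Sequence, m: int, seed: int = 0) -> Tuple[list, list]:
    """Randomly choose m instances, e.g. for the coarse stage on a very large data set."""
    idx = random.Random(seed).sample(range(len(X)), m)
    return [X[i] for i in idx], [y[i] for i in idx]

def log2_grid(start: float, stop: float, step: float) -> List[float]:
    """Inclusive list of log2 exponents: start, start + step, ..., stop."""
    count = int(round((stop - start) / step))
    return [start + i * step for i in range(count + 1)]

def grid_search(evaluate: Evaluator, log2c: List[float],
                log2g: List[float]) -> Tuple[float, float, float]:
    """Try every (C, gamma) pair; return (log2 C, log2 gamma, accuracy) of the best.
    Ties keep the first pair found."""
    best = (log2c[0], log2g[0], float("-inf"))
    for lc in log2c:
        for lg in log2g:
            acc = evaluate(lc, lg)
            if acc > best[2]:
                best = (lc, lg, acc)
    return best

def coarse_then_fine(coarse_eval: Evaluator, fine_eval: Optional[Evaluator] = None,
                     radius: float = 2.0, fine_step: float = 0.25) -> Tuple[float, float, float]:
    """Coarse grid with exponent step 2, then a finer grid around the best coarse point."""
    fine_eval = fine_eval or coarse_eval
    lc, lg, _ = grid_search(coarse_eval, log2_grid(-5, 15, 2), log2_grid(-15, 3, 2))
    return grid_search(fine_eval,
                       log2_grid(lc - radius, lc + radius, fine_step),
                       log2_grid(lg - radius, lg + radius, fine_step))
```

To try the search logic without training anything, you can pass a made-up accuracy surface, e.g. `coarse_then_fine(lambda lc, lg: 1.0 - 0.01 * ((lc - 3) ** 2 + (lg + 5) ** 2))`, which peaks at C = 2^3, γ = 2^-5. A real evaluator would call `cross_val_accuracy` with an SVM trained at C = 2^lc, γ = 2^lg.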
5. Use the best parameters C and γ to train on the whole training set.
6. Test.
When to use the linear kernel rather than RBF (Appendix C of the guide)
If the number of features is large, one may not need to map data to a higher-dimensional space; that is, the nonlinear mapping does not improve performance. Using the linear kernel is good enough, and one only needs to search for the parameter C.
C.1 Number of instances ≪ number of features
When the number of features is very large, one may not need to map the data.
C.2 Both numbers of instances and features are large
Such data often occur in document classification. LIBLINEAR is much faster than LIBSVM at obtaining a model with comparable accuracy, and it is efficient for large-scale document classification.
C.3 Number of instances ≫ number of features
As the number of features is small, one often maps data to higher-dimensional spaces (i.e., uses nonlinear kernels).
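The three cases can be summed up as a rough rule of thumb. This is only a sketch: the thresholds `large` and `ratio` are hypothetical values I chose for illustration, not numbers from the guide, and when in doubt the guide's default of trying RBF with a grid search still applies.

```python
def suggest_kernel(n_instances: int, n_features: int,
                   large: int = 10_000, ratio: float = 10.0) -> str:
    """Map data-set shape to the three cases above.

    `large` and `ratio` are HYPOTHETICAL thresholds, not values from the guide.
    """
    if n_instances >= large and n_features >= large:
        # C.2: both large (e.g. document classification) -> linear model via LIBLINEAR
        return "linear (LIBLINEAR)"
    if n_features >= ratio * n_instances:
        # C.1: instances << features -> linear kernel, search only C
        return "linear"
    # C.3 (instances >> features) and everything in between:
    # nonlinear RBF kernel, grid-search both C and gamma
    return "rbf"

print(suggest_kernel(38, 7129))
print(suggest_kernel(20000, 50000))
print(suggest_kernel(100000, 8))
```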