Learn LIBSVM: A Practical Guide to SVM Classification
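I wanted to learn about SVMs, so I found LIBSVM (A Library for Support Vector Machines) and started by reading "A practical guide to SVM classification", which is provided on its website.
Below are the points I consider most important.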
SVM: a technique for data classification.
Goal: to produce a model (based on the training data) that predicts the target values of the test data given only the test data attributes.
Kernels: there are four basic kernels: linear, polynomial, radial basis function (RBF), and sigmoid.
Proposed Procedure:
1. Transform data to the format of an SVM package
First, categorical attributes have to be converted into numeric data. We recommend using m numbers to represent an m-category attribute, where only one of the m numbers is one and the others are zeros. For example, {red, green, blue} can be represented as (0,0,1), (0,1,0), and (1,0,0).
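The encoding above can be sketched in a few lines of Python. This is only an illustration: the helper names `one_hot` and `to_libsvm_line` are made up for this note, and the category order is chosen so that red maps to (0,0,1) as in the example. The output line follows the sparse `label index:value` text format that LIBSVM reads, with feature indices starting at 1.

```python
def one_hot(value, categories):
    """Return len(categories) numbers with a single 1 marking `value`."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == c else 0 for c in categories]


def to_libsvm_line(label, features):
    """Format one instance as 'label index:value ...', skipping zero entries."""
    parts = [f"{i}:{v:g}" for i, v in enumerate(features, start=1) if v != 0]
    return " ".join([str(label)] + parts)


# Order chosen so that red -> (0,0,1), green -> (0,1,0), blue -> (1,0,0).
COLORS = ["blue", "green", "red"]

encoded = one_hot("red", COLORS)           # three numbers, exactly one is 1
line = to_libsvm_line(1, encoded + [0.5])  # append one numeric attribute
print(encoded)
print(line)
```

Zero entries are left out of the line because the format is sparse, which keeps files small when an attribute has many categories.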
2. Conduct simple scaling on the data
Note: it is important to use the same scaling factors for the training and testing sets.
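A minimal sketch of what "the same scaling factors" means, assuming linear scaling of each feature to [-1, +1]. The function names `fit_scaling` and `apply_scaling` are hypothetical, and this is not the code of LIBSVM's own scaling tool. The ranges are computed from the training set only and then reused unchanged on the test set, so test values may fall slightly outside [-1, +1].

```python
def fit_scaling(rows, lower=-1.0, upper=1.0):
    """Compute the per-feature (min, max) from the training rows only."""
    columns = list(zip(*rows))
    return {"ranges": [(min(c), max(c)) for c in columns],
            "lower": lower, "upper": upper}


def apply_scaling(rows, params):
    """Scale rows linearly using ranges that were fitted on the training set."""
    lower, upper = params["lower"], params["upper"]
    scaled = []
    for row in rows:
        out = []
        for x, (lo, hi) in zip(row, params["ranges"]):
            if hi == lo:
                out.append(0.0)  # constant feature: arbitrary choice in this sketch
            else:
                out.append(lower + (upper - lower) * (x - lo) / (hi - lo))
        scaled.append(out)
    return scaled


train = [[-10.0, 1.0], [10.0, 3.0]]
test = [[-11.0, 2.0], [8.0, 3.0]]

params = fit_scaling(train)                 # factors come from training data
train_scaled = apply_scaling(train, params)
test_scaled = apply_scaling(test, params)   # same factors reused on test data
print(train_scaled)
print(test_scaled)
```

Here the first training attribute spans [-10, +10] and is mapped to [-1, +1]; the test values -11 and 8 therefore map to about -1.1 and 0.8 rather than being rescaled to fill [-1, +1] on their own.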
3. Consider the RBF kernel $K(x, y) = e^{-r\|x - y\|^2}$
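As a quick illustration, the RBF kernel value for two feature vectors can be computed directly from the formula. The function name `rbf_kernel` is just for this note, and `r` is the kernel parameter used throughout this post.

```python
import math


def rbf_kernel(x, y, r):
    """RBF kernel: exp(-r * ||x - y||^2) for two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-r * sq_dist)


same = rbf_kernel([1.0, 2.0], [1.0, 2.0], r=0.5)   # identical points
apart = rbf_kernel([0.0, 0.0], [1.0, 1.0], r=0.5)  # squared distance is 2
print(same, apart)
```

The value is 1 for identical points and decays toward 0 as the points move apart; a larger `r` makes the decay faster.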
4. Use cross-validation to find the best parameters C and r
The cross-validation procedure can prevent the overfitting problem. We recommend a "grid-search" on C and r using cross-validation: various pairs of (C, r) values are tried and the one with the best cross-validation accuracy is picked. First use a coarse grid to identify a promising region of the grid; then a finer grid search on that region can be conducted.
For very large data sets, a feasible approach is to randomly choose a subset of the data set, conduct a grid-search on it, and then do a better-region-only grid-search on the complete data set.
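The coarse-then-fine search can be sketched as follows. This is a rough outline under several assumptions: `cv_accuracy(C, r)` stands for whatever routine returns the cross-validation accuracy of one parameter pair (for example, a wrapper around an SVM trainer), `toy_score` is a made-up stand-in so the sketch runs on its own, and the grid bounds follow the exponentially growing sequences the guide suggests (C = 2^-5, 2^-3, ..., 2^15 and r = 2^-15, 2^-13, ..., 2^3). The width and step of the fine grid are my own choice.

```python
import math


def frange(start, stop, step):
    """Inclusive list of evenly spaced values from start to stop."""
    n = int(round((stop - start) / step))
    return [start + i * step for i in range(n + 1)]


def grid_search(cv_accuracy, log2_c_values, log2_r_values):
    """Try every (C, r) pair on the grid; return (best_score, log2_C, log2_r)."""
    best = None
    for log2_c in log2_c_values:
        for log2_r in log2_r_values:
            score = cv_accuracy(2.0 ** log2_c, 2.0 ** log2_r)
            if best is None or score > best[0]:
                best = (score, log2_c, log2_r)
    return best


def coarse_to_fine(cv_accuracy):
    """Coarse grid first, then a finer grid around the best coarse point."""
    _, c0, r0 = grid_search(cv_accuracy, frange(-5, 15, 2), frange(-15, 3, 2))
    return grid_search(cv_accuracy,
                       frange(c0 - 2, c0 + 2, 0.25),
                       frange(r0 - 2, r0 + 2, 0.25))


def toy_score(C, r):
    """Stand-in for cross-validation accuracy, peaking at log2 C = 3.25, log2 r = -6.75."""
    return -((math.log2(C) - 3.25) ** 2) - ((math.log2(r) + 6.75) ** 2)


best_score, best_log2_c, best_log2_r = coarse_to_fine(toy_score)
print(best_log2_c, best_log2_r)
```

In real use, `toy_score` would be replaced by a function that runs k-fold cross-validation for the given (C, r) and returns the accuracy. Each grid point is independent of the others, so the search is easy to parallelize.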
5. Use the best parameters C and r to train on the whole training set
6. Test
When to use the linear kernel instead of the RBF kernel?
If the number of features is large, one may not need to map data to a higher dimensional space. That is, the nonlinear mapping does not improve the performance. Using the linear kernel is good enough, and one only searches for the parameter C.
C.1 Number of instances << number of features
When the number of features is very large, one may not need to map the data.
C.2 Both the number of instances and the number of features are large
Such data often occur in document classification. LIBLINEAR is much faster than LIBSVM at obtaining a model with comparable accuracy, and it is efficient for large-scale document classification.
C.3 Number of instances >> number of features
As the number of features is small, one often maps data to higher dimensional spaces (i.e., using nonlinear kernels).