想学习一下SVM,所以找到了LIBSVM--A Library for Support Vector Machines,首先阅读了一下网站提供的A practical guide to SVM classification.

写一写个人认为主要的精华的东西。

SVMs is:a technique for data classification  

Goal is:to produce a model (based on training data) which predicts the target values of the test data given only the test data attributes.

Kernels:four basic kernels

Proposed Procedure:

1.transform data to the format of an SVM package

  first have to convert categorical attributes into numeric data.We recommend using m numbers to represent an m-category attribute and only one of the m numbers is one,and others are zeros. for example {red,green,blue} can be represented as (0,0,1),(0,1,0)and(1,0,0).

2.conduct simple scaling on the data

  Note:It's importance to use the same scaling factors for training and testing sets.

3.consider the RBF kernel K(x,y) = e-r||x-y||2

4.use cross-validation to find the best parameter C and r

  The cross-validation produce can prevent the overfitting problem.We recommend a "grid-search" on C and r using cross-validation.Various pairs of (C,r)values are tried and the one with the best cross-validation accuarcy is picked.Use a coarse grid to make a better region on the grid,a finer grid search on that region can be conducted.

  For very large data sets a feasible approach is to randomly choose a subset of the data set,conduct grid-search on them,and then do a better-region-only grid-search on the completly data set.

5.use the best parameter C and r to train the whole training set

6.Test

When to use Linear but not RBF Kernel ?

  If the number of features is large, one may not need to map data to a higher dimensional space. That is, the nonlinear mapping does not improve the performance.Using the linear kernel is good enough, and one only searches for the parameter C.

  C.1 Number of instances number of features  

    when the number of features is very large, one may not need to map the data.

  C.2 Both numbers of instances and features are large

    Such data often occur in document classication.LIBLINEAR is much faster than LIBSVM to obtain a model with comparable accuracy.LIBLINEAR is efficient for large-scale document classication.

  C.3 Number of instances number of features

    As the number of features is small, one often maps data to higher dimensional spaces(i.e., using nonlinear kernels).

Learn LIBSVM---a practical Guide to SVM classification的更多相关文章

  1. [笔记]A Practical Guide to Support Vector Classi cation

    <A Practical Guide to Support Vector Classication>是一篇libSVM使用入门教程以及一些实用技巧. 1. Basic Kernels: ( ...

  2. A Practical Guide to Support Vector Classi cation

    <A Practical Guide to Support Vector Classication>是一篇libSVM使用入门教程以及一些实用技巧. 1. Basic Kernels: ( ...

  3. A Practical Guide to Distributed Scrum - 分布式Scrum的实用指南 - 读书笔记

    最近读了这本IBM出的<A Practical Guide to Distributed Scrum>(分布式Scrum的实用指南),书中的章节结构比较清楚,是针对Scrum项目进行,一个 ...

  4. 信号处理的好书Digital Signal Processing - A Practical Guide for Engineers and Scientists

    诚心给大家推荐一本讲信号处理的好书<Digital Signal Processing - A Practical Guide for Engineers and Scientists>[ ...

  5. 【SVM】A Practical Guide to Support Vector Classi cation

    零.简介 一般认为,SVM比神经网络要简单. 优化目标:

  6. Putting Apache Kafka To Use: A Practical Guide to Building a Stream Data Platform-part 1

    转自: http://www.confluent.io/blog/stream-data-platform-1/ These days you hear a lot about "strea ...

  7. The Practical Guide to Empathy Maps: 10-Minute User Personas

    That’s where the empathy map comes in. When created correctly, empathy maps serve as the perfect lea ...

  8. Putting Apache Kafka To Use: A Practical Guide to Building a Stream Data Platform-part 2

    转自: http://confluent.io/blog/stream-data-platform-2          http://www.infoq.com/cn/news/2015/03/ap ...

  9. Parsing techniques: a practical guide下载

    轮子哥隆重推荐的书,一行代码.一句公式都没有,但是却什么都讲明白了的:<Parsing Techniques>.第一版官网免费下载,第二版多出来的东西你们用不上不用看了.全书只讲parsi ...

随机推荐

  1. Javascript 文件的同步加载与异步加载

    HTML 4.01 的script属性 charset: 可选.指定src引入代码的字符集,大多数浏览器忽略该值.defer: boolean, 可选.延迟脚本执行,相当于将script标签放入页面b ...

  2. A(51)和C(51)相互调用

    C语言是一种编译型程序设计语言,它兼顾了多种高级语言的特点,并可以调用汇编语言的子程序.用C语言设计开发微控制器程序已成为一种必然的趋势.Franklin C51是一种专门针对Intel 8051系列 ...

  3. 用JSON 和 Google 实现全文翻译

    unit Unit1; interface uses Windows, Messages, SysUtils, Variants, Classes, Graphics, Controls, Forms ...

  4. < IOS > 论苹果数据持久化。

    最近做的音乐播放器,用了太多的数据存储.在各种APP中无可避免的要用到数据存储.在IOS中,给了很多办法进行数据持久化.但是万宗不离其变,都是要存储到本地中,IOS提供了沙盒机制,沙盒有多大呢???这 ...

  5. LeeCode(Database)-Duplicate Emails

    Write a SQL query to find all duplicate emails in a table named Person. +----+---------+ | Id | Emai ...

  6. Installing — pylibmc 1.2.3 documentation

    Installing - pylibmc 1.2.3 documentation libmemcached

  7. hdu 5630 Rikka with Chess

    来自官方题解: AC代码: #pragma comment(linker, "/STACK:1024000000,1024000000") #include<iostream ...

  8. 一起talk GDB吧(第七回:GDB监视功能)

    各位看官们.大家好.上一回中我们说的是GDB改动程序执行环境的功能.而且说了怎样使用GDB改动变量 的值.这一回中.我们继续介绍GDB的调试功能:监视功能.当然了,我们也会介绍怎样使用GDB的监视功 ...

  9. python3 module中__init__.py的需要注意的地方

    网上关于__init__.py的作用的资料到处都是,我在此就不再啰嗦哪些了. 若有需要.请各位看官去搜搜即可. 最近刚开始用Python3 就遇到了这个比较有意思的事情 闲言少叙,下面要介绍的是pyt ...

  10. 【Java基础】foreach循环

    从一个小程序说起: class lesson6foreach { public static void main(String[] args) { int array[]={2,3,1,5,4,6}; ...