Traditionally, many classification problems try to solve the two- or multi-class case. The goal of the machine learning application is to distinguish test data between a number of classes, using training data. But what if you only have data of one class and the goal is to test new data and find out whether it is like the training data or not? A method for this task, which has gained much popularity over the last two decades, is the One-Class Support Vector Machine. This (quite lengthy) blog post will give an introduction to this technique and will show the two main approaches.

Just one class?

Let us first look at our problem situation: we would like to determine whether (new) test data belongs to a specific class, determined by our training data, or does not. Why would we want this? Imagine a factory setting: heavy machinery under constant surveillance of some advanced system. The task of the controlling system is to determine when something goes wrong: the products are below quality, the machine produces strange vibrations, or the temperature rises. It is relatively easy to gather training data of situations that are OK; it is just the normal production situation. But on the other hand, collecting example data of a faulty system state can be rather expensive, or simply impossible. Even if faulty system states could be simulated, there is no way to guarantee that all possible faulty states are covered and would thus be recognized in a traditional two-class problem.

To cope with this problem, one-class classification problems (and solutions) are introduced. By providing just the normal training data, an algorithm creates a (representational) model of this data. If newly encountered data is too different from this model, according to some measure, it is labeled as out-of-class. We will look at the application of Support Vector Machines to this one-class problem.

Basic concepts of Support Vector Machines

Let us first take a look at the traditional two-class support vector machine. Consider a data set Ω={(x1,y1),(x2,y2),…,(xn,yn)}; points xi∈Rd in a (for instance two-dimensional) space where xi is the i-th input data point and yi∈{−1,1} is the i-th output pattern, indicating the class membership.

A very nice property of SVMs is that they can create a non-linear decision boundary by projecting the data through a non-linear function ϕ to a space of higher dimension. This means that data points which cannot be separated by a straight line in their original space I are "lifted" to a feature space F where there can be a "straight" hyperplane that separates the data points of one class from another. When that hyperplane is projected back to the input space I, it has the form of a non-linear curve. The following video illustrates this process: the blue dots (in the white circle) cannot be linearly separated from the red dots. By using a polynomial kernel for the projection (more on that later), all the dots are lifted into the third dimension, in which a hyperplane can be used for separation. When the intersection of that plane with the space is projected back to the two-dimensional input space, a circular boundary arises.

The hyperplane is represented by the equation wTx+b=0, with w∈F and b∈R. The constructed hyperplane determines the margin between the classes: all the data points of class −1 lie on one side, and all the data points of class 1 on the other. The distance from the closest point of each class to the hyperplane is equal; thus the constructed hyperplane maximizes the margin ("separating power") between the classes. To prevent the SVM classifier from over-fitting on noisy data (i.e., to create a soft margin), slack variables ξi are introduced to allow some data points to lie within the margin, and the constant C>0 determines the trade-off between maximizing the margin and the number of training data points within that margin (and thus training errors). The objective function of the SVM classifier is the following minimization formulation:

$$\min_{w,\,b,\,\xi_i} \;\; \frac{\|w\|^2}{2} + C \sum_{i=1}^{n} \xi_i$$

$$\text{subject to: } \quad y_i\left(w^T \phi(x_i) + b\right) \geq 1 - \xi_i \quad \text{for all } i = 1, \dots, n$$

$$\xi_i \geq 0 \quad \text{for all } i = 1, \dots, n$$

When this minimization problem (a quadratic program) is solved using Lagrange multipliers, it gets really interesting. The decision function (classification rule) for a data point x then becomes:

$$f(x) = \operatorname{sgn}\left( \sum_{i=1}^{n} \alpha_i y_i K(x, x_i) + b \right)$$

Here αi are the Lagrange multipliers; every data point xi with αi>0 is weighted in the decision function and thus "supports" the machine, hence the name Support Vector Machine. Since SVM solutions are sparse, there will be relatively few Lagrange multipliers with a non-zero value.
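To make the role of the support vectors concrete, here is a minimal Matlab sketch (my own illustration; the multipliers and the bias are made-up values, not the result of actually solving the quadratic program) that evaluates the decision rule for a new point. Note that only the points with αi > 0 contribute:

% Sketch: evaluating f(x) = sgn( sum_i alpha_i * y_i * K(x, x_i) + b ) with a
% Gaussian kernel. alpha and b are illustrative values, not fitted ones.
X = [0 0; 1 1; 3 3; 4 4];          % training points (one per row)
y = [1; 1; -1; -1];                % class labels
alpha = [0.8; 0; 0.8; 0];          % sparse multipliers: only two support vectors
b = 0.05;                          % bias term
sigma = 1;

K = @(x1, x2) exp(-sum((x1 - x2).^2) / (2*sigma^2));   % RBF kernel

x_new = [0.5 0.5];                 % point to classify
sv = find(alpha > 0);              % only the support vectors matter
f = b;
for i = sv'
    f = f + alpha(i) * y(i) * K(x_new, X(i, :));
end
label = sign(f);                   % predicted class: +1 or -1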

Kernel Function

The function K(x,xi)=ϕ(x)Tϕ(xi) is known as the kernel function. Since the outcome of the decision function only relies on the dot-products of the vectors in the feature space F (i.e., on all the pairwise inner products of the mapped vectors), it is not necessary to perform an explicit projection to that space (as was done in the above video). As long as a function K gives the same results, it can be used instead. This is known as the kernel trick, and it is what gives SVMs their great power with non-linearly separable data points: the feature space F can be of very high (even infinite) dimension, and thus the hyperplane separating the data can be very complex. In our calculations, though, we avoid that complexity.

Popular choices for the kernel function are the linear, polynomial, and sigmoidal kernels, but the most widely used is the Gaussian Radial Basis Function:

$$K(x, x') = \exp\left( -\frac{\|x - x'\|^2}{2\sigma^2} \right)$$

where σ∈R is a kernel parameter and ∥x−x′∥ is the dissimilarity measure.
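As a small illustration of the kernel trick, the sketch below (my own example, not taken from any SVM library) computes the full Gaussian kernel matrix of a data set without ever constructing the feature space F explicitly; the pairwise squared distances in the input space are all that is needed. It uses pdist/squareform from the Statistics Toolbox, just like the demo script further below.

% Sketch: Gaussian RBF kernel matrix computed directly from pairwise
% distances in the input space; no explicit mapping phi is ever formed.
X = rand(10, 2);                                % 10 hypothetical 2-D data points
sigma = 1;

D2 = squareform(pdist(X, 'euclidean') .^ 2);    % squared pairwise distances
K = exp(-D2 ./ (2 * sigma^2));                  % n-by-n kernel matrix

% K(i,j) equals phi(x_i)' * phi(x_j) in the (implicit) feature space.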

With this set of formulas and concepts we are able to classify a set of data points into two classes with a non-linear decision function. But we are interested in the case of a single class of data. Roughly speaking, there are two different approaches, which we will discuss in the next two sections.

One-Class SVM according to Schölkopf

The Support Vector Method for Novelty Detection by Schölkopf et al. essentially separates all the data points from the origin (in feature space F) and maximizes the distance from this hyperplane to the origin. This results in a binary function which captures regions in the input space where the probability density of the data lives. Thus the function returns +1 in a "small" region (capturing the training data points) and −1 elsewhere.

The quadratic programming minimization problem is slightly different from the one stated above, but the similarity is still clear:

$$\min_{w,\,\xi_i,\,\rho} \;\; \frac{1}{2}\|w\|^2 + \frac{1}{\nu n} \sum_{i=1}^{n} \xi_i - \rho$$

$$\text{subject to: } \quad (w \cdot \phi(x_i)) \geq \rho - \xi_i \quad \text{for all } i = 1, \dots, n$$

$$\xi_i \geq 0 \quad \text{for all } i = 1, \dots, n$$

In the previous formulation the parameter C determined the smoothness of the solution. In this formulation it is the parameter ν that characterizes the solution:

  1. it sets an upper bound on the fraction of outliers (training examples regarded as out-of-class), and
  2. it sets a lower bound on the fraction of training examples used as support vectors.

Due to the importance of this parameter, this approach is often referred to as ν-SVM. For example, with ν=0.05 and 1,000 training examples, at most about 50 points can be treated as outliers and at least about 50 points will become support vectors.

Again, by applying the Lagrange techniques and using a kernel function for the dot-product calculations, the decision function becomes:

$$f(x) = \operatorname{sgn}\left( (w \cdot \phi(x)) - \rho \right) = \operatorname{sgn}\left( \sum_{i=1}^{n} \alpha_i K(x, x_i) - \rho \right)$$
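As a small illustration (with made-up numbers, not the output of an actual optimization), evaluating this decision function only requires the support vectors, their multipliers αi, and the offset ρ. The only differences from the two-class rule above are the absence of the labels yi and the offset ρ instead of the bias b:

% Sketch: the nu-SVM decision function f(x) = sgn( sum_i alpha_i K(x, x_i) - rho )
% evaluated with illustrative (made-up) values for alpha and rho.
SV = [0 0; 0.5 0.2; -0.3 0.4];     % support vectors (one per row), hypothetical
alpha = [0.4; 0.3; 0.3];           % their Lagrange multipliers
rho = 0.55;                        % offset from the optimization
sigma = 1;

K = @(x1, x2) exp(-sum((x1 - x2).^2) / (2*sigma^2));   % Gaussian kernel

x_new = [0.2 0.1];
f = sum(arrayfun(@(i) alpha(i) * K(x_new, SV(i, :)), 1:size(SV, 1))) - rho;
label = sign(f);                   % +1: resembles the training data, -1: outlier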

This method thus creates a hyperplane, characterized by w and ρ, which has maximal distance from the origin in feature space F and separates all the data points from the origin. Another approach is to create a circumscribing hypersphere around the data in feature space. The following section shows that approach.

One-Class SVM according to Tax and Duin

The Support Vector Data Description (SVDD) method by Tax and Duin takes a spherical, instead of planar, approach. The algorithm obtains a spherical boundary, in feature space, around the data. The volume of this hypersphere is minimized, to reduce the effect of incorporating outliers in the solution.

The resulting hypersphere is characterized by a center a and a radius R>0, the distance from the center to (any support vector on) the boundary, and the squared radius R2 is minimized. The center a is a linear combination of the support vectors (the training data points for which the Lagrange multiplier is non-zero). Just as in the traditional formulation, one could require that all the distances from data points xi to the center are strictly less than R, but to create a soft margin, slack variables ξi with penalty parameter C are used again. The minimization problem then becomes:

$$\min_{R,\,a,\,\xi_i} \;\; R^2 + C \sum_{i=1}^{n} \xi_i$$

$$\text{subject to: } \quad \|x_i - a\|^2 \leq R^2 + \xi_i \quad \text{for all } i = 1, \dots, n$$

$$\xi_i \geq 0 \quad \text{for all } i = 1, \dots, n$$

After solving this by introducing Lagrange multipliers αi, a new data point z can be tested to be in or out of class. It is considered in-class when its distance to the center is smaller than or equal to the radius. Using the Gaussian kernel as a distance function over two data points, this condition reduces to:

$$\sum_{i=1}^{n} \alpha_i \exp\left( -\frac{\|z - x_i\|^2}{\sigma^2} \right) \geq -\frac{R^2}{2} + C_R$$

where the constant CR collects the terms that do not depend on z, so it is fixed once the model has been trained.
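To see how this test works in practice, here is a small sketch with made-up values for the support vectors, their multipliers, and the radius (in reality these come out of the optimization, for instance via dd_tools or the LibSVM tools); CR is computed once from the support vectors and then reused for every test point:

% Sketch: SVDD membership test with the Gaussian kernel, using illustrative
% values; alpha, R and the support vectors normally come from solving the
% minimization problem above.
SV = [0 0; 0.4 0.1; -0.2 0.3];     % support vectors (one per row), hypothetical
alpha = [0.5; 0.3; 0.2];           % their Lagrange multipliers
sigma = 1;
R2 = 0.8;                          % squared radius of the hypersphere

Ksv = exp(-squareform(pdist(SV, 'euclidean')) .^ 2 / sigma^2);   % kernel matrix of the support vectors
C_R = 0.5 * (1 + sum(sum((alpha * alpha') .* Ksv)));             % constant term, independent of z

z = [0.1 0.2];                     % new data point to test
s = 0;
for i = 1:size(SV, 1)
    s = s + alpha(i) * exp(-sum((z - SV(i, :)).^2) / sigma^2);
end
in_class = (s >= -R2/2 + C_R);     % true when z falls inside the hypersphere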

You can see the similarity between the traditional two-class method and the algorithms by Schölkopf and by Tax and Duin. So far for the theoretical fundamentals of Support Vector Machines; let's take a quick look at some applications of this method.

Applications (in Matlab)

A very good and widely used library for SVM classification is LibSVM, which also has a Matlab interface. Out of the box it supports one-class SVM following the method of Schölkopf. Also available in the LibSVM tools is a method for SVDD, following the algorithm of Tax and Duin.
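As an illustration, a minimal one-class session with LibSVM's Matlab interface could look like the sketch below. It assumes the compiled svmtrain/svmpredict MEX files from LibSVM are on the Matlab path, and the parameter values (ν = 0.05, kernel γ = 0.5) are arbitrary choices for the example, not recommendations.

% One-class SVM with LibSVM: -s 2 (one-class), -t 2 (RBF kernel), -n (nu), -g (gamma).
train_data = randn(200, 2);                 % "normal" observations, one per row
train_labels = ones(200, 1);                % labels are required by the interface but ignored in one-class mode

model = svmtrain(train_labels, train_data, '-s 2 -t 2 -n 0.05 -g 0.5');

test_data = [randn(20, 2); 6 + randn(5, 2)];           % mostly normal points plus a few far-away ones
test_labels = ones(size(test_data, 1), 1);             % placeholder, only used for the accuracy printout
predicted = svmpredict(test_labels, test_data, model);

% predicted holds +1 for points judged in-class and -1 for suspected outliers.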

To give a nice visual clarification of how the kernel mapping (to the feature space F) works, I created a small Matlab script that lets you create two data sets, red and blue dots (note: this simulates a two-class example). After clicking the points, you are able to inspect the data after it has been projected to the three-dimensional space. The data will then result in a shape like the following image.

% Demo to visualize the mapping with a Gaussian Radial Basis Function,
% especially in the context of Support Vector Machines.
%
% When this script is executed, first a collection of red points can be
% clicked on the graph.
% After that, the blue points can be generated.
% Then the user must provide a sigma value.
% The final graph can be rotated to inspect the 3D-space (use the "turn"
% icon next to the hand in the toolbar)
%
% Created by Roemer Vlasveld (roemer.vlasveld@gmail.com)
%
% The blog post where this is used:
% http://rvlasveld.github.io/blog/2013/07/12/introduction-to-one-class-support-vector-machines/
%
% Please feel free to use this script to your need.
 
figure;
axis([-10 10 -10 10])
hold on
grid on;
% Initially, the list of points is empty.
red = [];
blue = [];
 
% Loop, picking up the points for the red class.
disp('---')
disp('Click in the graph for the red points, e.g. in a wide circular form')
disp('Left mouse button picks points.')
disp('Right mouse button picks last point.')
but = 1;
n = 0;
while but == 1
    [xi, yi, but] = ginput(1);
    plot(xi, yi, 'ro')
    n = n + 1;
    red(:, n) = [xi; yi];
end
 
disp('Finished collecting red points')
disp('---')
 
% Loop again, picking up the points for the blue class
disp('Now click in the graph for the blue points, e.g. in a smaller circular form')
disp('Left mouse button picks points.')
disp('Right mouse button picks last point.')
but = 1;
n = 0;
while but == 1
    [xi, yi, but] = ginput(1);
    plot(xi, yi, 'bo')
    n = n + 1;
    blue(:, n) = [xi; yi];
end
 
disp('Finished collecting blue points')
disp('---')
 
sigma = input('sigma = ? (default value: 1): ');
if isempty(sigma)
    sigma = 1;
end
 
% Project each point to a z-value: the sum of Gaussian RBF kernel values
% between that point and all other points.
project = @(data, sigma) sum(exp(-(squareform( pdist(data, 'euclidean') .^ 2) ./ ( 2*sigma^2))));
 
blue_z = project(blue', sigma);
red_z = project(red', sigma);
 
clf;
hold on;
grid on;
scatter3(red(1,:), red(2,:), red_z, 'r');
scatter3(blue(1,:), blue(2,:), blue_z, 'b');

Application to change detection

As a conclusion to this post I will give a look at the perspective from which I am using one-class SVMs in my current research for my master thesis (which is performed at the Dutch research company Dobots). My goal is to detect change points in time series data, a problem also known as novelty detection. One-class SVMs have already been applied to novelty detection for time series data. I will apply it specifically to accelerometer data, collected by smartphone sensors. My theory is that when the change points in the time series are explicitly discovered, representing changes in the activity performed by the user, the classification algorithms should perform better. In a future post I will probably take a further look at an algorithm for novelty detection using one-class Support Vector Machines.
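To sketch the general idea (this is not my actual implementation, which uses the incremental SVDD from dd_tools; the windowing, features, and parameter values below are illustrative assumptions only), a one-class SVM trained on an initial "normal" segment could flag later windows of an accelerometer signal as novel:

% Rough sketch: sliding windows over a 3-axis accelerometer signal; windows
% that no longer resemble the initial training windows are flagged as novel.
window = 50;                                   % samples per window (illustrative)
signal = randn(2000, 3);                       % placeholder for 3-axis accelerometer data

% One feature vector per window: here simply the concatenated samples.
n_win = floor(size(signal, 1) / window);
features = zeros(n_win, window * 3);
for w = 1:n_win
    seg = signal((w-1)*window+1 : w*window, :);
    features(w, :) = seg(:)';
end

% Train on the first windows, assumed to contain only "normal" behaviour.
model = svmtrain(ones(10, 1), features(1:10, :), '-s 2 -t 2 -n 0.1 -g 0.01');

labels = svmpredict(ones(n_win, 1), features, model);
change_windows = find(labels == -1);           % windows that no longer resemble the training data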

Update: GitHub repository

Currently I am using the SVDD method by Tax and Duin to implement change detection and temporal segmentation for accelerometer data. I am using the Matlab dd_tools package, created by Tax, for the incremental version of SVDD. You can use my implementation and fork it from the oc_svm GitHub repository. Most of the functions are documented, but the code is under heavy development and thus the precise workings change from time to time. I am planning to write a good readme, but if you are interested I advise you to look at the apply_inc_svdd.m file, which creates the SVM classifier and extracts properties from the constructed model.

by Roemer Vlasveld, Jul 12th, 2013. Posted in: change detection, classification, machine learning, matlab, novelty detection, support vector machine, svm
