Abstract

We describe and analyze a simple and effective iterative algorithm for solving the optimization problem cast by Support Vector Machines (SVM). Our method alternates between stochastic gradient descent steps and projection steps. We prove that the number of iterations required to obtain a solution of accuracy is . In contrast, previous analyses of stochastic gradient descent methods require iterations. As in previous devised SVM solvers, the number of iterations also scales linearly with , where is the regularization parameter of SVM. For a linear kernel, the total run-time of our method is , where is a bound on the number of non-zero features in each example. Since the run-time does not depend directly on the size of the training set, the resulting algorithm is especially suited for learning from large datasets. Our approach can seamlessly be adapted to employ non-linear kernels while working solely on the primal objective function. We demonstrate the efficiency and applicability of our approach by conducting experiments on large text classification problems, comparing our solver to existing state-of-the art SVM solvers. For example, it takes less than 5 seconds for our solver to converge when solving a text classification problem from Reuters Corpus Volume 1 (RCV 1) with training examples.

1. Introduction

Support Vector Machines (SVMs) are effective and popular classification learning tool. The task of learning a support vector machine is cast as a constrained quadratic programming. However, in its native form, it is in fact an unconstrained empirical loss minimization with a penalty term for the norm of the classifier that is being learned. Formally, given a training set , where and , we would like to find the minimizer of the problem

(1)

where

(1)

We denote the objective function of Eq. (1) by . An optimization method finds an -accurate solution if . The original SVM problem also includes a bias term, . We omit the bias throughout the first sections and defer the description of an extension which employs a bias term to Sec. 4.

We describe and analyze in this paper a simple iterative algorithm, called Pegasos, for solving Eq. (1). The algorithm performs iterations and also requires an additional parameter , whose role is explained in the sequel. Pegasos alternates between stochastic subgradient descent steps and projection steps. The parameter determines the number of examples from the algorithm uses on each iteration for estimating the subgradient. When , Pegasos reduces to a variant of the subgradient projection method. We show that in this case the number of iterations that is required in order to achieve an -accurate solution is . At the other extreme, when , we recover a variant of the stochastic (sub) gradient method. In the stochastic case, we analyze the probability of obtaining a good approximate solution. Specifically, we show that with probability of at least our algorithm finds an -accurate solution using only iterations, while each iteration involves a single inner product between and . This rate of convergence does not depend on the size of the training set and thus our algorithm is especially suited for large datasets.

2. The Pegasos Algorithm

In this section we describe the Pegasos algorithm for solving the optimization problem given in Eq. (1). The algorithm receives as input two parameters: - the number of iterations to perform; - the number of examples to use for calculating sub-gradients. Initially, we set to any vector whose norm is at most . On iteration of the algorithm, we first choose a set of size . Then, we replace the objective in Eq. (1) with an approximate objective function,

.

Note that we overloaded our original definition of as the original objective can be denoted either as or as . We interchangeably use both notations depending on the context. Next, we set the learning rate and define to be the set of examples for which suffers a non-zero loss. We now perform a two-step update as follows. We scale by and for all examples we add to the vector . We denote the resulting vector by . This step can be also written as , where

(1)

The definition of the hinge-loss implies that is a sub-gradient of at . Last, we set to be the projection of onto the set

(1)

That is, is obtained by scaling by . As we show in our analysis below, the optimal solution of SVM is in the set . Informally speaking, we can always project back onto the set as we only get closer to the optimum. The output of Pegasos is the last vector .

Note that if we choose on each round then we obtain the sub-gradient projection method. On the other extreme, if we choose to contain a single randomly selected example, then we recover a variant of the stochastic gradient method. In general, we allow to be a set of examples sampled i.i.d. from .

We conclude this section with a short discussion of implementation details when the instances are sparse, namely, when each instance has very few non-zero elements. In this case, we can represent as a triplet where is a dense vector and are scalars. The vector is defined through the triplet as follows: and stores the squared norm of , . Using this representation, it is easily verified that the total number of operations required for performing one iteration of Pegasos with is , where is the number of non-zero elements in .

3. Analysis

In this section we analyze the convergence properties of Pegasos. Throughout this section we denote

(1)

Recall that on each iteration of the algorithm, we focus on an instantaneous objective function

Pegasos: Primal Estimated sub-GrAdient Solver for SVM的更多相关文章

  1. [转] 从零推导支持向量机 (SVM)

    原文连接 - https://zhuanlan.zhihu.com/p/31652569 摘要 支持向量机 (SVM) 是一个非常经典且高效的分类模型.但是,支持向量机中涉及许多复杂的数学推导,并需要 ...

  2. 损失函数(Loss Function) -1

    http://www.ics.uci.edu/~dramanan/teaching/ics273a_winter08/lectures/lecture14.pdf Loss Function 损失函数 ...

  3. 常见数据挖掘算法的Map-Reduce策略(2)

           接着上一篇文章常见算法的mapreduce案例(1)继续挖坑,本文涉及到算法的基本原理,文中会大概讲讲,但具体有关公式的推导还请大家去查阅相关的文献文章.下面涉及到的数据挖掘算法会有:L ...

  4. 逻辑回归原理_挑战者飞船事故和乳腺癌案例_Python和R_信用评分卡(AAA推荐)

    sklearn实战-乳腺癌细胞数据挖掘(博客主亲自录制视频教程) https://study.163.com/course/introduction.htm?courseId=1005269003&a ...

  5. 基于MNIST数据的softmax regression

    跟着tensorflow上mnist基本机器学习教程联系 首先了解sklearn接口: sklearn.linear_model.LogisticRegression In the multiclas ...

  6. [Scikit-learn] 1.1 Generalized Linear Models - Logistic regression & Softmax

    二分类:Logistic regression 多分类:Softmax分类函数 对于损失函数,我们求其最小值, 对于似然函数,我们求其最大值. Logistic是loss function,即: 在逻 ...

  7. Factorization Machine算法

    参考: http://stackbox.cn/2018-12-factorization-machine/ https://baijiahao.baidu.com/s?id=1641085157432 ...

  8. LibLinear(SVM包)使用说明之(一)README

    转自:http://blog.csdn.net/zouxy09/article/details/10947323/ LibLinear(SVM包)使用说明之(一)README zouxy09@qq.c ...

  9. SVM应用

    我在项目中应用的SVM库是国立台湾大学林智仁教授开发的一套开源软件,主要有LIBSVM与LIBLINEAR两个,LIBSVM是对非线性数据进行分类,大家也比较熟悉,LIBLINEAR是对线性数据进行分 ...

随机推荐

  1. iOS 开发之使用safari对webview进行调试

    转自:http://www.tuicool.com/articles/ZBFnUbz 使用safari对webview进行调试 时间 2016-02-25 14:35:20  陈斌彬的技术博客 原文  ...

  2. SCOI2009粉刷匠

    Description windy有 N 条木板需要被粉刷. 每条木板被分为 M 个格子. 每个格子要被刷成红色或蓝色. windy每次粉刷,只能选择一条木板上一段连续的格子,然后涂上一种颜色. 每个 ...

  3. Python_Day7_面向对象学习

    1.面向对象编程介绍 2.为什么要用面向对象进行开发? 3.面向对象的特性:封装.继承.多态 4.类.方法. 面向过程 VS 面向对象 编程范式 编程是程序员用特定的语法+数据结构+算法组成的代码来告 ...

  4. centos5.11 repo 安装mysql5.7

    http://dev.mysql.com/doc/refman/5.7/en/linux-installation-yum-repo.html mysql yum repo 安装说明 http://d ...

  5. RedHat下Bugzilla的安装和配置

    Bugzilla 是一个开源的缺陷跟踪系统(Bug-Tracking System). OS:RedHat Linux 软件类型:开源 架构:B/S server端模块开发语言:perl(c/c++) ...

  6. java selenium后报错Element not found in the cache元素定位要重新赋值之前的定义

    习惯上把定位的元素在操作之前就定位好, 例如: WebElement element1=driver.findElement(...);      ----------declaration1 Web ...

  7. 使用scrapy创建工程

    前提:先创建一个文件夹用来存放爬虫工程 创建项目命令: scrapy startproject <project_name> 例子: scrapy startproject myproje ...

  8. mybatis入门总结

    背景: 最近“大胆地”把原本一个通过简单的JDBC连接数据库进行修改和查找操作的小项目改成用mybatis了.. 周四得到任务,周一要完成的,说是要添加查询条件和添加查询字段,修改的字段也多了几个,才 ...

  9. angularJs,ionic字符串操作

    1.首先我们需要把一段"文本或字符串"中的我们想进行操作的"字符串","字"筛选出来,代码如下: app.filter('replaceCo ...

  10. ldconfig和ldd用法

    一.ldconfig ldconfig是一个动态链接库管理命令,为了让动态链接库为系统所共享,还需运行动态链接库的管理命令--ldconfig. ldconfig 命令的用途,主要是在默认搜寻目录(/ ...