Reading the Spark 0.9.0 MLlib Machine Learning Package: the Optimization Code
package org.apache.spark.mllib.optimization

import scala.math._

import org.jblas.DoubleMatrix
/**
 * Class used to compute the gradient for a loss function, given a single data point.
 */
abstract class Gradient extends Serializable {
  /**
   * Compute the gradient and loss given the features of a single data point.
   * @param data - Feature values for one data point. Column matrix of size dx1
   *               where d is the number of features.
   * @param label - Label for this data item.
   * @param weights - Column matrix containing weights for every feature.
   * @return A tuple of 2 elements. The first element is a column matrix containing the computed
   *         gradient and the second element is the loss computed at this data point.
   */
  def compute(data: DoubleMatrix, label: Double, weights: DoubleMatrix):
      (DoubleMatrix, Double)
}
As the comments above indicate, the data parameter of compute holds the feature values of one sample (a d*1 column matrix, where d is the number of features), label is a Double giving the label of that single data point, and weights holds the regression coefficients of the features, also d*1. The function returns two things: the first is the gradient computed at this sample point, the second is the loss at this sample point.
/**
 * Compute gradient and loss for a logistic loss function, as used in binary classification.
 * See also the documentation for the precise formulation.
 */
class LogisticGradient extends Gradient {
  override def compute(data: DoubleMatrix, label: Double, weights: DoubleMatrix):
      (DoubleMatrix, Double) = {
    val margin: Double = -1.0 * data.dot(weights)
    val gradientMultiplier = (1.0 / (1.0 + math.exp(margin))) - label
    val gradient = data.mul(gradientMultiplier)
    val loss =
      if (label > 0) {
        math.log(1 + math.exp(margin))
      } else {
        math.log(1 + math.exp(margin)) - margin
      }
    (gradient, loss)
  }
}
Recall the log-loss expression loss = -[y*log(g(wx)) + (1-y)*log(1-g(wx))], where g(wx) = 1/(1+exp(-wx)) and the two classes are labeled (0, 1). Taking the partial derivative of this loss with respect to w gives d(loss)/d(w) = [g(wx) - y] * x (for brevity, d stands for the partial-derivative symbol). For the detailed derivation see http://www.cnblogs.com/kobedeshow/p/3340240.html
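To connect this to the code, here is the same derivation written out (my own restatement, not part of the post), using g(wx) = 1/(1+exp(-wx)) and labels y in {0, 1}; note that margin in the code equals -w^T x:

  loss(w) = -[\, y \log g(w^\top x) + (1-y) \log(1 - g(w^\top x)) \,]
          = \log(1 + e^{-w^\top x})               \quad (y = 1)
          = \log(1 + e^{-w^\top x}) + w^\top x    \quad (y = 0)

  \frac{\partial\, loss}{\partial w} = (\, g(w^\top x) - y \,)\, x

Since margin = -w^T x, the two cases are exactly math.log(1 + math.exp(margin)) and math.log(1 + math.exp(margin)) - margin, and gradientMultiplier is g(w^T x) - label.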
/**
 * Compute gradient and loss for a Least-squared loss function, as used in linear regression.
 * This is correct for the averaged least squares loss function (mean squared error)
 *   L = 1/n ||A weights - y||^2
 * See also the documentation for the precise formulation.
 */
class LeastSquaresGradient extends Gradient {
  override def compute(data: DoubleMatrix, label: Double, weights: DoubleMatrix):
      (DoubleMatrix, Double) = {
    val diff: Double = data.dot(weights) - label
    val loss = diff * diff
    val gradient = data.mul(2.0 * diff)
    (gradient, loss)
  }
}
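A quick note of my own (not in the original post) to tie the code to the formula: for a single example the per-point squared loss and its gradient are

  L(w) = (w^\top x - y)^2, \qquad \nabla_w L(w) = 2\,(w^\top x - y)\,x

so diff is w^T x - y, loss is diff squared, and data.mul(2.0 * diff) is the gradient.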
/**
 * Compute gradient and loss for a Hinge loss function, as used in SVM binary classification.
 * See also the documentation for the precise formulation.
 * NOTE: This assumes that the labels are {0,1}
 */
class HingeGradient extends Gradient {
  override def compute(data: DoubleMatrix, label: Double, weights: DoubleMatrix):
      (DoubleMatrix, Double) = {
    val dotProduct = data.dot(weights)
    // Our loss function with {0, 1} labels is max(0, 1 - (2y - 1) (f_w(x)))
    // Therefore the gradient is -(2y - 1)*x
    val labelScaled = 2 * label - 1.0
    if (1.0 > labelScaled * dotProduct) {
      (data.mul(-labelScaled), 1.0 - labelScaled * dotProduct)
    } else {
      (DoubleMatrix.zeros(1, weights.length), 0.0)
    }
  }
}
For binary classification with labels (-1, 1), the hinge loss is max(0, 1 - y * f(x)). The code maps the (0, 1) labels into that form, giving max(0, 1 - (2y - 1) * f(x)). When a sample is misclassified or falls inside the margin (that is, labelScaled * dotProduct < 1), the gradient is data.mul(-labelScaled) and the loss is 1 - labelScaled * dotProduct; otherwise both are zero. A small worked example follows.
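As a sanity check, here is a small self-contained sketch (my own addition, not from the MLlib source) that calls the three gradient classes on one toy data point; it assumes the classes above are on the classpath (e.g. compiled in the same package) and that jblas is available:

import org.jblas.DoubleMatrix

object GradientSketch {
  def main(args: Array[String]): Unit = {
    // One data point x = [1.0, 2.0] with label 1, and weights w = [0.1, 0.1], so w.x = 0.3
    val x = new DoubleMatrix(2, 1, 1.0, 2.0)
    val w = new DoubleMatrix(2, 1, 0.1, 0.1)

    val (logGrad, logLoss) = new LogisticGradient().compute(x, 1.0, w)
    val (lsGrad, lsLoss) = new LeastSquaresGradient().compute(x, 1.0, w)
    // Hinge: labelScaled = 1 and labelScaled * dotProduct = 0.3 < 1, so the point is inside
    // the margin: gradient = -1 * x = [-1, -2], loss = 1 - 0.3 = 0.7
    val (hingeGrad, hingeLoss) = new HingeGradient().compute(x, 1.0, w)

    println(s"logistic:      grad=${logGrad.toArray.mkString(",")}  loss=$logLoss")
    println(s"least squares: grad=${lsGrad.toArray.mkString(",")}  loss=$lsLoss")
    println(s"hinge:         grad=${hingeGrad.toArray.mkString(",")}  loss=$hingeLoss")
  }
}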
/**
 * Class used to perform steps (weight update) using Gradient Descent methods.
 * For general minimization problems, or for regularized problems of the form
 *   min L(w) + regParam * R(w),
 * the compute function performs the actual update step, when given some
 * (e.g. stochastic) gradient direction for the loss L(w),
 * and a desired step-size (learning rate).
 *
 * The updater is responsible to also perform the update coming from the
 * regularization term R(w) (if any regularization is used).
 */
abstract class Updater extends Serializable {
  /**
   * Compute an updated value for weights given the gradient, stepSize, iteration number and
   * regularization parameter. Also returns the regularization value regParam * R(w)
   * computed using the *updated* weights.
   * @param weightsOld - Column matrix of size dx1 where d is the number of features.
   * @param gradient - Column matrix of size dx1 where d is the number of features.
   * @param stepSize - step size across iterations
   * @param iter - Iteration number
   * @param regParam - Regularization parameter
   *
   * @return A tuple of 2 elements. The first element is a column matrix containing updated weights,
   *         and the second element is the regularization value computed using updated weights.
   */
  def compute(weightsOld: DoubleMatrix, gradient: DoubleMatrix, stepSize: Double, iter: Int,
      regParam: Double): (DoubleMatrix, Double)
}
The weightsOld parameter of compute is the regression-coefficient vector before the update (d*1), gradient is the current gradient computed from the chosen loss function, stepSize is the step size (i.e. the learning rate), iter is the iteration number, and regParam is the regularization parameter. The function returns two things: the first is the updated regression coefficients, the second is the value regParam * R(w) computed with the updated weights.
/**
 * A simple updater for gradient descent *without* any regularization.
 * Uses a step-size decreasing with the square root of the number of iterations.
 */
class SimpleUpdater extends Updater {
  override def compute(weightsOld: DoubleMatrix, gradient: DoubleMatrix,
      stepSize: Double, iter: Int, regParam: Double): (DoubleMatrix, Double) = {
    val thisIterStepSize = stepSize / math.sqrt(iter)
    val step = gradient.mul(thisIterStepSize)
    (weightsOld.sub(step), 0)
  }
}
For gradient descent the update is w -= a * gradient, where the learning rate a corresponds to thisIterStepSize in the code (the step starts at stepSize and shrinks as the iteration count grows, since it is divided by sqrt(iter)); a * gradient corresponds to step, and finally weightsNew = weightsOld.sub(step).
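A small sketch of one SimpleUpdater step with concrete numbers (again my own addition, assuming the classes above are available):

import org.jblas.DoubleMatrix

object SimpleUpdaterSketch {
  def main(args: Array[String]): Unit = {
    val weightsOld = new DoubleMatrix(2, 1, 1.0, 1.0)
    val gradient = new DoubleMatrix(2, 1, 0.5, 0.5)
    // iter = 4, stepSize = 1.0  =>  thisIterStepSize = 1.0 / sqrt(4) = 0.5
    // step = 0.5 * gradient = [0.25, 0.25], newWeights = [0.75, 0.75], regVal = 0
    val (newWeights, regVal) = new SimpleUpdater().compute(weightsOld, gradient, 1.0, 4, 0.0)
    println(newWeights.toArray.mkString(", ") + "  regVal = " + regVal)
  }
}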
/**
 * Updater for L1 regularized problems.
 *   R(w) = ||w||_1
 * Uses a step-size decreasing with the square root of the number of iterations.
 * Instead of subgradient of the regularizer, the proximal operator for the
 * L1 regularization is applied after the gradient step. This is known to
 * result in better sparsity of the intermediate solution.
 * The corresponding proximal operator for the L1 norm is the soft-thresholding
 * function. That is, each weight component is shrunk towards 0 by shrinkageVal.
 * If w > shrinkageVal, set weight component to w-shrinkageVal.
 * If w < -shrinkageVal, set weight component to w+shrinkageVal.
 * If -shrinkageVal < w < shrinkageVal, set weight component to 0.
 * Equivalently, set weight component to signum(w) * max(0.0, abs(w) - shrinkageVal)
 */
class L1Updater extends Updater {
  override def compute(weightsOld: DoubleMatrix, gradient: DoubleMatrix,
      stepSize: Double, iter: Int, regParam: Double): (DoubleMatrix, Double) = {
    val thisIterStepSize = stepSize / math.sqrt(iter)
    val step = gradient.mul(thisIterStepSize)
    // Take gradient step
    val newWeights = weightsOld.sub(step)
    // Apply proximal operator (soft thresholding)
    val shrinkageVal = regParam * thisIterStepSize
    (0 until newWeights.length).foreach { i =>
      val wi = newWeights.get(i)
      newWeights.put(i, signum(wi) * max(0.0, abs(wi) - shrinkageVal))
    }
    (newWeights, newWeights.norm1 * regParam)
  }
}
With the L1 regularizer added, the first few steps are the same as before; the key is the post-processing (for the underlying theory, which I do not fully cover here, see http://freemind.pluskid.org/machine-learning/sparsity-and-some-basics-of-l1-regularization/). In terms of code: shrinkageVal = regParam * thisIterStepSize (note the multiplication by thisIterStepSize: the shrinkage plays the role of the a * d(R(w))/d(w) term that would otherwise appear in w -= a * gradient, so it must be scaled by the same step size). Then every element of newWeights (the weights after the plain gradient step) is overwritten with newWeights.put(i, signum(wi) * max(0.0, abs(wi) - shrinkageVal)), which is exactly the soft-thresholding rule spelled out in the class comment above.
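To see the soft-thresholding in isolation, here is a small sketch of mine (not the post's) in which the gradient is zero, so the only effect is the shrinkage; it assumes the L1Updater above is available:

import org.jblas.DoubleMatrix

object L1UpdaterSketch {
  def main(args: Array[String]): Unit = {
    val weightsOld = new DoubleMatrix(3, 1, 0.4, 0.05, 1.0)
    val zeroGradient = DoubleMatrix.zeros(3, 1)
    // stepSize = 1.0, iter = 1, regParam = 0.1  =>  shrinkageVal = 0.1
    // 0.4 -> 0.3,  0.05 -> 0.0 (shrunk all the way to zero),  1.0 -> 0.9
    // regVal = regParam * ||w_new||_1 = 0.1 * (0.3 + 0.0 + 0.9) = 0.12
    val (newWeights, regVal) = new L1Updater().compute(weightsOld, zeroGradient, 1.0, 1, 0.1)
    println(newWeights.toArray.mkString(", ") + "  regVal = " + regVal)
  }
}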
/**
 * Updater for L2 regularized problems.
 *   R(w) = 1/2 ||w||^2
 * Uses a step-size decreasing with the square root of the number of iterations.
 */
class SquaredL2Updater extends Updater {
  override def compute(weightsOld: DoubleMatrix, gradient: DoubleMatrix,
      stepSize: Double, iter: Int, regParam: Double): (DoubleMatrix, Double) = {
    val thisIterStepSize = stepSize / math.sqrt(iter)
    val step = gradient.mul(thisIterStepSize)
    // add up both updates from the gradient of the loss (= step) as well as
    // the gradient of the regularizer (= regParam * weightsOld)
    val newWeights = weightsOld.mul(1.0 - thisIterStepSize * regParam).sub(step)
    (newWeights, 0.5 * pow(newWeights.norm2, 2.0) * regParam)
  }
}
After the L2 regularizer is added, the loss becomes loss1 = loss + 1/2 * regParam * ||w||^2. The gradient-descent update is w = w - learningRate * d(loss1)/d(w), with d(loss1)/d(w) = d(loss)/d(w) + d(1/2 * regParam * ||w||^2)/d(w) = d(loss)/d(w) + regParam * w. The update therefore becomes w = w - learningRate * d(loss)/d(w) - learningRate * regParam * w = (1 - learningRate * regParam) * w - learningRate * d(loss)/d(w), which is exactly what the newWeights line in SquaredL2Updater computes.
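Written compactly (my restatement of the derivation above), with \alpha_t = stepSize / \sqrt{t} and \lambda = regParam:

  w_{new} = w - \alpha_t \big( \nabla L(w) + \lambda w \big)
          = (1 - \alpha_t \lambda)\, w - \alpha_t \nabla L(w)

and the returned regularization value is \frac{\lambda}{2} \|w_{new}\|_2^2, i.e. 0.5 * pow(newWeights.norm2, 2.0) * regParam in the code.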
The first part defines the GradientDescent class.
package org.apache.spark.mllib.optimization
import org.apache.spark.Logging
import org.apache.spark.rdd.RDD
import org.jblas.DoubleMatrix
import scala.collection.mutable.ArrayBuffer
/**
 * Class used to solve an optimization problem using Gradient Descent.
 * @param gradient Gradient function to be used.
 * @param updater Updater to be used to update weights after every iteration.
 */
class GradientDescent(var gradient: Gradient, var updater: Updater)
  extends Optimizer with Logging
{
  private var stepSize: Double = 1.0
  private var numIterations: Int = 100
  private var regParam: Double = 0.0
  private var miniBatchFraction: Double = 1.0

  /**
   * Set the initial step size of SGD for the first step. Default 1.0.
   * In subsequent steps, the step size will decrease with stepSize/sqrt(t)
   */
  def setStepSize(step: Double): this.type = {
    this.stepSize = step
    this
  }

  /**
   * Set fraction of data to be used for each SGD iteration.
   * Default 1.0 (corresponding to deterministic/classical gradient descent)
   */
  def setMiniBatchFraction(fraction: Double): this.type = {
    this.miniBatchFraction = fraction
    this
  }

  /**
   * Set the number of iterations for SGD. Default 100.
   */
  def setNumIterations(iters: Int): this.type = {
    this.numIterations = iters
    this
  }

  /**
   * Set the regularization parameter. Default 0.0.
   */
  def setRegParam(regParam: Double): this.type = {
    this.regParam = regParam
    this
  }

  /**
   * Set the gradient function (of the loss function of one single data example)
   * to be used for SGD.
   */
  def setGradient(gradient: Gradient): this.type = {
    this.gradient = gradient
    this
  }

  /**
   * Set the updater function to actually perform a gradient step in a given direction.
   * The updater is responsible to perform the update from the regularization term as well,
   * and therefore determines what kind or regularization is used, if any.
   */
  def setUpdater(updater: Updater): this.type = {
    this.updater = updater
    this
  }

  def optimize(data: RDD[(Double, Array[Double])], initialWeights: Array[Double])
    : Array[Double] = {
    val (weights, stochasticLossHistory) = GradientDescent.runMiniBatchSGD(
      data,
      gradient,
      updater,
      stepSize,
      numIterations,
      regParam,
      miniBatchFraction,
      initialWeights)
    weights
  }
}
The class takes two constructor parameters: gradient, which selects the loss function whose gradient is computed, and updater, which selects the kind of regularization to apply. At the top of the class, default values are set for stepSize, numIterations, regParam and miniBatchFraction, with setters to override them. The final method, optimize, takes the RDD of data and the initial regression coefficients and returns the learned weights.
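A hedged usage sketch (my own, not in the original post): assuming an RDD named train of (label, features) pairs already exists, logistic regression with L2 regularization could be run roughly like this:

import org.apache.spark.rdd.RDD

// train is assumed to be an RDD[(Double, Array[Double])] with labels in {0, 1};
// it is a placeholder for this sketch, produced elsewhere from a SparkContext.
def trainLogistic(train: RDD[(Double, Array[Double])], numFeatures: Int): Array[Double] = {
  val optimizer = new GradientDescent(new LogisticGradient(), new SquaredL2Updater())
    .setStepSize(1.0)
    .setNumIterations(200)
    .setRegParam(0.01)
    .setMiniBatchFraction(0.1)
  // Start from the zero vector; optimize returns the learned weights.
  optimizer.optimize(train, Array.fill(numFeatures)(0.0))
}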
// Top-level method to run gradient descent.
object GradientDescent extends Logging {
  /**
   * Run stochastic gradient descent (SGD) in parallel using mini batches.
   * In each iteration, we sample a subset (fraction miniBatchFraction) of the total data
   * in order to compute a gradient estimate.
   * Sampling, and averaging the subgradients over this subset is performed using one standard
   * spark map-reduce in each iteration.
   *
   * @param data - Input data for SGD. RDD of the set of data examples, each of
   *               the form (label, [feature values]).
   * @param gradient - Gradient object (used to compute the gradient of the loss function of
   *                   one single data example)
   * @param updater - Updater function to actually perform a gradient step in a given direction.
   * @param stepSize - initial step size for the first step
   * @param numIterations - number of iterations that SGD should be run.
   * @param regParam - regularization parameter
   * @param miniBatchFraction - fraction of the input data set that should be used for
   *                            one iteration of SGD. Default value 1.0.
   *
   * @return A tuple containing two elements. The first element is a column matrix containing
   *         weights for every feature, and the second element is an array containing the
   *         stochastic loss computed for every iteration.
   */
  def runMiniBatchSGD(
      data: RDD[(Double, Array[Double])],
      gradient: Gradient,
      updater: Updater,
      stepSize: Double,
      numIterations: Int,
      regParam: Double,
      miniBatchFraction: Double,
      initialWeights: Array[Double]) : (Array[Double], Array[Double]) = {

    val stochasticLossHistory = new ArrayBuffer[Double](numIterations)

    val nexamples: Long = data.count()
    val miniBatchSize = nexamples * miniBatchFraction

    // Initialize weights as a column vector
    var weights = new DoubleMatrix(initialWeights.length, 1, initialWeights:_*)
    var regVal = 0.0

    for (i <- 1 to numIterations) {
      // Sample a subset (fraction miniBatchFraction) of the total data
      // compute and sum up the subgradients on this subset (this is one map-reduce)
      val (gradientSum, lossSum) = data.sample(false, miniBatchFraction, 42 + i).map {
        case (y, features) =>
          val featuresCol = new DoubleMatrix(features.length, 1, features:_*)
          val (grad, loss) = gradient.compute(featuresCol, y, weights)
          (grad, loss)
      }.reduce((a, b) => (a._1.addi(b._1), a._2 + b._2))

      /**
       * NOTE(Xinghao): lossSum is computed using the weights from the previous iteration
       * and regVal is the regularization value computed in the previous iteration as well.
       */
      stochasticLossHistory.append(lossSum / miniBatchSize + regVal)
      val update = updater.compute(
        weights, gradientSum.div(miniBatchSize), stepSize, i, regParam)
      weights = update._1
      regVal = update._2
    }

    logInfo("GradientDescent.runMiniBatchSGD finished. Last 10 stochastic losses %s".format(
      stochasticLossHistory.takeRight(10).mkString(", ")))

    (weights.toArray, stochasticLossHistory.toArray)
  }
}
This object carries out the whole optimization and outputs the regression coefficients plus the loss at every iteration; it implements parallel mini-batch SGD. The line var weights = new DoubleMatrix(initialWeights.length, 1, initialWeights:_*) turns the weight array into a d*1 column matrix. The key code sits inside for (i <- 1 to numIterations): data is a Spark RDD, and data.sample takes three arguments, whether to sample with replacement, the sampling fraction, and the random seed, and returns the sampled RDD. The subsequent RDD.map and RDD.reduce calls form one complete map-reduce pass: the map computes each example's (gradient, loss) and the reduce sums them. The resulting gradientSum is then divided by miniBatchSize and handed to the updater to update the weights. For the parallelization strategy of mini-batch SGD, see algorithm 3 in my earlier article 《常见数据挖掘算法的Map-Reduce策略(2)》 (Map-Reduce strategies for common data-mining algorithms, part 2).
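To make the per-iteration map-reduce concrete, here is a small local sketch of my own (no Spark needed; a plain Seq stands in for the sampled RDD) that mirrors the map and reduce above; it assumes LogisticGradient from the listing is available:

import org.jblas.DoubleMatrix

object MiniBatchStepSketch {
  def main(args: Array[String]): Unit = {
    // A toy "mini batch" of (label, features) pairs and the current weights.
    val batch = Seq((1.0, Array(1.0, 2.0)), (0.0, Array(2.0, 0.5)))
    val weights = new DoubleMatrix(2, 1, 0.0, 0.0)
    val gradient = new LogisticGradient()

    // map: per-example (gradient, loss); reduce: sum the gradients (in place) and the losses.
    val (gradientSum, lossSum) = batch.map { case (y, features) =>
      gradient.compute(new DoubleMatrix(features.length, 1, features: _*), y, weights)
    }.reduce((a, b) => (a._1.addi(b._1), a._2 + b._2))

    // The averaged gradient is what gets handed to the Updater.
    val avgGradient = gradientSum.div(batch.size.toDouble)
    println(avgGradient.toArray.mkString(", ") + "  avg loss = " + lossSum / batch.size)
  }
}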