1. Introduction


Much like Newton's method is a standard tool for solving unconstrained smooth minimization problems of modest size, proximal algorithms can be viewed as an analogous tool for nonsmooth, constrained, large-scale, or distributed versions of these problems. They are very generally applicable, but they turn out to be especially well-suited to problems of recent and widespread interest involving large or high-dimensional datasets.

Proximal methods sit at a higher level of abstraction than classical optimization algorithms like Newton’s method. In the latter, the base operations are low-level, consisting of linear algebra operations and the computation of gradients and Hessians. In proximal algorithms, the base operation is evaluating the proximal operator of a function, which involves solving a small convex optimization problem. These subproblems can be solved with standard methods, but they often admit closed-form solutions or can be solved very quickly with simple specialized methods. We will also see that proximal operators and proximal algorithms have a number of interesting interpretations and are connected to many different topics in optimization and applied mathematics.
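
For reference, the proximal operator of a closed proper convex function $f$ with parameter $\lambda>0$ is defined in the Parikh and Boyd monograph listed in the references as

$$prox_{\lambda f}(v)=\arg\min_{x}\left(f(x)+\frac{1}{2\lambda}||x-v||_{2}^{2}\right)$$

so each evaluation trades off minimizing $f$ against staying close to the point $v$.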

2. Algorithms


Consider the following convex optimization problem

$$\min_{x}f(x)+g(x)$$

where $f$ is convex and smooth, and $g:R^n\rightarrow R\cup \{+\infty\}$ is closed, proper, and convex.

Several proximal methods can be used to solve this problem.

  • Proximal Gradient Method

$$x^{k+1}:=prox_{\lambda^kg}(x^k-\lambda^k \nabla f(x^k))$$

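which converges with rate $O(1/k)$ when $\nabla f$ is Lipschitz continuous with constant $L$ and the step sizes are $\lambda^k=\lambda\in(0,1/L]$. If $L$ is not known, the step size can be chosen by a backtracking line search: starting from the previous step size $\lambda$, compute $z=prox_{\lambda g}(x^k-\lambda \nabla f(x^k))$ and shrink $\lambda:=\beta\lambda$ until $f(z)\le\hat{f}_{\lambda}(z,x^k)$, then set $x^{k+1}:=z$.

A typical value of $\beta$ is 1/2, and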

$$\hat{f}_{\lambda}(x,y)=f(y)+\nabla f(y)^T(x-y)+(1/2\lambda)||x-y||_{2}^2$$
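
The backtracking rule can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: the function name `prox_grad_backtracking`, the stopping tolerance `tol`, and the small test problem (minimizing $\frac{1}{2}||x-a||_2^2$ subject to $x\ge 0$, whose prox is a projection) are assumptions made here and do not come from the text.

```python
import numpy as np


def prox_grad_backtracking(f, grad_f, prox_g, x0, lam0=1.0, beta=0.5,
                           tol=1e-7, max_iter=10000):
    """Proximal gradient method with a backtracking line search.

    prox_g(v, lam) must return prox_{lam * g}(v).
    """
    x = np.asarray(x0, dtype=float)
    lam = lam0
    for _ in range(max_iter):
        g = grad_f(x)
        while True:
            z = prox_g(x - lam * g, lam)
            d = z - x
            # Quadratic model \hat{f}_lam(z, x) built at the current point x.
            f_hat = f(x) + g @ d + (d @ d) / (2.0 * lam)
            if f(z) <= f_hat:
                break
            lam *= beta  # step too long: shrink it and try again
        if np.linalg.norm(d) <= tol:  # assumed stopping rule: tiny step
            return z
        x = z
    return x


# Hypothetical test problem: f(x) = 0.5 * ||x - a||^2, g = indicator of x >= 0.
a = np.array([1.0, -2.0, 3.0])
x_bt = prox_grad_backtracking(
    f=lambda x: 0.5 * np.sum((x - a) ** 2),
    grad_f=lambda x: x - a,
    prox_g=lambda v, lam: np.maximum(v, 0.0),  # projection onto x >= 0
    x0=np.zeros(3),
    lam0=4.0,  # deliberately too large so that backtracking is exercised
)
print(x_bt)  # should match max(a, 0) elementwise
```

The step size is carried over between iterations, so it only ever shrinks; that keeps the sketch short at the cost of possibly using a smaller step than necessary later on.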

  • Accelerated Proximal Gradient Method

$$y^{k+1}:=x^k+\omega^k (x^k-x^{k-1})$$

$$x^{k+1}:=prox_{\lambda^kg}(y^{k+1}-\lambda^k \nabla f(y^{k+1}))$$

A simple choice that works is $\omega^k=k/(k+3)$, together with a line search similar to the one used for the proximal gradient method.

This method has a faster $O(1/k^2)$ convergence rate and originated with Nesterov (1983).
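
As a rough sketch of the two-step update, the following NumPy code uses a fixed step $\lambda=1/L$ instead of a line search. The function names and the separable test problem are hypothetical choices made for this illustration; the problem was picked because its minimizer has a closed form to compare against.

```python
import numpy as np


def soft_threshold(v, t):
    """Elementwise soft thresholding, the prox of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def accel_prox_grad(grad_f, prox_g, x0, lam, n_iter=1000):
    """Accelerated proximal gradient with fixed step lam and omega^k = k/(k+3)."""
    x_prev = np.asarray(x0, dtype=float)
    x = x_prev.copy()
    for k in range(n_iter):
        omega = k / (k + 3.0)
        y = x + omega * (x - x_prev)          # extrapolation step
        x_prev = x
        x = prox_g(y - lam * grad_f(y), lam)  # proximal gradient step from y
    return x


# Hypothetical separable problem (not from the text):
#   f(x) = 0.5 * sum(a_i * (x_i - t_i)^2),  g(x) = gamma * ||x||_1
a = np.array([1.0, 4.0, 10.0])
t = np.array([3.0, -0.1, 1.0])
gamma = 1.0

x_acc = accel_prox_grad(
    grad_f=lambda x: a * (x - t),
    prox_g=lambda v, lam: soft_threshold(v, lam * gamma),
    x0=np.zeros(3),
    lam=1.0 / a.max(),  # 1/L, since L = max(a_i) for this f
)

# Each coordinate can be minimized on its own, giving a closed-form answer.
x_closed = soft_threshold(t, gamma / a)
print(x_acc, x_closed)
```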

  • ADMM

$$x^{k+1}:=prox_{\lambda f}(z^k-u^k)$$

$$z^{k+1}:=prox_{\lambda g}(x^{k+1}+u^k)$$

$$u^{k+1}:=u^k+x^{k+1}-z^{k+1}$$

ADMM converges under very mild assumptions and has an $O(1/k)$ rate in general. If $f$ and $g$ are both indicator functions of convex sets, it reduces to a variation on alternating projections.

This method originated with Gabay, Mercier, Glowinski, and Marrocco in the 1970s.
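
The three ADMM updates map almost one-to-one onto code. The sketch below applies them to the quadratic-plus-$\ell_1$ problem from the example section, under two assumptions that are not confirmed by the text: the regularization weight is taken as $\gamma=0.2$, and the ADMM parameter is set to $\lambda=1$. For a quadratic $f$, the prox reduces to solving the linear system $(A+I/\lambda)x=v/\lambda-b$.

```python
import numpy as np


def admm(prox_f, prox_g, z0, lam, n_iter=1000):
    """Run the x-, z-, and u-updates of ADMM for a fixed number of iterations."""
    z = np.asarray(z0, dtype=float)
    u = np.zeros_like(z)
    for _ in range(n_iter):
        x = prox_f(z - u, lam)
        z = prox_g(x + u, lam)
        u = u + x - z          # running sum of the residuals x - z
    return x, z


A = np.array([[2.0, 0.25], [0.25, 0.2]])
b = np.array([0.5, 0.5])
gamma = 0.2                    # assumed regularization weight


def prox_quadratic(v, lam):
    # Minimizer of 0.5 x'Ax + b'x + (1/(2 lam)) ||x - v||^2.
    return np.linalg.solve(A + np.eye(len(b)) / lam, v / lam - b)


def prox_l1(v, lam):
    # Soft thresholding with threshold lam * gamma.
    return np.sign(v) * np.maximum(np.abs(v) - lam * gamma, 0.0)


x_admm, z_admm = admm(prox_quadratic, prox_l1, np.zeros(2), lam=1.0)
print(x_admm, z_admm)
```

The two returned iterates agree only in the limit; `z_admm` is the one that comes out of the soft-thresholding step, so it is the natural choice when a sparse solution is wanted.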

3. Example


Consider the following optimization problem

$$\min_{x}\frac{1}{2}x^TAx+b^Tx+c+\gamma||x||_{1}$$

where

$$A=\begin{pmatrix} 2 & 0.25 \\ 0.25 & 0.2 \end{pmatrix},\;b=\begin{pmatrix} 0.5 \\ 0.5 \end{pmatrix},\; c=-1.5, \; \lambda=0.2$$

For this problem, let $f(x)=\frac{1}{2}x^TAx+b^Tx+c$ and $g(x)=\gamma||x||_{1}$. Then

$$\nabla f(x)=Ax+b$$

If $g=||\cdot||_{1}$, then the proximal operator is the elementwise soft-thresholding operation

$$prox_{\lambda g}(v)=(v-\lambda)_{+}-(-v-\lambda)_{+}$$
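
The formula above translates directly into a vectorized function. The name `soft_threshold` is chosen here for illustration; the second expression in the usage lines is the equivalent sign-and-shrink form, included only as a cross-check.

```python
import numpy as np


def soft_threshold(v, lam):
    """Prox of lam * ||.||_1: shrink every entry of v toward zero by lam."""
    v = np.asarray(v, dtype=float)
    return np.maximum(v - lam, 0.0) - np.maximum(-v - lam, 0.0)


v = np.array([3.0, -0.5, -2.0])
shrunk = soft_threshold(v, 1.0)  # entries 3, -0.5, -2 become 2, 0, -1
same = np.sign(v) * np.maximum(np.abs(v) - 1.0, 0.0)
print(shrunk, same)
```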

So the update step is

$$x^{k+1}:=prox_{\lambda^k \gamma||\cdot||_{1}}(x^k-\lambda^k \nabla f(x^k))$$
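
A minimal sketch of this update on the problem data above is given below. It is not the code behind the figures: it uses a fixed step $\lambda=1/L$, with $L$ the largest eigenvalue of $A$, rather than a line search, and it takes the regularization weight to be $\gamma=0.2$, which is an assumption about how the value 0.2 in the problem data is meant. The starting point $(1,1)$ is also arbitrary.

```python
import numpy as np

A = np.array([[2.0, 0.25], [0.25, 0.2]])
b = np.array([0.5, 0.5])
c = -1.5
gamma = 0.2  # assumed regularization weight


def objective(x):
    return 0.5 * x @ A @ x + b @ x + c + gamma * np.abs(x).sum()


def soft_threshold(v, t):
    return np.maximum(v - t, 0.0) - np.maximum(-v - t, 0.0)


def proximal_gradient(x0, n_iter=500):
    lam = 1.0 / np.linalg.eigvalsh(A).max()  # fixed step 1/L
    x = np.asarray(x0, dtype=float)
    history = [objective(x)]
    for _ in range(n_iter):
        # Gradient step on f, then soft thresholding with threshold lam * gamma.
        x = soft_threshold(x - lam * (A @ x + b), lam * gamma)
        history.append(objective(x))
    return x, history


x_hat, history = proximal_gradient([1.0, 1.0])
print(x_hat, history[-1])
```

Under these assumptions the optimality conditions can be checked by hand: at $x=(0,-1.5)$ the gradient of $f$ is $(0.125,\,0.2)$, so the first component lies inside $[-\gamma,\gamma]$ and the second is cancelled by $\gamma\,\mathrm{sign}(x_2)$, which is the point the iterates should approach.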

Finally, the 2D contour plot of the objective function and the trajectory of the iterates are shown in the following figure.

Additionally, when we use the proximal gradient method with exact line search to optimize the objective function, the result is:

We can see that the proximal algorithm solves this nonsmooth convex optimization problem successfully, and that the method based on exact line search converges faster than the one based on backtracking line search.

To learn more about proximal algorithms, see the monograph "Proximal Algorithms" by N. Parikh and S. Boyd and the corresponding website: http://web.stanford.edu/~boyd/papers/prox_algs.html

References


Parikh, Neal, and Stephen P. Boyd. "Proximal Algorithms." Foundations and Trends in Optimization 1.3 (2014): 127-239.


 




 
