Proximal Algorithms

1. Introduction

Much as Newton's method is a standard tool for solving unconstrained smooth minimization problems of modest size, proximal algorithms can be viewed as an analogous tool for nonsmooth, constrained, large-scale, or distributed versions of these problems. They are very generally applicable, but they are especially well suited to problems of recent and widespread interest involving large or high-dimensional datasets.

Proximal methods sit at a higher level of abstraction than classical optimization algorithms like Newton's method. In the latter, the base operations are low-level, consisting of linear algebra operations and the computation of gradients and Hessians. In proximal algorithms, the base operation is evaluating the proximal operator of a function, which involves solving a small convex optimization problem. These subproblems can be solved with standard methods, but they often admit closed-form solutions or can be solved very quickly with simple specialized methods. We will also see that proximal operators and proximal algorithms have a number of interesting interpretations and are connected to many different topics in optimization and applied mathematics.
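
For reference, the "small convex optimization problem" mentioned above is usually written as follows (this is the standard definition from Parikh and Boyd; the scaling parameter $\lambda>0$ is the same symbol used for the step size later in this post):

$$prox_{\lambda f}(v)=\arg\min_{x}\left(f(x)+\frac{1}{2\lambda}||x-v||_{2}^2\right)$$

Intuitively, $prox_{\lambda f}(v)$ is a point that compromises between minimizing $f$ and staying close to $v$, with $\lambda$ controlling the trade-off.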

2. Algorithms

Consider the following convex optimization problem

$$\min_{x}f(x)+g(x)$$

where $f$ is smooth and $g:R^n\rightarrow R\cup \{+\infty\}$ is closed, proper, and convex.

Several proximal methods can be used to solve this problem.

Proximal Gradient Method

$$x^{k+1}:=prox_{\lambda^kg}(x^k-\lambda^k \nabla f(x^k))$$

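which converges with rate $O(1/k)$ when $\nabla f$ is Lipschitz continuous with constant $L$ and the step size is fixed at $\lambda^k=\lambda\in(0,1/L]$. If $L$ is not known, the step size can be chosen by a backtracking line search: starting from the previous step size $\lambda$, compute $z=prox_{\lambda g}(x^k-\lambda \nabla f(x^k))$ and shrink $\lambda:=\beta\lambda$ until $f(z)\le\hat{f}_{\lambda}(z,x^k)$.

A typical value of $\beta$ is 1/2, and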

$$\hat{f}_{\lambda}(x,y)=f(y)+\nabla f(y)^T(x-y)+(1/2\lambda)||x-y||_{2}^2$$
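
The backtracking rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the original author's code: the function name `prox_grad_step`, the helper `dot`, and the calling convention `prox_g(v, lam)` are all hypothetical, and plain lists are used only to avoid dependencies (NumPy would be the usual choice).

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


def prox_grad_step(f, grad_f, prox_g, x, lam, beta=0.5):
    """One proximal gradient step with backtracking on the step size lam.

    f, grad_f : the smooth term and its gradient (lists in, float/list out)
    prox_g    : callable prox_g(v, lam) evaluating prox_{lam g}(v)  (assumed signature)
    Returns the new iterate and the accepted step size.
    """
    fx, gx = f(x), grad_f(x)
    while True:
        # Candidate z = prox_{lam g}(x - lam * grad f(x)).
        z = prox_g([xi - lam * gi for xi, gi in zip(x, gx)], lam)
        d = [zi - xi for zi, xi in zip(z, x)]
        # Quadratic upper model f_hat_lam(z, x) from the formula above.
        f_hat = fx + dot(gx, d) + dot(d, d) / (2.0 * lam)
        if f(z) <= f_hat:
            return z, lam
        lam *= beta  # model violated: shrink the step and retry
```

For example, with $f(x)=2x^2$ (so $L=4$), $g=0$, and an initial step of 1, the loop halves the step twice and accepts $\lambda=1/4=1/L$.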

Accelerated Proximal Gradient Method

$$y^{k+1}:=x^k+\omega^k (x^k-x^{k-1})$$

$$x^{k+1}:=prox_{\lambda^kg}(y^{k+1}-\lambda^k \nabla f(y^{k+1}))$$

This works with $\omega^k=k/(k+3)$ and a line search similar to the one used for the proximal gradient method.

This method has a faster $O(1/k^2)$ convergence rate and originated with Nesterov (1983).
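
As a rough sketch of the two-line update, assuming a fixed step size `lam` no larger than $1/L$ and the hypothetical calling convention `prox_g(v, lam)` for $prox_{\lambda g}(v)$, the accelerated iteration could look like this (the function name `accel_prox_grad` is made up for illustration):

```python
def accel_prox_grad(grad_f, prox_g, x0, lam, iters=200):
    """Accelerated proximal gradient with extrapolation weight k / (k + 3).

    grad_f : gradient of the smooth term
    prox_g : callable prox_g(v, lam) evaluating prox_{lam g}(v)  (assumed signature)
    lam    : fixed step size, assumed to satisfy lam <= 1/L
    """
    x_prev, x = list(x0), list(x0)
    for k in range(iters):
        omega = k / (k + 3.0)
        # Extrapolate along the direction of the previous move.
        y = [xi + omega * (xi - xp) for xi, xp in zip(x, x_prev)]
        g = grad_f(y)
        # Proximal gradient step taken from y instead of x.
        x_prev, x = x, prox_g([yi - lam * gi for yi, gi in zip(y, g)], lam)
    return x
```

The only change relative to the plain proximal gradient method is the extrapolated point `y`; the cost per iteration is essentially the same.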

ADMM

$$x^{k+1}:=prox_{\lambda f}(z^k-u^k)$$

$$z^{k+1}:=prox_{\lambda g}(x^{k+1}+u^k)$$

$$u^{k+1}:=u^k+x^{k+1}-z^{k+1}$$

ADMM converges under very mild assumptions and has an $O(1/k)$ rate in general. If $f$ and $g$ are both indicator functions of convex sets, the iteration becomes a variation on alternating projections.

This method originates from the work of Gabay, Mercier, Glowinski, and Marrocco in the 1970s.
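
The three ADMM updates translate almost line by line into code. The sketch below assumes both proximal operators are supplied as callables with the hypothetical signature `prox(v, lam)`, and it starts all three variables at zero; the function name `admm` and the fixed iteration count are illustrative choices, not part of the original text.

```python
def admm(prox_f, prox_g, n, lam, iters=200):
    """ADMM in proximal form for minimizing f(x) + g(x) over R^n.

    prox_f, prox_g : callables prox(v, lam) evaluating prox_{lam f}(v), prox_{lam g}(v)
    Returns the final x and z; they agree in the limit.
    """
    x, z, u = [0.0] * n, [0.0] * n, [0.0] * n
    for _ in range(iters):
        x = prox_f([zi - ui for zi, ui in zip(z, u)], lam)   # x-update
        z = prox_g([xi + ui for xi, ui in zip(x, u)], lam)   # z-update
        u = [ui + xi - zi for ui, xi, zi in zip(u, x, z)]    # running sum of residuals
    return x, z


# Usage sketch: f(x) = 0.5 * (x - 3)^2 and g = indicator of the interval [0, 1].
# prox_{lam f}(v) = (v + 3 * lam) / (1 + lam); prox of the indicator is a projection.
x, z = admm(lambda v, lam: [(v[0] + 3.0 * lam) / (1.0 + lam)],
            lambda v, lam: [min(max(v[0], 0.0), 1.0)],
            n=1, lam=1.0)
```

In this toy problem the constrained minimizer is $x=1$, and both `x` and `z` approach it.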

3. Example

Consider the following optimization problem

$$\min_{x}\frac{1}{2}x^TAx+b^Tx+c+\gamma||x||_{1}$$

where

$$A=\begin{pmatrix} 2 & 0.25 \\ 0.25 & 0.2 \end{pmatrix},\;b=\begin{pmatrix} 0.5 \\ 0.5 \end{pmatrix},\; c=-1.5, \; \gamma=0.2$$

For this problem, take $f(x)=\frac{1}{2}x^TAx+b^Tx+c$ and $g(x)=\gamma||x||_{1}$. Then

$$\nabla f(x)=Ax+b$$

If $g=||\cdot||_{1}$, then its proximal operator is the elementwise soft-thresholding operator

$$prox_{\lambda g}(v)=(v-\lambda)_{+}-(-v-\lambda)_{+}$$

So the update step is

$$x^{k+1}:=prox_{\lambda^k \gamma||\cdot||_{1}}(x^k-\lambda^k \nabla f(x^k))$$
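
The update can be tried out directly on the problem data. The sketch below is illustrative only: it assumes the $\ell_1$ weight is $\gamma=0.2$ (the value listed alongside $A$, $b$, $c$), uses a fixed step size of 0.4 chosen here because it is below $1/L\approx 0.49$ for this $A$, and starts from an arbitrary point $(1,1)$. The function names are hypothetical, and plain lists are used to keep it dependency-free.

```python
A = [[2.0, 0.25], [0.25, 0.2]]
b = [0.5, 0.5]
c = -1.5
gamma = 0.2  # assumed l1 weight
lam = 0.4    # assumed fixed step size, below 1/L for this A


def soft_threshold(v, t):
    """Elementwise (v - t)_+ - (-v - t)_+, the prox of t * ||.||_1."""
    return [max(vi - t, 0.0) - max(-vi - t, 0.0) for vi in v]


def grad_f(x):
    """Gradient of the smooth part: A x + b."""
    return [sum(aij * xj for aij, xj in zip(row, x)) + bi
            for row, bi in zip(A, b)]


def objective(x):
    """Full objective 0.5 x^T A x + b^T x + c + gamma * ||x||_1."""
    Ax = [sum(aij * xj for aij, xj in zip(row, x)) for row in A]
    quad = 0.5 * sum(xi * axi for xi, axi in zip(x, Ax))
    lin = sum(bi * xi for bi, xi in zip(b, x))
    return quad + lin + c + gamma * sum(abs(xi) for xi in x)


def proximal_gradient(x0, iters=500):
    """Fixed-step proximal gradient: gradient step, then soft-thresholding."""
    x = list(x0)
    for _ in range(iters):
        g = grad_f(x)
        x = soft_threshold([xi - lam * gi for xi, gi in zip(x, g)], lam * gamma)
    return x


x_star = proximal_gradient([1.0, 1.0])
```

Under these assumptions the iterates approach roughly $(0,-1.5)$, which can be checked by hand against the optimality conditions: the second coordinate satisfies $0.2x_2+0.5-\gamma=0$, and the first gradient component, 0.125, is smaller in magnitude than $\gamma$, so the first coordinate is thresholded to zero.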

Finally, the 2D contour plot of the objective function and the trajectory of the iterates are shown in the following figure.

Additionally, when we use the proximal gradient method with exact line search to optimize the objective function, the result is:

We can see that the proximal gradient method solves this nonsmooth convex optimization problem successfully, and that the variant based on exact line search converges faster than the one based on backtracking line search.

To learn more about proximal algorithms, see the monograph "Proximal Algorithms" by N. Parikh and S. Boyd and its accompanying website: http://web.stanford.edu/~boyd/papers/prox_algs.html

References

Parikh, Neal, and Stephen P. Boyd. "Proximal Algorithms." Foundations and Trends in Optimization 1.3 (2014): 127-239.
