Proximal Algorithms
1. Introduction
Much like Newton's method is a standard tool for solving unconstrained smooth minimization problems of modest size, proximal algorithms can be viewed as an analogous tool for nonsmooth, constrained, large-scale, or distributed versions of these problems. They are very generally applicable, but they turn out to be especially well-suited to problems of recent and widespread interest involving large or high-dimensional datasets.
Proximal methods sit at a higher level of abstraction than classical optimization algorithms like Newton’s method. In the latter, the base operations are low-level, consisting of linear algebra operations and the computation of gradients and Hessians. In proximal algorithms, the base operation is evaluating the proximal operator of a function, which involves solving a small convex optimization problem. These subproblems can be solved with standard methods, but they often admit closed-form solutions or can be solved very quickly with simple specialized methods. We will also see that proximal operators and proximal algorithms have a number of interesting interpretations and are connected to many different topics in optimization and applied mathematics.
2. Algorithms
Consider the following convex optimization problem:
$$\min_{x}f(x)+g(x)$$
where $f$ is smooth and convex, and $g:\mathbb{R}^n\rightarrow \mathbb{R}\cup \{+\infty\}$ is closed, proper, and convex.
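All of the methods below are built from the proximal operator. For a closed proper convex function $h$ and a parameter $\lambda>0$, it is defined (as in Parikh and Boyd) by
$$prox_{\lambda h}(v)=\arg\min_{x}\left(h(x)+\frac{1}{2\lambda}\|x-v\|_{2}^2\right)$$
For example, for $h(x)=|x|$ on $\mathbb{R}$, the optimality condition $0\in\partial|x|+\frac{1}{\lambda}(x-v)$ gives the soft-thresholding rule $prox_{\lambda h}(v)=(v-\lambda)_{+}-(-v-\lambda)_{+}$.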
There are several proximal methods for solving this problem:
- Proximal Gradient Method
$$x^{k+1}:=prox_{\lambda^k g}(x^k-\lambda^k \nabla f(x^k))$$
This iteration converges at rate $O(1/k)$ when $\nabla f$ is Lipschitz continuous with constant $L$ and the step size is fixed at $\lambda^k=\lambda\in(0,1/L]$. If $L$ is not known, the step size can be chosen with the following backtracking line search:
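
given $x^k$, $\lambda^{k-1}$, and parameter $\beta\in(0,1)$. Let $\lambda:=\lambda^{k-1}$.
repeat
1. Let $z:=prox_{\lambda g}(x^k-\lambda \nabla f(x^k))$.
2. break if $f(z)\leq\hat{f}_{\lambda}(z,x^k)$.
3. Update $\lambda:=\beta\lambda$.
return $\lambda^k:=\lambda$, $x^{k+1}:=z$.
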
A typical value of $\beta$ is $1/2$, and $\hat{f}_{\lambda}$ is defined as
$$\hat{f}_{\lambda}(x,y)=f(y)+\nabla f(y)^T(x-y)+\frac{1}{2\lambda}\|x-y\|_{2}^2$$
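The proximal gradient iteration with this backtracking rule can be sketched in plain Python as follows. This is a minimal sketch under our own assumptions: vectors are plain Python lists, and `prox_g(v, lam)` is a user-supplied callable assumed to return $prox_{\lambda g}(v)$. The function names are illustrative, not taken from any library.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def _dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def backtracking_prox_step(
    f: Callable[[Vector], float],
    grad_f: Callable[[Vector], Vector],
    prox_g: Callable[[Vector, float], Vector],
    x: Vector,
    lam_prev: float,
    beta: float = 0.5,
) -> Tuple[Vector, float]:
    """One proximal gradient step, shrinking lam by beta until the model test passes.

    prox_g(v, lam) must return prox_{lam g}(v). Returns (x_next, lam_k).
    """
    fx, gx = f(x), grad_f(x)
    lam = lam_prev
    while True:
        z = prox_g([xi - lam * gi for xi, gi in zip(x, gx)], lam)
        d = [zi - xi for zi, xi in zip(z, x)]
        # f_hat_lambda(z, x) = f(x) + grad f(x)^T (z - x) + ||z - x||^2 / (2 lam)
        f_hat = fx + _dot(gx, d) + _dot(d, d) / (2.0 * lam)
        if f(z) <= f_hat:
            return z, lam
        lam *= beta  # model test failed: shrink the step and retry


def prox_gradient(
    f: Callable[[Vector], float],
    grad_f: Callable[[Vector], Vector],
    prox_g: Callable[[Vector, float], Vector],
    x0: Vector,
    lam0: float = 1.0,
    beta: float = 0.5,
    iters: int = 100,
) -> Vector:
    """Run proximal gradient, starting each line search at the previous lam."""
    x, lam = list(x0), lam0
    for _ in range(iters):
        x, lam = backtracking_prox_step(f, grad_f, prox_g, x, lam, beta)
    return x
```

Starting each search from the previous $\lambda$ means the step only ever shrinks, so once $\lambda\le 1/L$ no further backtracking is needed.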
- Accelerated Proximal Gradient Method
$$y^{k+1}:=x^k+\omega^k (x^k-x^{k-1})$$
$$x^{k+1}:=prox_{\lambda^kg}(y^{k+1}-\lambda^k \nabla f(y^{k+1}))$$
This works with $\omega^k=k/(k+3)$ and a line search similar to the one above.
It has a faster $O(1/k^2)$ convergence rate and originated with Nesterov (1983).
- ADMM
$$x^{k+1}:=prox_{\lambda f}(z^k-u^k)$$
$$z^{k+1}:=prox_{\lambda g}(x^{k+1}+u^k)$$
$$u^{k+1}:=u^k+x^{k+1}-z^{k+1}$$
It basically always works and has an $O(1/k)$ convergence rate in general. If $f$ and $g$ are both indicator functions, ADMM becomes a variation on alternating projections.
This method originates with Gabay, Mercier, Glowinski, and Marrocco in the 1970s.
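The accelerated iteration and the three ADMM updates can be sketched in the same plain-Python style. As simplifying assumptions, the accelerated method below uses a fixed step `lam` (assumed to satisfy $\lambda\le 1/L$) instead of a line search, and both functions take user-supplied callables `prox_f(v, lam)` and `prox_g(v, lam)` assumed to return $prox_{\lambda f}(v)$ and $prox_{\lambda g}(v)$. The names are illustrative only.

```python
from typing import Callable, List

Vector = List[float]
Prox = Callable[[Vector, float], Vector]  # prox(v, lam) -> prox_{lam h}(v)


def accelerated_prox_grad(
    grad_f: Callable[[Vector], Vector], prox_g: Prox, x0: Vector, lam: float, iters: int = 200
) -> Vector:
    """Accelerated proximal gradient, omega^k = k/(k+3), FIXED step lam (assumed <= 1/L)."""
    x_prev, x = list(x0), list(x0)  # take x^{-1} := x^0
    for k in range(iters):
        w = k / (k + 3.0)
        y = [xi + w * (xi - xp) for xi, xp in zip(x, x_prev)]  # extrapolation
        gy = grad_f(y)
        x_prev, x = x, prox_g([yi - lam * gi for yi, gi in zip(y, gy)], lam)
    return x


def admm(prox_f: Prox, prox_g: Prox, z0: Vector, lam: float, iters: int = 200) -> Vector:
    """Scaled-form ADMM for min f(x) + g(x), using the x, z, u updates above."""
    z = list(z0)
    u = [0.0] * len(z)
    for _ in range(iters):
        x = prox_f([zi - ui for zi, ui in zip(z, u)], lam)
        z = prox_g([xi + ui for xi, ui in zip(x, u)], lam)
        u = [ui + xi - zi for ui, xi, zi in zip(u, x, z)]  # running residual sum
    return z
```

For instance, with $f(x)=\frac{1}{2}(x-3)^2$ and $g(x)=|x|$ on $\mathbb{R}$, one has $prox_{\lambda f}(v)=(v+3\lambda)/(1+\lambda)$, and both functions should return values close to the minimizer $x^\star=2$.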
3. Example
Suppose we need to solve the following optimization problem:
$$\min_{x}\frac{1}{2}x^TAx+b^Tx+c+\gamma||x||_{1}$$
where
$$A=\begin{pmatrix} 2 & 0.25 \\ 0.25 & 0.2 \end{pmatrix},\;b=\begin{pmatrix} 0.5 \\ 0.5 \end{pmatrix},\; c=-1.5, \; \gamma=0.2$$
For this problem, let $f(x)=\frac{1}{2}x^TAx+b^Tx+c$ and $g(x)=\gamma\|x\|_{1}$. Since $A$ is symmetric, the gradient of the smooth part is
$$\nabla f(x)=Ax+b$$
For $h=\|\cdot\|_{1}$, the proximal operator is elementwise soft thresholding:
$$prox_{\lambda h}(v)=(v-\lambda)_{+}-(-v-\lambda)_{+}$$
So the update step is
$$x^{k+1}:=prox_{\lambda^k \gamma||\cdot||_{1}}(x^k-\lambda^k \nabla f(x^k))$$
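The update above can be run directly on this example. The sketch below is plain Python and, as assumptions, uses a fixed step $\lambda=1/L$ (with $L$ the largest eigenvalue of $A$) instead of a line search, and interprets the value $0.2$ in the problem data as the $\ell_1$ weight $\gamma$. The names `solve_example` and `largest_eig_2x2_sym` are hypothetical helpers for illustration.

```python
import math

# Problem data from the example (GAMMA = 0.2 is assumed to be the l1 weight).
A = [[2.0, 0.25], [0.25, 0.2]]
b = [0.5, 0.5]
c = -1.5
GAMMA = 0.2


def matvec(M, x):
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in M]


def objective(x):
    """Full objective 0.5 x^T A x + b^T x + c + gamma ||x||_1."""
    quad = 0.5 * sum(xi * axi for xi, axi in zip(x, matvec(A, x)))
    return quad + sum(bi * xi for bi, xi in zip(b, x)) + c + GAMMA * sum(abs(xi) for xi in x)


def grad_f(x):
    # Gradient of the smooth part: A x + b (valid because A is symmetric).
    return [axi + bi for axi, bi in zip(matvec(A, x), b)]


def soft_threshold(v, t):
    # Elementwise (v - t)_+ - (-v - t)_+, i.e. prox of t * ||.||_1.
    return [max(vi - t, 0.0) - max(-vi - t, 0.0) for vi in v]


def largest_eig_2x2_sym(M):
    # Closed-form largest eigenvalue of a symmetric 2x2 matrix [[a, off], [off, d]].
    (a, off), (_, d) = M
    return 0.5 * (a + d) + math.sqrt((0.5 * (a - d)) ** 2 + off ** 2)


def solve_example(x0=(1.0, 1.0), iters=500):
    step = 1.0 / largest_eig_2x2_sym(A)  # fixed lambda = 1/L
    x = list(x0)
    trajectory = [list(x)]
    for _ in range(iters):
        g = grad_f(x)
        x = soft_threshold([xi - step * gi for xi, gi in zip(x, g)], step * GAMMA)
        trajectory.append(list(x))
    return x, trajectory
```

Calling `x, traj = solve_example()` returns the final iterate and the list of iterates, which is the kind of trajectory one would overlay on a contour plot. The iterates should approach $x^\star=(0,-1.5)$: one can check that $Ax^\star+b+\gamma s=0$ holds with $s=(-0.625,-1)\in\partial\|x^\star\|_1$.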
Finally, the following figure shows the 2D contour plot of the objective function together with the trajectory of the iterates produced by this update with backtracking line search.

For comparison, running the proximal gradient method with exact line search on the same objective gives the following result:

These results show that the proximal gradient method solves this nonsmooth convex optimization problem successfully, and that in this example the variant with exact line search converges faster than the one with backtracking line search.
To learn more about proximal algorithms, see the monograph "Proximal Algorithms" by N. Parikh and S. Boyd and its companion website: http://web.stanford.edu/~boyd/papers/prox_algs.html
References
Parikh, Neal, and Stephen P. Boyd. "Proximal Algorithms." Foundations and Trends in Optimization 1.3 (2014): 127-239.