An explanation of several norms: l0-Norm, l1-Norm, l2-Norm, … , l-infinity Norm

from Rorasa's blog

l0-Norm, l1-Norm, l2-Norm, … , l-infinity Norm

13/05/2012 rorasa

I have been working on things related to norms a lot lately, and it is time to talk about them. In this post we are going to discuss a whole family of norms.

What is a norm?

Mathematically, a norm is a measure of the total size or length of a vector or matrix in a vector space. For simplicity, we can say that the higher the norm is, the bigger the (values in the) matrix or vector are. Norms come in many forms and under many names, and they underlie popular measures such as the Euclidean distance and the Mean-Squared Error.

Most of the time you will see the norm appear in an equation like this:

$\left \| x \right \|$ where $x$ can be a vector or a matrix.

For example, the Euclidean norm of a vector $a = \begin{bmatrix} 3 \\ -2 \\ 1 \end{bmatrix}$ is $\left \| a \right \|_2=\sqrt{3^2+(-2)^2+1^2}\approx 3.742$, which is the size of vector $a$.
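The arithmetic above can be checked with a few lines of Python. This is only a minimal sketch using the standard library; the variable names `a` and `l2` are chosen here for illustration.

```python
import math

# The vector a = (3, -2, 1) from the example above.
a = [3.0, -2.0, 1.0]

# Euclidean (l2) norm: square root of the sum of the squared entries.
l2 = math.sqrt(sum(v * v for v in a))

print(round(l2, 3))  # 3.742
```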

The above example shows how to compute a Euclidean norm, formally called the $l_2$-norm. There are many other types of norm beyond the ones explained here; in fact, for every single real number $p \geq 1$ there is a corresponding norm (notice the emphasis on real number: $p$ is not limited to integers).

Formally the $l_p$ -norm of $x$ is defined as:

$\left \| x \right \|_p = \sqrt[p]{\sum_{i}\left | x_i \right |^p}$ where $p \in \mathbb{R}$, $p \geq 1$

That’s it! The p-th root of the sum of the absolute values of all elements, each raised to the p-th power, is what we call a norm.
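As a rough sketch of this definition in code, the function below evaluates the formula directly. The name `lp_norm` and the check that $p$ is at least 1 are assumptions of this example (below 1 the formula no longer satisfies the triangle inequality, so it is not a true norm).

```python
def lp_norm(x, p):
    """Return (sum_i |x_i|^p)^(1/p) for a sequence of numbers x and p >= 1."""
    if p < 1:
        raise ValueError("p must be at least 1 for this to be a norm")
    return sum(abs(v) ** p for v in x) ** (1.0 / p)


a = [3.0, -2.0, 1.0]
norm_1 = lp_norm(a, 1)      # |3| + |-2| + |1| = 6
norm_2 = lp_norm(a, 2)      # sqrt(14), about 3.742
norm_2_5 = lp_norm(a, 2.5)  # p does not have to be an integer
```

For a fixed vector the value shrinks as $p$ grows, so `norm_2_5` lies between the largest magnitude (3) and `norm_2`.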

The interesting point is that even though all $l_p$-norms look very similar to each other, their mathematical properties are very different, and thus their applications differ dramatically too. Here we are going to look into some of these norms in detail.

l0-norm

The first norm we are going to discuss is the $l_0$-norm. By definition, the $l_0$-norm of $x$ is

$\left \| x \right \|_0 = \sqrt[0]{\sum_{i}x_i^0}$

Strictly speaking, the $l_0$-norm is not actually a norm. It is a cardinality function whose definition takes the form of an $l_p$-norm, though many people call it a norm. It is a bit tricky to work with because of the zeroth power and the zeroth root in it. Obviously any $x_i \neq 0$ raised to the zeroth power becomes one, but the zeroth power of zero and especially the zeroth root are not well defined, which makes a mess of things here. So in practice, most mathematicians and engineers use this definition of the $l_0$-norm instead:

$\left \| x \right \|_0 = \#(i | x_i \neq 0)$

that is, the total number of non-zero elements in a vector.

Because it counts the non-zero elements, there are many applications that use the $l_0$-norm. Lately it has come even more into focus because of the rise of Compressive Sensing, which tries to find the sparsest solution of an under-determined linear system. The sparsest solution is the one with the fewest non-zero entries, i.e. the lowest $l_0$-norm. This problem is usually regarded as an optimisation problem of the $l_0$-norm, or $l_0$-optimisation.
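In code the counting definition is a one-liner. The sketch below is illustrative only; the function name `l0_norm` and the optional tolerance `tol` (useful when entries come from floating-point computation and are merely close to zero) are assumptions of this example, not part of the definition.

```python
def l0_norm(x, tol=0.0):
    """Count the entries of x whose magnitude exceeds tol."""
    return sum(1 for v in x if abs(v) > tol)


x = [0.0, 3.0, 0.0, -2.0, 0.0]
count = l0_norm(x)  # the entries 3.0 and -2.0 are non-zero, so the count is 2
```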

l0-optimisation

Many applications, including Compressive Sensing, try to minimise the $l_0$-norm of a vector subject to some constraints, hence the name “$l_0$-minimisation”. A standard minimisation problem is formulated as:

$\min \left \| x \right \|_0$ subject to $Ax = b$

However, doing so is not an easy task. Because the $l_0$-norm is a discontinuous counting function with no convenient mathematical representation, $l_0$-minimisation is combinatorial in nature and is regarded by computer scientists as an NP-hard problem; simply put, it is too complex and almost impossible to solve exactly for problems of realistic size.

In many cases, the $l_0$-minimisation problem is relaxed to a higher-order norm problem such as $l_1$-minimisation or $l_2$-minimisation.
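To get a feeling for why a direct attack on $l_0$-minimisation is hopeless, consider a naive exhaustive search that tries every possible set of non-zero positions (every "support"). The sketch below only counts those candidates; it is not a solver and not a proof of NP-hardness, and the function names are made up for this illustration.

```python
import math


def supports_of_size(n, k):
    """Number of ways to choose which k of the n entries are non-zero."""
    return math.comb(n, k)


def supports_up_to(n, k_max):
    """Number of candidate supports with at most k_max non-zero entries."""
    return sum(math.comb(n, k) for k in range(k_max + 1))


small = supports_up_to(10, 3)        # 1 + 10 + 45 + 120 = 176
large = supports_of_size(1000, 10)   # more than 10**20 candidates
```

Even for a vector of 1000 entries with only 10 non-zeros, checking every support one by one is out of reach, which is why the relaxations above are used in practice.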

l1-norm

Following the definition of the norm, the $l_1$-norm of $x$ is defined as

$\left \| x \right \|_1 = \sum_{i} \left | x_i \right |$

This norm is quite common among the norm family. It has many names and many forms in various fields; the Manhattan norm is its nickname. If the $l_1$-norm is computed for the difference between two vectors or matrices, that is

$SAD(x_1,x_2) = \left \| x_1-x_2 \right \|_1 = \sum \left | x_{1_i}-x_{2_i} \right |$

it is called the Sum of Absolute Differences (SAD) among computer vision scientists.

In the more general case of signal difference measurement, it may be normalised by the number of elements:

$MAE(x_1,x_2) = \frac{1}{n} \left \| x_1-x_2 \right \|_1 = \frac {1} {n} \sum \left | x_{1_i} - x_{2_i} \right |$ where $n$ is the size of $x$,

which is known as the Mean Absolute Error (MAE).
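Both measures are easy to write down in code. The following is a small sketch with made-up input vectors; the function names `sad` and `mae` simply mirror the abbreviations used above.

```python
def sad(x1, x2):
    """Sum of absolute differences: the l1-norm of x1 - x2."""
    if len(x1) != len(x2):
        raise ValueError("vectors must have the same length")
    return sum(abs(u - v) for u, v in zip(x1, x2))


def mae(x1, x2):
    """Mean absolute error: SAD divided by the number of entries n."""
    return sad(x1, x2) / len(x1)


x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [1.0, 0.0, 5.0, 4.0]
total = sad(x1, x2)    # 0 + 2 + 2 + 0 = 4
average = mae(x1, x2)  # 4 / 4 = 1
```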

l2-norm

The most popular of all norms is the $l_2$-norm. It is used in almost every field of engineering and science. Following the basic definition, the $l_2$-norm is defined as

$\left \| x \right \|_2 = \sqrt{\sum_{i}x_i^2}$

The $l_2$-norm is well known as the Euclidean norm, which is used as a standard quantity for measuring a vector difference. As with the $l_1$-norm, if the Euclidean norm is computed for a vector difference, it is known as the Euclidean distance:

$\left \| x_1-x_2 \right \|_2 = \sqrt{\sum_i (x_{1_i}-x_{2_i})^2}$

or, in its squared form, as the Sum of Squared Differences (SSD) among computer vision scientists:

$SSD(x_1,x_2) = \left \| x_1-x_2 \right \|_2^2 = \sum_i (x_{1_i}-x_{2_i})^2$

Its best-known application in the signal processing field is the Mean-Squared Error (MSE) measurement, which is used to compute the similarity, quality, or correlation between two signals. MSE is

$MSE(x_1,x_2) = \frac{1}{n} \left \| x_1-x_2 \right \|_2^2 = \frac{1}{n} \sum_i (x_{1_i}-x_{2_i})^2$
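The three $l_2$-based measures differ only in a square root and a division by $n$, which the following sketch makes explicit. The input vectors are made up for illustration, and the function names just follow the abbreviations above.

```python
import math


def ssd(x1, x2):
    """Sum of squared differences: the squared l2-norm of x1 - x2."""
    if len(x1) != len(x2):
        raise ValueError("vectors must have the same length")
    return sum((u - v) ** 2 for u, v in zip(x1, x2))


def euclidean_distance(x1, x2):
    """The l2-norm of x1 - x2, i.e. the square root of the SSD."""
    return math.sqrt(ssd(x1, x2))


def mse(x1, x2):
    """Mean squared error: SSD divided by the number of entries n."""
    return ssd(x1, x2) / len(x1)


x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [1.0, 0.0, 5.0, 4.0]
squared = ssd(x1, x2)                  # 0 + 4 + 4 + 0 = 8
distance = euclidean_distance(x1, x2)  # sqrt(8), about 2.828
mean_sq = mse(x1, x2)                  # 8 / 4 = 2
```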

As previously discussed in the $l_0$-optimisation section, because of many issues from both a computational and a mathematical point of view, many $l_0$-optimisation problems are relaxed to $l_1$- and $l_2$-optimisation instead. Because of this, we will now discuss the optimisation of $l_2$.

l2-optimisation

As in the $l_0$-optimisation case, the problem of minimising the $l_2$-norm is formulated as

$\min \left \| x \right \|_2$ subject to $Ax = b$

Assume that the constraint matrix $A$ has full row rank and fewer rows than columns; the problem is then an underdetermined system with infinitely many solutions. The goal in this case is to pick out the best solution, i.e. the one with the lowest $l_2$-norm, from these infinitely many solutions. This could be very tedious work if it were computed directly. Luckily there is a mathematical trick that can help us a lot here.

By using the trick of Lagrange multipliers, we can define a Lagrangian

$\mathfrak{L}(\boldsymbol{x}) = \left \| \boldsymbol{x} \right \|_2^2+\lambda^{T}(\boldsymbol{Ax}-\boldsymbol{b})$

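where $\lambda$ is the vector of introduced Lagrange multipliers. Setting the derivative of this equation with respect to $\boldsymbol{x}$, which is $2\boldsymbol{x}+\boldsymbol{A}^{T}\lambda$, equal to zero to find the optimal solution gives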

$\hat{\boldsymbol{x}}_{opt} = -\frac{1}{2} \boldsymbol{A}^{T} \lambda$

Plugging this solution into the constraint gives

$\boldsymbol{A}\hat{\boldsymbol{x}}_{opt} = -\frac{1}{2}\boldsymbol{AA}^{T}\lambda=\boldsymbol{b}$

$\lambda=-2(\boldsymbol{AA}^{T})^{-1}\boldsymbol{b}$

and finally

$\hat{\boldsymbol{x}}_{opt}=\boldsymbol{A}^{T} (\boldsymbol{AA}^{T})^{-1} \boldsymbol{b}=\boldsymbol{A}^{+} \boldsymbol{b}$

By using this equation, we can now instantly compute the optimal solution of the $l_2$-optimisation problem. The matrix $\boldsymbol{A}^{+}=\boldsymbol{A}^{T}(\boldsymbol{AA}^{T})^{-1}$ is well known as the Moore-Penrose pseudoinverse, and the problem itself is usually known as the Least Squares problem, Least Squares regression, or Least Squares optimisation.

However, even though the solution of the Least Squares method is easy to compute, it is not necessarily the best solution. Because of the smooth nature of the $l_2$-norm itself, its minimiser tends to spread small non-zero values over many entries, so it is rarely the sparse solution that $l_0$-minimisation was looking for.

In contrast, $l_1$-optimisation can provide a much better result than this solution.
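The closed form $\boldsymbol{A}^{T}(\boldsymbol{AA}^{T})^{-1}\boldsymbol{b}$ can be tried out on a tiny system. The sketch below is a plain-Python illustration, not a production solver: the helper names `solve_linear_system` and `min_l2_solution` and the example matrix are assumptions of this example, and it solves $(\boldsymbol{AA}^{T})y = \boldsymbol{b}$ by Gaussian elimination instead of forming the inverse explicitly.

```python
def solve_linear_system(G, b):
    """Solve G y = b by Gaussian elimination with partial pivoting."""
    n = len(G)
    # Augmented matrix [G | b], copied so the inputs are left untouched.
    M = [[float(v) for v in G[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        # Pick the row with the largest pivot to limit round-off error.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("A A^T is singular; A lacks full row rank")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system.
    y = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * y[j] for j in range(i + 1, n))
        y[i] = (M[i][n] - tail) / M[i][i]
    return y


def min_l2_solution(A, b):
    """Return x = A^T (A A^T)^{-1} b for a full-row-rank A (list of rows)."""
    m, n = len(A), len(A[0])
    # Gram matrix G = A A^T, of size m x m.
    G = [[sum(A[i][k] * A[j][k] for k in range(n)) for j in range(m)]
         for i in range(m)]
    y = solve_linear_system(G, b)  # y = (A A^T)^{-1} b
    # x = A^T y
    return [sum(A[i][k] * y[i] for i in range(m)) for k in range(n)]


A = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
b = [1.0, 1.0]
x_opt = min_l2_solution(A, b)  # approximately [1/3, 1/3, 2/3]
```

For this system both $(1, 1, 0)^T$ and $(0, 0, 1)^T$ also satisfy $Ax = b$, with $l_2$-norms of about 1.414 and 1, while `x_opt` has an $l_2$-norm of about 0.816. Note that `x_opt` has three non-zero entries even though a solution with a single non-zero entry exists.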

l1-optimisation

As usual, the $l_1$ -minimisation problem is formulated as

$\min \left \| x \right \|_1$ subject to $Ax = b$

Because the $l_1$-norm is not smooth like the $l_2$-norm, the solution of this problem tends to be sparse, which makes it much better suited to this purpose than the solution of $l_2$-optimisation.

However, even though the $l_1$-minimisation problem has almost the same form as the $l_2$-minimisation problem, it is much harder to solve. Because the objective is not a smooth function, the trick we used to solve the $l_2$-problem is no longer valid. There is no closed-form solution, so the only way left is to search for the best solution numerically among the “infinitely many” possible solutions.

Since there was no easy way to find the solution of this problem mathematically, the usefulness of $l_1$-optimisation was very limited for decades. Only recently has the advance of computers with high computational power made it practical to search for the solution numerically. With many helpful algorithms, namely Convex Optimisation algorithms such as linear programming or non-linear programming, it is now possible to find the best solution to this problem. Many applications that rely on $l_1$-optimisation, including Compressive Sensing, are now possible.

There are many toolboxes for $l_1$-optimisation available nowadays. These toolboxes usually use different approaches and/or algorithms to solve the same problem. Examples of these toolboxes include l1-magic, SparseLab, and ISAL1.
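For readers wondering how linear programming enters, here is the standard reformulation, sketched under the assumption that $A$, $x$ and $b$ are real-valued. The auxiliary variables $t_i$ are introduced here for illustration only. Each $t_i$ bounds $\left | x_i \right |$ from above, so minimising their sum minimises the $l_1$-norm:

$\min_{x,t} \sum_i t_i$ subject to $Ax = b$ and $-t_i \leq x_i \leq t_i$ for all $i$

As a small hand-worked illustration (the numbers are chosen for this example only), take $A = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix}$ and $b = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$. Every solution of $Ax = b$ has the form $x = (1-s,\ 1-s,\ s)^{T}$ for some real $s$, so

$\left \| x \right \|_1 = 2\left | 1-s \right | + \left | s \right |$

which is smallest at $s = 1$, where it equals 1. The $l_1$-minimiser is therefore $x = (0, 0, 1)^{T}$ with a single non-zero entry, whereas the formula $A^{T}(AA^{T})^{-1}b$ gives the $l_2$-minimiser $(1/3,\ 1/3,\ 2/3)^{T}$, which has three non-zero entries and an $l_1$-norm of $4/3$.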

Now that we have discussed several members of the norm family, namely the $l_0$-norm, $l_1$-norm, and $l_2$-norm, it is time to move on to the next one. As we discussed at the very beginning, there can be an l-whatever norm for any value following the same basic definition, so it would take a lot of time to talk about all of them. Fortunately, apart from the $l_0$-, $l_1$-, and $l_2$-norms, the rest are usually uncommon and therefore don't have as many interesting things to look at. So we are going to look at the extreme case, which is the $l_{\infty}$-norm (l-infinity norm).

l-infinity norm

As always, the definition for $l_{\infty}$ -norm is

$\left \| x \right \|_{\infty} = \sqrt[\infty]{\sum_i x_i^{\infty}}$

Now this definition looks tricky again, but actually it is quite straightforward. Consider the vector $\boldsymbol{x}$, and let $x_j$ be the entry with the largest magnitude in $\boldsymbol{x}$. By the property of infinity itself, we can say that

$x_j^{\infty}\gg x_i^{\infty}$ $\forall i \neq j$

then

$\sum_i x_i^{\infty} = x_j^{\infty}$

then

$\left \| x \right \|_{\infty} = \sqrt[\infty]{\sum_i x_i^{\infty}} = \sqrt[\infty]{x_j^{\infty}} = \left | x_j \right |$

Now we can simply say that the $l_{\infty}$ -norm is

$\left \| x \right \|_{\infty} = \max_i \left | x_i \right |$

that is, the largest magnitude among the entries of that vector. That surely demystifies the meaning of the $l_{\infty}$-norm.

Now that we have discussed the whole family of norms from $l_0$ to $l_{\infty}$, I hope that this discussion helps in understanding the meaning of norms, their mathematical properties, and their real-world implications.
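The limiting argument can also be observed numerically: as $p$ grows, the $l_p$-norm of a fixed vector approaches its largest magnitude. The sketch below is an illustration only; the function names are made up here, and the largest entry is factored out of the sum so that large powers do not overflow.

```python
def linf_norm(x):
    """Largest absolute value among the entries of x."""
    return max(abs(v) for v in x)


def lp_norm_stable(x, p):
    """(sum_i |x_i|^p)^(1/p), with the largest magnitude factored out
    of the sum so that large p does not overflow."""
    biggest = linf_norm(x)
    if biggest == 0.0:
        return 0.0
    return biggest * sum((abs(v) / biggest) ** p for v in x) ** (1.0 / p)


a = [3.0, -2.0, 1.0]
limit = linf_norm(a)  # 3.0
# The values decrease towards 3.0 as p grows.
approach = [lp_norm_stable(a, p) for p in (1, 2, 10, 100)]
```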

Reference and further reading:

Mathematical Norm – wikipedia

Mathematical Norm – MathWorld

Michael Elad – “Sparse and Redundant Representations : From Theory to Applications in Signal and Image Processing” , Springer, 2010.

Linear Programming – MathWorld

Compressive Sensing – Rice University

Edit (15/02/15) : Corrected inaccuracies of the content.
