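I have posted quite a few fast Gaussian blur algorithms before.

My rule of thumb: if an algorithm needs more than 1 second to process a 16-megapixel color image on a single core, single thread of a 2.2 GHz CPU, it is not fast.

Every algorithm I posted before took more than 1 second on my 2.2 GHz CPU.

As is well known, there are many ways to implement a fast Gaussian blur: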

1. FIR (Finite Impulse Response)

https://zh.wikipedia.org/wiki/%E9%AB%98%E6%96%AF%E6%A8%A1%E7%B3%8A

2. SII (Stacked Integral Images)

http://dx.doi.org/10.1109/ROBOT.2010.5509400

http://arxiv.org/abs/1107.4958

3. Vliet-Young-Verbeek (recursive filter)

http://dx.doi.org/10.1016/0165-1684(95)00020-E

http://dx.doi.org/10.1109/ICPR.1998.711192

4. DCT (Discrete Cosine Transform)

http://dx.doi.org/10.1109/78.295213

5. Box (box filter)

http://dx.doi.org/10.1109/TPAMI.1986.4767776

6. AM (Alvarez, Mazorra)

http://www.jstor.org/stable/2158018

7. Deriche (recursive filter)

http://hal.inria.fr/docs/00/07/47/78/PDF/RR-1893.pdf

8. ebox (extended box)

http://dx.doi.org/10.1007/978-3-642-24785-9_38

9. IIR (Infinite Impulse Response)

https://software.intel.com/zh-cn/articles/iir-gaussian-blur-filter-implementation-using-intel-advanced-vector-extensions

10. FA (Fast Anisotropic)

http://mathinfo.univ-reims.fr/IMG/pdf/Fast_Anisotropic_Gquss_Filtering_-_GeusebroekECCV02.pdf

...

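There are many ways to implement a Gaussian blur, but for an algorithm the key requirements are simplicity and efficiency.

In my own measurements, IIR strikes a good balance between quality and performance, and it is radius-independent: the running time stays roughly the same no matter how strong the blur is.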

Intel's official implementation:

IIR Gaussian Blur Filter Implementation using Intel® Advanced Vector Extensions (PDF)
source: gaussian_blur.cpp

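It uses Intel's SIMD (streaming) instructions and is astonishingly fast.

When I write algorithms I aim for code that is clean, simple and efficient. In other words, I avoid any hardware-acceleration scheme, so the implementation stays simple and runs well on different hardware.

So I took Intel's code as a starting point, rewrote it and optimized it.

The result: on my 2.20 GHz CPU, single core and single thread, without SIMD instructions, a 16-megapixel color photo takes only about 700 ms.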
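For scale, a back-of-the-envelope cycle budget (assuming 16 megapixels means exactly 16,000,000 pixels and ignoring turbo clocks, caches and memory bandwidth; cycles_per_pixel is only an illustrative helper): the 1-second threshold allows about 137.5 clock cycles per pixel at 2.2 GHz, while 700 ms corresponds to about 96 cycles per pixel, or roughly 32 cycles per color channel.

```c
#include <assert.h>

/* Clock cycles available per pixel: clock rate (Hz) x elapsed time (s) / pixel count.
   Illustrative helper only; it ignores turbo clocks, caches and memory bandwidth. */
double cycles_per_pixel(double clock_hz, double seconds, double pixels)
{
    assert(pixels > 0.0);
    return clock_hz * seconds / pixels;
}
```

For example, cycles_per_pixel(2.2e9, 0.7, 16e6) gives 96.25 (up to floating-point rounding).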

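As usual, a result image shows the effect most directly.

Some readers have asked how this algorithm is implemented.

After some thought, I decided to share the code for everyone to study and refer to.

Full code: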

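#include <math.h>
#include <stdlib.h>

/* Computes the recursive-filter coefficients for a given sigma, following the
   Intel sample this code is based on: a0..a3 are the feed-forward terms, b1/b2
   the feedback terms, and cprev/cnext the edge gains used to initialise the
   causal and anti-causal passes (cprev + cnext = 1). */
void CalGaussianCoeff(float sigma, float * a0, float * a1, float * a2, float * a3, float * b1, float * b2, float * cprev, float * cnext) {
	float alpha, lamma, k;

	/* Sigma is clamped to a minimum of 0.5. */
	if (sigma < 0.5f)
		sigma = 0.5f;
	alpha = (float)exp((0.726) * (0.726)) / sigma;   /* exp(0.726^2) is about 1.695 */
	lamma = (float)exp(-alpha);
	*b2 = (float)exp(-2 * alpha);
	k = (1 - lamma) * (1 - lamma) / (1 + 2 * alpha * lamma - (*b2));
	*a0 = k;
	*a1 = k * (alpha - 1) * lamma;
	*a2 = k * (alpha + 1) * lamma;
	*a3 = -k * (*b2);
	*b1 = -2 * lamma;
	/* DC gains of the causal and anti-causal parts, used for edge initialisation. */
	*cprev = (*a0 + *a1) / (1 + *b1 + *b2);
	*cnext = (*a2 + *a3) / (1 + *b1 + *b2);
}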

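/* Horizontal pass over one row (Channels = 1, 3 or 4; other values do nothing).
   A causal left-to-right pass stores its output in bufferPerLine, an anti-causal
   right-to-left pass adds its output, and each result pixel is written transposed
   into pColumn with a stride of Channels*Height, so the vertical pass can read
   columns as contiguous rows. Each pass runs the single-tap recursion
   out = x*(a0+a1) - out*(b1+b2), a simplified form of the second-order recursion
   that the coefficients describe.
   Fixed point: a0a1, a2a3, b1b2 are scaled by 65536; cprev, cnext by 256.
   Nwidth is unused. */
void gaussianHorizontal(unsigned char * bufferPerLine, unsigned char * pRowInitial, unsigned char * pColumn, int Width, int Height, int Channels, int Nwidth, int a0a1, int a2a3, int b1b2, int cprev, int cnext)
{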
	int HeightStep = Channels*Height;
	int lastWidth = Width - 1;
	if (Channels == 3)
	{
		int prevOut[3];
		prevOut[0] = (pRowInitial[0] * cprev) >> 8;
		prevOut[1] = (pRowInitial[1] * cprev) >> 8;
		prevOut[2] = (pRowInitial[2] * cprev) >> 8;
		for (int x = 0; x < Width; ++x) {
			prevOut[0] = ((pRowInitial[0] * (a0a1)) - (prevOut[0] * (b1b2))) >> 16;
			prevOut[1] = ((pRowInitial[1] * (a0a1)) - (prevOut[1] * (b1b2))) >> 16;
			prevOut[2] = ((pRowInitial[2] * (a0a1)) - (prevOut[2] * (b1b2))) >> 16;
			bufferPerLine[0] = prevOut[0];
			bufferPerLine[1] = prevOut[1];
			bufferPerLine[2] = prevOut[2];
			bufferPerLine += Channels;
			pRowInitial += Channels;
		}
		pRowInitial -= Channels;
		pColumn += HeightStep * lastWidth;
		bufferPerLine -= Channels;
		prevOut[0] = (pRowInitial[0] * cnext) >> 8;
		prevOut[1] = (pRowInitial[1] * cnext) >> 8;
		prevOut[2] = (pRowInitial[2] * cnext) >> 8;

		for (int x = lastWidth; x >= 0; --x) {
			prevOut[0] = ((pRowInitial[0] * (a2a3)) - (prevOut[0] * (b1b2))) >> 16;
			prevOut[1] = ((pRowInitial[1] * (a2a3)) - (prevOut[1] * (b1b2))) >> 16;
			prevOut[2] = ((pRowInitial[2] * (a2a3)) - (prevOut[2] * (b1b2))) >> 16;
			bufferPerLine[0] += prevOut[0];
			bufferPerLine[1] += prevOut[1];
			bufferPerLine[2] += prevOut[2];
			pColumn[0] = bufferPerLine[0];
			pColumn[1] = bufferPerLine[1];
			pColumn[2] = bufferPerLine[2];
			pRowInitial -= Channels;
			pColumn -= HeightStep;
			bufferPerLine -= Channels;
		}
	}
	else if (Channels == 4)
	{
		int prevOut[4];

		prevOut[0] = (pRowInitial[0] * cprev) >> 8;
		prevOut[1] = (pRowInitial[1] * cprev) >> 8;
		prevOut[2] = (pRowInitial[2] * cprev) >> 8;
		prevOut[3] = (pRowInitial[3] * cprev) >> 8;
		for (int x = 0; x < Width; ++x) {
			prevOut[0] = ((pRowInitial[0] * (a0a1)) - (prevOut[0] * (b1b2))) >> 16;
			prevOut[1] = ((pRowInitial[1] * (a0a1)) - (prevOut[1] * (b1b2))) >> 16;
			prevOut[2] = ((pRowInitial[2] * (a0a1)) - (prevOut[2] * (b1b2))) >> 16;
			prevOut[3] = ((pRowInitial[3] * (a0a1)) - (prevOut[3] * (b1b2))) >> 16;

			bufferPerLine[0] = prevOut[0];
			bufferPerLine[1] = prevOut[1];
			bufferPerLine[2] = prevOut[2];
			bufferPerLine[3] = prevOut[3];
			bufferPerLine += Channels;
			pRowInitial += Channels;
		}
		pRowInitial -= Channels;
		pColumn += HeightStep * lastWidth;
		bufferPerLine -= Channels;

		prevOut[0] = (pRowInitial[0] * cnext) >> 8;
		prevOut[1] = (pRowInitial[1] * cnext) >> 8;
		prevOut[2] = (pRowInitial[2] * cnext) >> 8;
		prevOut[3] = (pRowInitial[3] * cnext) >> 8;

		for (int x = lastWidth; x >= 0; --x) {
			prevOut[0] = ((pRowInitial[0] * a2a3) - (prevOut[0] * b1b2)) >> 16;
			prevOut[1] = ((pRowInitial[1] * a2a3) - (prevOut[1] * b1b2)) >> 16;
			prevOut[2] = ((pRowInitial[2] * a2a3) - (prevOut[2] * b1b2)) >> 16;
			prevOut[3] = ((pRowInitial[3] * a2a3) - (prevOut[3] * b1b2)) >> 16;
			bufferPerLine[0] += prevOut[0];
			bufferPerLine[1] += prevOut[1];
			bufferPerLine[2] += prevOut[2];
			bufferPerLine[3] += prevOut[3];
			pColumn[0] = bufferPerLine[0];
			pColumn[1] = bufferPerLine[1];
			pColumn[2] = bufferPerLine[2];
			pColumn[3] = bufferPerLine[3];
			pRowInitial -= Channels;
			pColumn -= HeightStep;
			bufferPerLine -= Channels;
		}
	}
	else if (Channels == 1)
	{
		int prevOut = (pRowInitial[0] * cprev) >> 8;

		for (int x = 0; x < Width; ++x) {
			prevOut = ((pRowInitial[0] * (a0a1)) - (prevOut  * (b1b2))) >> 16;
			bufferPerLine[0] = prevOut;
			bufferPerLine += Channels;
			pRowInitial += Channels;
		}
		pRowInitial -= Channels;
		pColumn += HeightStep*lastWidth;
		bufferPerLine -= Channels;

		prevOut = (pRowInitial[0] * cnext) >> 8;

		for (int x = lastWidth; x >= 0; --x) {
			prevOut = ((pRowInitial[0] * a2a3) - (prevOut * b1b2)) >> 16;
			bufferPerLine[0] += prevOut;
			pColumn[0] = bufferPerLine[0];
			pRowInitial -= Channels;
			pColumn -= HeightStep;
			bufferPerLine -= Channels;
		}
	}
}

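/* Vertical pass over one column: pRowInitial points at column x, stored as a
   contiguous row of the transposed cache; the result goes back into column x of
   the output (pColInitial, stride Channels*Width). Same two-pass recursion and
   fixed-point scaling as gaussianHorizontal. */
void gaussianVertical(unsigned char * bufferPerLine, unsigned char * pRowInitial, unsigned char * pColInitial, int Height, int Width, int Channels, int a0a1, int a2a3, int b1b2, int cprev, int cnext) {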

	int WidthStep = Channels*Width;
	int lastHeight = Height - 1;
	if (Channels == 3)
	{
		int prevOut[3];
		prevOut[0] = (pRowInitial[0] * cprev) >> 8;
		prevOut[1] = (pRowInitial[1] * cprev) >> 8;
		prevOut[2] = (pRowInitial[2] * cprev) >> 8;

		for (int y = 0; y < Height; y++) {
			prevOut[0] = ((pRowInitial[0] * a0a1) - (prevOut[0] * b1b2)) >> 16;
			prevOut[1] = ((pRowInitial[1] * a0a1) - (prevOut[1] * b1b2)) >> 16;
			prevOut[2] = ((pRowInitial[2] * a0a1) - (prevOut[2] * b1b2)) >> 16;
			bufferPerLine[0] = prevOut[0];
			bufferPerLine[1] = prevOut[1];
			bufferPerLine[2] = prevOut[2];
			bufferPerLine += Channels;
			pRowInitial += Channels;
		}
		pRowInitial -= Channels;
		bufferPerLine -= Channels;
		pColInitial += WidthStep * lastHeight;
		prevOut[0] = (pRowInitial[0] * cnext) >> 8;
		prevOut[1] = (pRowInitial[1] * cnext) >> 8;
		prevOut[2] = (pRowInitial[2] * cnext) >> 8;
		for (int y = lastHeight; y >= 0; y--) {
			prevOut[0] = ((pRowInitial[0] * a2a3) - (prevOut[0] * b1b2)) >> 16;
			prevOut[1] = ((pRowInitial[1] * a2a3) - (prevOut[1] * b1b2)) >> 16;
			prevOut[2] = ((pRowInitial[2] * a2a3) - (prevOut[2] * b1b2)) >> 16;
			bufferPerLine[0] += prevOut[0];
			bufferPerLine[1] += prevOut[1];
			bufferPerLine[2] += prevOut[2];
			pColInitial[0] = bufferPerLine[0];
			pColInitial[1] = bufferPerLine[1];
			pColInitial[2] = bufferPerLine[2];
			pRowInitial -= Channels;
			pColInitial -= WidthStep;
			bufferPerLine -= Channels;
		}
	}
	else if (Channels == 4)
	{
		int prevOut[4];

		prevOut[0] = (pRowInitial[0] * cprev) >> 8;
		prevOut[1] = (pRowInitial[1] * cprev) >> 8;
		prevOut[2] = (pRowInitial[2] * cprev) >> 8;
		prevOut[3] = (pRowInitial[3] * cprev) >> 8;

		for (int y = 0; y < Height; y++) {
			prevOut[0] = ((pRowInitial[0] * a0a1) - (prevOut[0] * b1b2)) >> 16;
			prevOut[1] = ((pRowInitial[1] * a0a1) - (prevOut[1] * b1b2)) >> 16;
			prevOut[2] = ((pRowInitial[2] * a0a1) - (prevOut[2] * b1b2)) >> 16;
			prevOut[3] = ((pRowInitial[3] * a0a1) - (prevOut[3] * b1b2)) >> 16;
			bufferPerLine[0] = prevOut[0];
			bufferPerLine[1] = prevOut[1];
			bufferPerLine[2] = prevOut[2];
			bufferPerLine[3] = prevOut[3];
			bufferPerLine += Channels;
			pRowInitial += Channels;
		}
		pRowInitial -= Channels;
		bufferPerLine -= Channels;
		pColInitial += WidthStep*lastHeight;
		prevOut[0] = (pRowInitial[0] * cnext) >> 8;
		prevOut[1] = (pRowInitial[1] * cnext) >> 8;
		prevOut[2] = (pRowInitial[2] * cnext) >> 8;
		prevOut[3] = (pRowInitial[3] * cnext) >> 8;
		for (int y = lastHeight; y >= 0; y--) {
			prevOut[0] = ((pRowInitial[0] * a2a3) - (prevOut[0] * b1b2)) >> 16;
			prevOut[1] = ((pRowInitial[1] * a2a3) - (prevOut[1] * b1b2)) >> 16;
			prevOut[2] = ((pRowInitial[2] * a2a3) - (prevOut[2] * b1b2)) >> 16;
			prevOut[3] = ((pRowInitial[3] * a2a3) - (prevOut[3] * b1b2)) >> 16;
			bufferPerLine[0] += prevOut[0];
			bufferPerLine[1] += prevOut[1];
			bufferPerLine[2] += prevOut[2];
			bufferPerLine[3] += prevOut[3];
			pColInitial[0] = bufferPerLine[0];
			pColInitial[1] = bufferPerLine[1];
			pColInitial[2] = bufferPerLine[2];
			pColInitial[3] = bufferPerLine[3];
			pRowInitial -= Channels;
			pColInitial -= WidthStep;
			bufferPerLine -= Channels;
		}
	}
	else if (Channels == 1)
	{
		int prevOut = 0;
		prevOut = (pRowInitial[0] * cprev) >> 8;
		for (int y = 0; y < Height; y++) {
			prevOut = ((pRowInitial[0] * a0a1) - (prevOut * b1b2)) >> 16;
			bufferPerLine[0] = prevOut;
			bufferPerLine += Channels;
			pRowInitial += Channels;
		}
		pRowInitial -= Channels;
		bufferPerLine -= Channels;
		pColInitial += WidthStep*lastHeight;
		prevOut = (pRowInitial[0] * cnext) >> 8;
		for (int y = lastHeight; y >= 0; y--) {
			prevOut = ((pRowInitial[0] * a2a3) - (prevOut * b1b2)) >> 16;
			bufferPerLine[0] += prevOut;
			pColInitial[0] = bufferPerLine[0];
			pRowInitial -= Channels;
			pColInitial -= WidthStep;
			bufferPerLine -= Channels;
		}
	}
}

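// Author's blog: http://tntmonks.cnblogs.com/ Please credit the source when reposting.
/* Blurs an interleaved 8-bit image (Width * Channels bytes per row, no padding).
   inputBuffer is only read during the horizontal pass, so it may equal outputBuffer.
   Default arguments are not valid C, so gaussianSigma (the blur strength, e.g. 2.0f)
   is always passed explicitly. */
void GaussianBlurFilter(unsigned char * inputBuffer, unsigned char * outputBuffer, int Width, int Height, int Channels, float gaussianSigma) {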

	float a0, a1, a2, a3, b1, b2, cprev, cnext;

	CalGaussianCoeff(gaussianSigma, &a0, &a1, &a2, &a3, &b1, &b2, &cprev, &cnext);

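	/* Fixed point: edge gains in Q8 (x256), recursion coefficients in Q16 (x65536). */
	int   icprev = (int)(cprev * 256);
	int   icnext = (int)(cnext * 256);
	int   a0a1 = (int)((a0 + a1) * 65536);
	int   a2a3 = (int)((a2 + a3) * 65536);
	int   b1b2 = (int)((b1 + b2) * 65536);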

	int bufferSizePerLine = (Width > Height ? Width : Height) * Channels;
	unsigned char * bufferPerLine = (unsigned char*)malloc(bufferSizePerLine);
	unsigned char * cacheData = (unsigned char*)malloc(Height * Width * Channels);
	if (bufferPerLine == NULL || cacheData == NULL) {
		free(bufferPerLine);
		free(cacheData);
		return;
	}
	int WidthStep = Width * Channels;
	/* Pass 1: blur every row; cacheData receives the result transposed (Width rows of Height pixels). */
	for (int y = 0; y < Height; ++y) {
		unsigned char * pRowInitial = inputBuffer + WidthStep * y;
		unsigned char * pColumnInitial = cacheData + y * Channels;
		gaussianHorizontal(bufferPerLine, pRowInitial, pColumnInitial, Width, Height, Channels, Width, a0a1, a2a3, b1b2, icprev, icnext);
	}
	int HeightStep = Height*Channels;
	/* Pass 2: blur each row of cacheData (an original column) and write it back to its column in outputBuffer. */
	for (int x = 0; x < Width; ++x) {
		unsigned char * pColInitial = outputBuffer + x*Channels;
		unsigned char * pRowInitial = cacheData + HeightStep * x;
		gaussianVertical(bufferPerLine, pRowInitial, pColInitial, Height, Width, Channels, a0a1, a2a3, b1b2, icprev, icnext);
	}

	free(bufferPerLine);
	free(cacheData);
}

  

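How to call it:

  GaussianBlurFilter(input image data, output image data, width, height, channel count, strength)

  Note: the supported channel counts are 1, 3 and 4; for any other value the output buffer is left unwritten. The strength argument is the Gaussian sigma.

For background on IIR filters, see the Baidu Baike entry "IIR数字滤波器" (IIR digital filter):

http://baike.baidu.com/view/3088994.htm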

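Of all the martial arts under heaven, only speed cannot be beaten.
This post is only meant to get the ball rolling. If you have related questions or needs, feel free to email me to discuss them.

My email address:
gaozhihan@vip.qq.com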

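An aside:

Many readers keep recommending OpenCV. OpenCV is indeed very powerful, but if you want more room to grow and to be creative, you still have to implement the most basic algorithms yourself, one step at a time; a solid foundation is what everything else is built on.

For now I only treat OpenCV as a reference library, and I don't think it is suitable for the vast majority of commercial projects.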

