[Repost] Compact Normal Storage for Small G-Buffers
http://aras-p.info/texts/CompactNormalStorage.html
- Intro
- Baseline: store X&Y&Z
- Method 1: X&Y
- Method 3: Spherical Coordinates
- Method 4: Spheremap Transform
- Method 7: Stereographic projection
- Method 8: Per-pixel View Space
- Performance Comparison
- Quality Comparison
- Changelog
Intro
Various deferred shading/lighting approaches and image postprocessing effects need to store normals as part of their G-buffer. Let’s figure out a compact storage method for view space normals. In my case, the main target is a minimalist G-buffer, where depth and normals are packed into a single 32 bit (8 bits/channel) render texture. I try to minimize both the error and the shader cycles needed to encode/decode.
Now of course, 8 bits/channel storage for normals may not be enough for shading, especially if you want specular (low precision & quantization lead to specular “wobble” when the camera or objects move). However, everything below should Just Work (tm) for 10 or 16 bits/channel integer formats. For 16 bits/channel half-float formats, some of the computations are not necessary (e.g. bringing normal values into the 0..1 range).
If you know other ways to store/encode normals, please let me know in the comments!
Various normal encoding methods and their comparison below. Notes:
- Error images are: 1-pow(dot(n1,n2),1024) and abs(n1-n2)*30, where n1 is the actual normal, and n2 is the normal encoded into a texture, read back & decoded. MSE and PSNR are computed on the difference (abs(n1-n2)) image.
- Shader code is HLSL. Compiled into ps_3_0 by d3dx9_42.dll (February 2010 SDK).
- Radeon GPU performance numbers from AMD’s GPU ShaderAnalyzer 1.53, using Catalyst 9.12 driver.
- GeForce GPU performance numbers from NVIDIA’s NVShaderPerf 2.0, using 174.74 driver.
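The error metrics in the notes above can be sketched on the CPU. This is a minimal Python sketch, not the playground's actual code: the function names are made up for illustration, the clamp before the power is an added guard, and the PSNR peak value of 1.0 is an assumption (it does agree with the Method #1 figures, MSE 0.013514 and PSNR 18.692 dB).

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def power_error(n1, n2):
    """Per-pixel 'error to power' value: 1 - pow(dot(n1, n2), 1024)."""
    d = max(0.0, min(1.0, dot(n1, n2)))  # clamp is an assumption, not from the article
    return 1.0 - d ** 1024

def abs_error_x30(n1, n2):
    """Per-pixel difference image value: abs(n1 - n2) * 30, per component."""
    return tuple(abs(a - b) * 30.0 for a, b in zip(n1, n2))

def mse(pairs):
    """Mean squared error over all components of (n1 - n2), for a list of pairs."""
    sq = [(a - b) ** 2 for n1, n2 in pairs for a, b in zip(n1, n2)]
    return sum(sq) / len(sq)

def psnr(mse_value, peak=1.0):
    """PSNR in dB. peak = 1.0 is an assumption about the signal range."""
    return 10.0 * math.log10(peak * peak / mse_value)

print(round(psnr(0.013514), 3))  # 18.692
```

The 1024 exponent makes the first metric very sensitive: even a small angular deviation between n1 and n2 shows up clearly in the image.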
Note: there was an error!
Original version of my article had some stupidity: encoding shaders did not normalize the incoming per-vertex normal. This resulted in quality evaluation results being somewhat wrong. Also, if normal is assumed to be normalized, then three methods in original article (Sphere Map, Cry Engine 3 and Lambert Azimuthal) are in fact completely equivalent. The old version is still available for the sake of integrity of the internets.
Test Playground Application
Here is a small Windows application I used to test everything below: NormalEncodingPlayground.zip (4.8MB, source included).
It requires a GPU with Shader Model 3.0 support. When it writes fancy shader reports, it expects AMD’s GPUShaderAnalyzer and NVIDIA’s NVShaderPerf to be installed. Source code should build with Visual C++ 2008.
Baseline: store X&Y&Z
Just to set the basis, store all three components of the normal. It’s not suitable for our quest, but I include it here to evaluate “base” encoding error (which happens here only because of quantization to 8 bits per component).
Encoding, Error to Power, Error * 30 images below. MSE: 0.000008; PSNR: 51.081 dB.

| | Encoding | Decoding |
|---|---|---|
| HLSL source | half4 encode (half3 n, float3 view) | half3 decode (half4 enc, float3 view) |
| Compile target | ps_3_0 | ps_3_0 |
| Instructions | 1 ALU | 2 ALU, 1 TEX |
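To see where the baseline error comes from, here is a minimal Python sketch of an 8 bits per component round trip. The helper names and the round-to-nearest quantization are assumptions made for illustration; they are not taken from the article's shaders.

```python
def quantize8(v):
    """Round a value in [0, 1] to the nearest of 256 levels (8-bit storage)."""
    return round(max(0.0, min(1.0, v)) * 255.0) / 255.0

def encode_xyz(n):
    # Bring each component from [-1, 1] into [0, 1], then quantize.
    return tuple(quantize8(c * 0.5 + 0.5) for c in n)

def decode_xyz(enc):
    # Back to [-1, 1]; the result is generally no longer exactly unit length.
    return tuple(c * 2.0 - 1.0 for c in enc)

n = (0.48, 0.6, 0.64)  # unit-length test normal
d = decode_xyz(encode_xyz(n))
max_err = max(abs(a - b) for a, b in zip(n, d))
print(max_err)
```

With round-to-nearest, the stored value is off by at most half a step (1/510), which becomes at most 1/255 per component after scaling back to the -1..1 range.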
Method #1: store X&Y, reconstruct Z
Used by Killzone 2 among others (PDF link).
Encoding, Error to Power, Error * 30 images below. MSE: 0.013514; PSNR: 18.692 dB.

| | Encoding | Decoding |
|---|---|---|
| HLSL source | half4 encode (half3 n, float3 view) | half3 decode (half2 enc, float3 view) |
| Compile target | ps_3_0 | ps_3_0 |
| Instructions | 1 ALU | 7 ALU, 1 TEX |
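A minimal Python sketch of the idea behind Method #1 (function names are illustrative, and the clamp under the square root is an added guard against rounding): store X and Y, then rebuild Z from the unit-length constraint. The sign of Z is not stored, so a positive root has to be assumed.

```python
import math

def encode_xy(n):
    # Store only X and Y, remapped from [-1, 1] to [0, 1].
    return (n[0] * 0.5 + 0.5, n[1] * 0.5 + 0.5)

def decode_xy(enc):
    x = enc[0] * 2.0 - 1.0
    y = enc[1] * 2.0 - 1.0
    # Unit length gives z*z = 1 - (x*x + y*y); the sign of z is lost,
    # so the positive root is assumed here.
    z = math.sqrt(max(0.0, 1.0 - (x * x + y * y)))
    return (x, y, z)

front = decode_xy(encode_xy((0.6, 0.0, 0.8)))   # Z >= 0: recovered
back = decode_xy(encode_xy((0.6, 0.0, -0.8)))   # Z < 0: comes back with Z = +0.8
print(front, back)
```

Any normal with a negative Z decodes to its mirror image, which is the kind of failure a large MSE would reflect.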
Method #3: Spherical Coordinates
It is possible to use spherical coordinates to encode the normal. Since we know it’s unit length, we can just store the two angles.
Suggested by Pat Wilson of Garage Games: GG blog post. Other mentions: MJP’s blog, GarageGames thread, Wolf Engel’s blog, gamedev.net forum thread.
Encoding, Error to Power, Error * 30 images below. MSE: 0.000062; PSNR: 42.042 dB.

| | Encoding | Decoding |
|---|---|---|
| HLSL source | #define kPI 3.1415926536f | half3 decode (half2 enc, float3 view) |
| Compile target | ps_3_0 | ps_3_0 |
| Instructions | 26 ALU | 17 ALU, 1 TEX |
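One way to sketch the spherical encoding in Python. This is an illustrative variant, not necessarily what the article's shader does: it stores the azimuth angle atan2(y, x) and the cosine of the polar angle (which is simply z) instead of the polar angle itself, so decoding needs only one sin/cos pair. Function names are hypothetical.

```python
import math

def encode_spherical(n):
    # Azimuth in [-pi, pi] and z in [-1, 1], both remapped to [0, 1].
    return ((math.atan2(n[1], n[0]) / math.pi + 1.0) * 0.5,
            (n[2] + 1.0) * 0.5)

def decode_spherical(enc):
    theta = (enc[0] * 2.0 - 1.0) * math.pi
    z = enc[1] * 2.0 - 1.0
    r = math.sqrt(max(0.0, 1.0 - z * z))  # radius of the circle at height z
    return (r * math.cos(theta), r * math.sin(theta), z)

n = (-0.48, 0.6, -0.64)  # unit length, points away in Z
d = decode_spherical(encode_spherical(n))
print(d)
```

Unlike Method #1, the sign of Z survives the round trip. The cost is the trigonometry, which is consistent with the higher instruction counts in the table above.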
Method #4: Spheremap Transform
Spherical environment mapping (indirectly) maps reflection vector to a texture coordinate in [0..1] range. The reflection vector can point away from the camera, just like our view space normals. Bingo! See Siggraph 99 notes for sphere map math. Normal we want to encode is R, resulting values are (s,t).
If we assume that incoming normal is normalized, then there are methods derived from elsewhere that end up being exactly equivalent:
- Used in Cry Engine 3, presented by Martin Mittring in the “A bit more Deferred” presentation (PPT link, slide 13). For Unity, I had to negate the Z component of the view space normal to produce good results; I guess Unity’s and Cry Engine’s coordinate systems are different. The code would be:

    half2 encode (half3 n, float3 view)
    {
        // direction of n.xy, scaled by a radius that depends on n.z
        half2 enc = normalize(n.xy) * (sqrt(-n.z*0.5+0.5));
        enc = enc*0.5+0.5; // bring into 0..1 range
        return enc;
    }
    half3 decode (half4 enc, float3 view)
    {
        half4 nn = enc*half4(2,2,0,0) + half4(-1,-1,1,-1);
        half l = dot(nn.xyz,-nn.xyw); // 1 - x*x - y*y
        nn.z = l;
        nn.xy *= sqrt(l);
        return nn.xyz * 2 + half3(0,0,-1);
    }

- Lambert Azimuthal Equal-Area projection (Wikipedia link). Suggested by Sean Barrett in comments for this article. The code would be:

    half2 encode (half3 n, float3 view)
    {
        half f = sqrt(8*n.z+8); // zero when n is (0,0,-1)
        return n.xy / f + 0.5;
    }
    half3 decode (half4 enc, float3 view)
    {
        half2 fenc = enc.xy*4-2; // only xy is used
        half f = dot(fenc,fenc);
        half g = sqrt(1-f/4);
        half3 n;
        n.xy = fenc*g;
        n.z = 1-f/2;
        return n;
    }
Encoding, Error to Power, Error * 30 images below. MSE: 0.000016; PSNR: 48.071 dB.

| | Encoding | Decoding |
|---|---|---|
| HLSL source | half4 encode (half3 n, float3 view) | half3 decode (half2 enc, float3 view) |
| Compile target | ps_3_0 | ps_3_0 |
| Instructions | 4 ALU | 8 ALU, 1 TEX |
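To check the math offline, here is a Python transliteration of the Lambert Azimuthal form of the spheremap transform listed above. It is a sketch for experimentation only: the function names are illustrative, and the clamp under the square root is an added guard against rounding.

```python
import math

def encode_spheremap(n):
    f = math.sqrt(8.0 * n[2] + 8.0)  # zero only when n = (0, 0, -1)
    return (n[0] / f + 0.5, n[1] / f + 0.5)

def decode_spheremap(enc):
    fx = enc[0] * 4.0 - 2.0
    fy = enc[1] * 4.0 - 2.0
    f = fx * fx + fy * fy
    g = math.sqrt(max(0.0, 1.0 - f / 4.0))
    return (fx * g, fy * g, 1.0 - f / 2.0)

toward = (0.6, 0.0, 0.8)
away = (0.6, 0.0, -0.8)
print(decode_spheremap(encode_spheremap(toward)))
print(decode_spheremap(encode_spheremap(away)))
```

Both signs of Z round-trip, and the encoded values stay inside 0..1 for every unit normal. The single problem direction is (0, 0, -1), where the encode divides by zero.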
Method #7: Stereographic Projection
What the title says: use Stereographic Projection (Wikipedia link), plus rescaling so that the “practically visible” range of normals maps into the unit circle (regular stereographic projection maps the sphere to a circle of infinite size). In my tests, a scaling factor of 1.7777 produced the best results; in practice it depends on the FOV used and how much you care about normals that point away from the camera.
Suggested by Sean Barrett and Ignacio Castano in comments for this article.
Encoding, Error to Power, Error * 30 images below. MSE: 0.000038; PSNR: 44.147 dB.

| | Encoding | Decoding |
|---|---|---|
| HLSL source | half4 encode (half3 n, float3 view) | half3 decode (half4 enc, float3 view) |
| Compile target | ps_3_0 | ps_3_0 |
| Instructions | 5 ALU | 7 ALU, 1 TEX |
Method #8: Per-pixel View Space
If we compute view space per-pixel, then Z component of a normal can never be negative. Then just store X&Y, and compute Z.
Suggested by Yuriy O’Donnell on Twitter.
Encoding, Error to Power, Error * 30 images below. MSE: 0.000134; PSNR: 38.730 dB.

| | Encoding | Decoding |
|---|---|---|
| HLSL source | float3x3 make_view_mat (float3 view) | |
| Compile target | ps_3_0 | ps_3_0 |
| Instructions | 17 ALU | 21 ALU, 1 TEX |
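The per-pixel view space idea can be sketched in Python as follows. This is a hedged illustration, not the article's shader: the basis construction below is just one possible choice (it breaks down when the view ray is parallel to the Y axis), and all function names are hypothetical. The example normal has a negative Z in the global frame, yet still round-trips because it faces back along its own view ray.

```python
import math

def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def make_view_basis(view):
    # Orthonormal basis whose z axis points back along the view ray.
    z = tuple(-c for c in normalize(view))
    x = normalize((z[2], 0.0, -z[0]))  # perpendicular to z; degenerate if view is along Y
    y = cross(z, x)
    return x, y, z

def encode_ppview(n, view):
    x, y, _ = make_view_basis(view)
    return (dot(x, n) * 0.5 + 0.5, dot(y, n) * 0.5 + 0.5)

def decode_ppview(enc, view):
    x, y, z = make_view_basis(view)
    a = enc[0] * 2.0 - 1.0
    b = enc[1] * 2.0 - 1.0
    c = math.sqrt(max(0.0, 1.0 - a * a - b * b))  # non-negative in the per-pixel frame
    return tuple(a * x[i] + b * y[i] + c * z[i] for i in range(3))

view = (1.0, 0.0, -1.0)            # ray from the camera through an off-center pixel
n = normalize((-1.0, 0.0, -0.2))   # faces the camera along that ray, but n.z < 0
d = decode_ppview(encode_ppview(n, view), view)
print(n, d)
```

Both encode and decode have to rebuild the basis from the view vector, which is consistent with the higher instruction counts in the table above.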
Performance Comparison
GPU performance comparison in a single table:
| | #1: X & Y | #3: Spherical | #4: Spheremap | #7: Stereo | #8: PPView |
|---|---|---|---|---|---|
| Encoding, GPU cycles | | | | | |
| Radeon HD2400 | 1.00 | 17.00 | 3.00 | 4.00 | 11.00 |
| Radeon HD5870 | 0.50 | 0.95 | 0.50 | 0.50 | 0.80 |
| GeForce 6200 | 1.00 | 12.00 | 4.00 | 2.00 | 12.00 |
| GeForce 8800 | 7.00 | 43.00 | 12.00 | 12.00 | 24.00 |
| Decoding, GPU cycles | | | | | |
| Radeon HD2400 | 1.00 | 17.00 | 3.00 | 4.00 | 11.00 |
| Radeon HD5870 | 0.50 | 0.95 | 0.50 | 1.00 | 0.80 |
| GeForce 6200 | 4.00 | 7.00 | 6.00 | 4.00 | 12.00 |
| GeForce 8800 | 15.00 | 23.00 | 15.00 | 12.00 | 29.00 |
| Encoding, D3D ALU+TEX instruction slots | | | | | |
| SM3.0 | 1 | 26 | 4 | 5 | 17 |
| Decoding, D3D ALU+TEX instruction slots | | | | | |
| SM3.0 | 8 | 18 | 9 | 8 | 22 |
Quality Comparison
Quality comparison in a single table. PSNR based, higher numbers are better.
| Method | PSNR, dB |
|---|---|
| #1: X & Y | 18.692 |
| #3: Spherical | 42.042 |
| #4: Spheremap | 48.071 |
| #7: Stereographic | 44.147 |
| #8: Per pixel view | 38.730 |
Changelog
- 2010 03 25: Added Method #8: Per-pixel View Space. Suggested by Yuriy O’Donnell.
- 2010 03 24: Stop! Everything before was wrong! Old article moved here.
- 2009 08 12: Added Method #7: Stereographic projection. Suggested by Sean Barrett and Ignacio Castano.
- 2009 08 12: Optimized Method #5, suggested by Steve Hill.
- 2009 08 08: Added power difference images.
- 2009 08 07: Optimized Method #4: Sphere map. Suggested by Irenee Caroulle.
- 2009 08 07: Added Method #6: Lambert Azimuthal Equal Area. Suggested by Sean Barrett.
- 2009 08 05: Added Method #5: Cry Engine 3. Suggested by Steve Hill.
- 2009 08 05: Improved quality of Method #3a: round values in texture LUT.
- 2009 08 05: Added MSE and PSNR values for all methods.
- 2009 08 04: Added Method #3a: Spherical Coordinates w/ texture LUT.
- 2009 08 04: Method #1: 1-dot(n.xy,n.xy) is slightly better than 1-n.x*n.x-n.y*n.y (better pipelining on NV and ATI). Suggested by Arseny “zeux” Kapoulkine.