https://software.intel.com/en-us/blogs/2013/07/18/order-independent-transparency-approximation-with-pixel-synchronization

Order-Independent Transparency Approximation with Pixel Synchronization

Submitted by Leigh Davies (Intel) on Thu, 07/18/2013 - 10:09

Transparency is a fundamental challenge in real-time rendering because an arbitrary number of transparent layers must be composited in the right order. The Order Independent Transparency sample, which uses the Intel® Iris™ Graphics extension for pixel synchronization, shows a real-time solution built on the extensions available on 4th Generation Intel® Core™ processors. Codemasters used the algorithm in GRID 2 to improve the rendering of foliage and semi-transparent trackside objects, as shown in Figure 1.

Figure 1: The great outdoors in GRID 2 by Codemasters with OIT applied to the foliage and chain link fences.

This sample uses a new algorithm that builds on the work originally detailed in the article on Adaptive Transparency by Marco Salvi, Jefferson Montgomery, and Aaron Lefohn. That article showed how Adaptive Transparency can closely approximate the ground-truth results obtained from A-buffer compositing while being 5x to 40x faster. Rather than storing all color and depth data in per-pixel lists and then sorting and compositing them (Figure 2), we refactored the alpha-blending equation to avoid recursion and sorting and to produce a “visibility function” (VF) (Figure 3).

The number of steps in the visibility function corresponds to the number of nodes used to store visibility information per pixel during the resolve stage. As pixels are added, the algorithm calculates which previous pixels can be merged to cause the smallest variation in the visibility function while keeping the data set size fixed. The final stage is to evaluate the visibility function vis() and composite the fragments using the formula final_color = Σ_i c_i · α_i · vis(z_i), where c_i, α_i and z_i are the color, alpha and depth of fragment i.
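To make the refactored blending equation concrete, here is a minimal Python sketch. It assumes that vis(z) is the product of (1 - alpha) over all fragments closer than z and that fragment depths are distinct. The function names and the (depth, rgb, alpha) tuple layout are illustrative and are not taken from the sample. The sketch compares the order-independent sum against classic back-to-front alpha blending.

```python
from math import prod

def visibility(fragments, z):
    """Fraction of light reaching depth z: product of (1 - alpha) over closer fragments."""
    return prod(1.0 - a for (zk, _, a) in fragments if zk < z)

def composite_with_vf(fragments, background):
    """Order-independent sum: each fragment is weighted by its alpha and its visibility."""
    color = [0.0, 0.0, 0.0]
    for (z, rgb, a) in fragments:
        weight = a * visibility(fragments, z)
        color = [acc + weight * ch for acc, ch in zip(color, rgb)]
    # Light that passes through every fragment reaches the background.
    through = prod(1.0 - a for (_, _, a) in fragments)
    return [acc + through * b for acc, b in zip(color, background)]

def composite_sorted(fragments, background):
    """Reference: sort by depth, then blend back to front."""
    color = list(background)
    for (_, rgb, a) in sorted(fragments, key=lambda f: f[0], reverse=True):
        color = [a * ch + (1.0 - a) * acc for acc, ch in zip(color, rgb)]
    return color
```

Because the sum does not depend on the order in which fragments are visited, the fragments can be supplied in any order and the two functions agree up to floating-point rounding.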

The new algorithm makes two main changes to this approach. The first is the use of the Intel® Iris™ Graphics pixel synchronization extension. Pixel synchronization provides ordered Read/Modify/Write (RMW) access for a given pixel. If two pixels in flight are being rendered to the same screen location, only one shader is allowed to continue past the synchronization primitive in the pixel shader, and the one chosen depends on the order in which the work was submitted to the front end. The remaining shader(s) resume, in submission order, once the first shader has completed, as shown in Figure 4.

Figure 4: Pixel Shader Ordering
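The ordering guarantee can be mimicked on the CPU to show what "ordered RMW" means. The following Python sketch is an analogy only, not the hardware mechanism or the extension's API. The class OrderedPixelLock and the function run_fragments are hypothetical names. One thread per fragment is started in reverse order, yet each critical section runs in submission order.

```python
import threading

class OrderedPixelLock:
    """Lets threads that touch one pixel enter a critical section in submission order."""

    def __init__(self):
        self._cond = threading.Condition()
        self._next_ticket = 0

    def run_in_order(self, ticket, critical_section):
        with self._cond:
            # Block until every earlier-submitted fragment has finished its section.
            self._cond.wait_for(lambda: self._next_ticket == ticket)
            critical_section()
            self._next_ticket += 1
            self._cond.notify_all()

def run_fragments(count):
    """Run one thread per fragment; return the order in which the sections executed."""
    lock = OrderedPixelLock()
    executed = []
    threads = [
        threading.Thread(target=lock.run_in_order,
                         args=(i, lambda i=i: executed.append(i)))
        for i in range(count)
    ]
    # Start in reverse to show that start order does not decide execution order.
    for t in reversed(threads):
        t.start()
    for t in threads:
        t.join()
    return executed
```

Calling run_fragments(8) returns the tickets in ascending order regardless of how the threads were scheduled, which is the property the insertion step relies on when it updates per-pixel data in place.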

Using this we are able to merge pixels as they are rendered rather than in the resolve phase. Merging in the insertion phase removes the requirement to store the per-pixel list, so the algorithm now has a fixed memory size. It also removes the artifacts normally seen with A-buffer and Adaptive Transparency algorithms when information is lost because the linked-list storage overflows. In addition, it improves performance by further reducing bandwidth requirements.

The second major change is to the original algorithm used to merge the incoming pixels into the fixed set of nodes. Rather than using the adaptive routine to create the visibility function, we approximate the results by sorting and then merging the furthest pixels. This works very well when pixels of similar color are merged, which is the case when rendering lots of foliage. Different insertion routines can easily be used based on the user’s requirements.

The sample consists of a simple scene designed to showcase the difficulties in rendering complex geometry where transparency plays a major role in correctly rendering the materials, as shown in Figure 5.

Figure 5: Intel OIT sample.
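The merge-on-insertion step described above can be sketched in a few lines of Python. This illustrates the idea (sort, then merge the two furthest nodes on overflow) and is not a transcription of the sample's WriteNewPixelToAoit. The node layout, the premultiplied-color storage and all function names are assumptions made for the example.

```python
def over(front, back):
    """Composite two (premultiplied_rgb, alpha) layers, front over back."""
    (cf, af), (cb, ab) = front, back
    rgb = tuple(f + (1.0 - af) * b for f, b in zip(cf, cb))
    return rgb, af + (1.0 - af) * ab

def insert_fragment(nodes, depth, rgb, alpha, max_nodes=2):
    """Insert into a depth-sorted node list; merge the two furthest nodes on overflow."""
    layer = (tuple(alpha * ch for ch in rgb), alpha)  # store premultiplied color
    nodes.append((depth, layer))
    nodes.sort(key=lambda n: n[0])                    # front to back
    if len(nodes) > max_nodes:
        (z_near, near), (_, far) = nodes[-2], nodes[-1]
        nodes[-2:] = [(z_near, over(near, far))]      # merged node keeps the nearer depth
    return nodes

def resolve(nodes, background):
    """Composite the stored nodes over an opaque background."""
    result = (tuple(background), 1.0)
    for _, layer in reversed(nodes):                  # walk back to front
        result = over(layer, result)
    return result[0]
```

In this sketch the node list never grows beyond max_nodes, so memory per pixel is fixed. The result stays exact as long as no later fragment lands between the depths of two nodes that were already merged. When one does, it is composited in front of or behind the merged node as a whole, which is where the approximation comes from.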

The user can select between alternative transparency rendering techniques, including:

  1. Alpha Blending (Figure 6), where solid geometry is rendered first followed by transparent objects sorted from the inner objects outwards.
  2. Alpha Blending with Alpha to Coverage (Figure 7) which requires MSAA and allows the use of depth buffering on simple transparent objects such as foliage.
  3. The original Adaptive Transparency routine detailed in the linked article implemented using DX11.
  4. The OIT algorithm using Intel® Iris™ Graphics pixel synchronization extension (Figure 8).

To run the final option you will require hardware that supports the Intel® Iris™ Graphics pixel synchronization extension. The visual differences can be seen in Figures 6 to 8.

The sample makes no attempt to hide artifacts in the original alpha blending solution. In a real game these could be partly solved by further subdividing the models and sorting them relative to the camera; the intent here is simply to show the types of artifacts that OIT solves without the need to presort the semi-transparent geometry before submitting it to the graphics API for rendering. A check box allows the alpha-blended foliage to read from and write to the depth buffer, showing the kind of halo patterns that would normally be caused if semi-transparent geometry updated the depth buffer. This debug option is included to better show the number of pixels passing the alpha discard test in the foliage shaders.

The sample performs the following steps when running the pixel synchronization OIT algorithm. First, all solid geometry is rendered to the scene. Second, we render any material that requires transparency; each transparent object in this second stage updates both a ClearMaskUAV, which can be viewed as a debug option, and an AOIT surface containing multiple nodes of color and depth information per pixel. Finally, a full-screen resolve pass merges the transparent pixels onto the back buffer wherever the ClearMaskUAV has been set. For debugging, the sample allows you to view the depth buffer and to disable the OIT resolve to show the amount of geometry rendered using a standard forward rendering approach.
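The three passes can be mirrored in a small CPU-side Python sketch. It is only a structural illustration: the mask polarity (True meaning "a transparent fragment touched this pixel") is an assumption, the per-pixel store is an unbounded list rather than a fixed set of nodes, and render_frame is a hypothetical name.

```python
def render_frame(width, height, solid_color, transparent_fragments):
    # Pass 1: solid geometry fills the back buffer.
    back_buffer = [[solid_color for _ in range(width)] for _ in range(height)]

    # Per-pixel state written by pass 2 (stand-ins for ClearMaskUAV and the AOIT surface).
    clear_mask = [[False] * width for _ in range(height)]
    aoit = [[[] for _ in range(width)] for _ in range(height)]

    # Pass 2: each transparent fragment marks its pixel and stores depth, color, alpha.
    for (x, y, depth, rgb, alpha) in transparent_fragments:
        clear_mask[y][x] = True
        aoit[y][x].append((depth, rgb, alpha))

    # Pass 3: full-screen resolve that skips pixels no transparent fragment touched.
    for y in range(height):
        for x in range(width):
            if not clear_mask[y][x]:
                continue
            color = back_buffer[y][x]
            layers = sorted(aoit[y][x], key=lambda f: f[0], reverse=True)
            for _, rgb, alpha in layers:              # blend back to front
                color = tuple(alpha * c + (1.0 - alpha) * d
                              for c, d in zip(rgb, color))
            back_buffer[y][x] = color
    return back_buffer, clear_mask
```

The mask lets the resolve pass leave untouched pixels alone, so its cost scales with the screen area actually covered by transparent geometry.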

The Intel approximated OIT algorithm offers different quality levels in which the data is compressed into 2, 4 or 8 nodes. More nodes approximate the visibility function more accurately but require additional memory and bandwidth. GRID 2 used the 2-node version because the tradeoff between performance and visual quality was very favorable: only a minor visual difference for a noticeable performance gain.
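As a rough illustration of how memory scales with node count, the Python snippet below computes the size of a per-pixel node surface. The figure of 8 bytes per node (for example a 32-bit depth plus 32 bits of packed color and transmittance) and the 1920x1080 resolution are assumptions chosen for the example, not measurements from the sample.

```python
def aoit_surface_bytes(width, height, node_count, bytes_per_node=8):
    """Size of a fixed per-pixel node surface; bytes_per_node is an assumed layout."""
    return width * height * node_count * bytes_per_node

for nodes in (2, 4, 8):
    mib = aoit_surface_bytes(1920, 1080, nodes) / 2**20
    print(f"{nodes} nodes: {mib:.1f} MiB")
```

Whatever the real node size is, the cost grows linearly with the node count, so the 8-node setting needs four times the storage of the 2-node setting and correspondingly more bandwidth per inserted fragment.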

The main code for the algorithm can be found in the pixel shader function WriteNewPixelToAoit in AOIT.hlsl and the AOITSPResolvePS function in AOIT_resolve.hlsl. The most expensive routine is generally WriteNewPixelToAoit in the insertion phase. Significant performance gains can be made by ensuring that any pixel shader calling this routine uses [earlydepthstencil] testing to reject hidden pixels. The more accurate and comprehensive the depth buffer is at this point, the more transparent geometry can be occluded. This led to an optimization in GRID 2 where even the trees were rendered to the depth buffer if the foliage was close to 100% opaque, to reduce unnecessary overdraw.
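The effect of a more complete depth buffer on the insertion cost can be shown with a toy Python count. It assumes a conventional less-than depth test in which smaller values are closer to the camera. The function name and the depth values are made up for the example.

```python
def count_insertions(fragments, depth_buffer):
    """Count fragments that pass a less-than depth test and so reach the insertion routine."""
    return sum(1 for pixel, depth in fragments if depth < depth_buffer[pixel])

# Five foliage fragments stacked on one pixel, nearest first.
foliage = [(0, 0.30), (0, 0.40), (0, 0.50), (0, 0.60), (0, 0.70)]

# Only distant terrain has been written to the depth buffer.
terrain_only = {0: 0.90}
# A nearly opaque tree at depth 0.45 was also written to the depth buffer.
with_tree = {0: 0.45}

print(count_insertions(foliage, terrain_only))
print(count_insertions(foliage, with_tree))
```

With only the terrain in the depth buffer all five fragments run the expensive insertion. Once the near-opaque tree is written as well, only the two fragments in front of it survive the early test.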

One important point to note when running the sample is that it starts with VSync enabled by default. This is done to conserve platform power and is strongly encouraged as the default behavior when writing PC graphics applications that are expected to run on portable PCs.

Performance measurements should be taken with VSync off. When run without VSync, the sample displays a range of statistics derived from DirectX timing queries on the GPU. These break down the time taken into rendering the solid objects in the scene, rendering the transparent objects into the UAV surface, and the final resolve pass. The statistics are disabled when VSync is enabled because they cannot be relied upon as an accurate reflection of the time taken to execute the algorithms. With VSync enabled, the system can reduce clock speed to conserve power, especially on mobile platforms; as a side effect, a very efficient algorithm can allow the system to clock down further and distort the timing measurements.

   