My knowledge here is limited, so please point out any mistakes~

As mentioned above, path tracing (PT) has some clear advantages and disadvantages. Its advantage is that it is very simple to implement, which is probably the main reason so many popular renderers on the market today have adopted path tracing as their core algorithm. Its biggest drawback, though, is slow convergence. The root cause is that the global illumination integral has to be sampled continuously over the hemisphere, so each evaluation must spawn n sample rays, and n directly determines the quality of the final image (for PT, image quality comes down to how much noise remains; a noisy result looks a bit like a photo shot on old film stock). In general, the larger n is, the smoother the image.
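
For reference, the integral in question is the rendering equation, which PT estimates with a plain Monte Carlo sum over the n sampled directions (this is the standard textbook formulation, nothing renderer-specific):

$$
L_o(x,\omega_o) = L_e(x,\omega_o) + \int_{\Omega} f_r(x,\omega_i,\omega_o)\, L_i(x,\omega_i)\cos\theta_i \,\mathrm{d}\omega_i \;\approx\; L_e(x,\omega_o) + \frac{1}{n}\sum_{k=1}^{n} \frac{f_r(x,\omega_k,\omega_o)\, L_i(x,\omega_k)\cos\theta_k}{p(\omega_k)}
$$

Here p(ω) is the probability density the n directions are drawn from. The estimator's standard error falls off as O(1/√n), so halving the noise costs four times as many rays; that is exactly the slow convergence in question. Researchers noticed this drawback long ago, and a variety of denoising approaches have emerged. Concretely, they fall into the following categories: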

1) Improve the PT algorithm itself, as in the now-classic bidirectional path tracing (BDPT) and Metropolis light transport.

2) Keep the PT algorithm unchanged, but improve its internal sampling machinery. For example:

  a. Importance-sample according to the BRDF (a minimal sketch follows this list);

  b. The uniform distribution used for sampling is usually produced by C's rand function or the uniform distribution in the C++ random library. For PT itself, though, this style of sampling does not help the final image converge; many modern renderers instead use a "Low Discrepancy Sequence" (see https://zhuanlan.zhihu.com/p/20197323?columnSlug=graphics, and the Halton sketch after this list).

  c. One big advantage of PT is that every pixel on the screen is sampled independently and every iteration is independent of the rest, so the algorithm can be accelerated with parallelism (an OpenMP sketch and an iterative BVH traversal sketch both follow this list). On the CPU there is OpenMP, which ships built into most popular compilers; on the GPU, NVIDIA cards have CUDA and AMD cards can use OpenCL, and render speed grows roughly linearly with the number of GPUs, which gives it a brute-force flavor. The biggest difference in a GPU implementation is that recursion is off the table (NVIDIA hardware has supported recursion for a while now, but at large scales each thread's stack is shallow, and PT sample rays can bounce to considerable depth, so I don't quite dare rely on it). The recursion mainly arises when intersecting rays with the scene, since the scene is partitioned hierarchically with a Bounding Volume Hierarchy (BVH), which is recursive by nature. This restriction is easy to work around: design the data structure sensibly and bake the information the recursion would need into the nodes while building the BVH, and the recursion degrades quite naturally into a loop.

    (ps: no wonder so many progressive renderers adopt pathtracing these days... it's simple to implement, simple to develop, and when you need more speed you can lean directly on the parallelism and just plug in a few more GPUs, which saves a pile of engineering labor and keeps costs low...)
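
On point 2a above: for a Lambertian (diffuse) BRDF, the standard importance-sampling trick is cosine-weighted hemisphere sampling, where the pdf cos(θ)/π cancels the cosine factor in the estimator. A minimal sketch; Vec3 and the function names are my own placeholders, and u1, u2 are two uniform samples in [0,1):

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };

const double kPi = 3.14159265358979323846;

// Malley's method: sample a point on the unit disk, then project it up onto
// the hemisphere around the local z axis. The resulting direction has pdf
// cos(theta)/pi, which cancels the cosine in the estimator, so with a
// Lambertian BRDF (albedo/pi) the per-sample weight collapses to the albedo.
Vec3 sample_cosine_hemisphere(double u1, double u2) {
    double r   = std::sqrt(u1);      // radius on the unit disk
    double phi = 2.0 * kPi * u2;     // angle on the unit disk
    return { r * std::cos(phi),
             r * std::sin(phi),
             std::sqrt(1.0 - u1) };  // lift the disk point onto the hemisphere
}

double cosine_hemisphere_pdf(double cos_theta) {
    return cos_theta / kPi;
}
```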
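
On point 2b: the Halton sequence is one classic low discrepancy construction, built from the radical inverse (the base-b digits of the index, mirrored around the radix point). A minimal sketch; pairing base 2 with base 3 gives 2D sample points that cover the unit square far more evenly than independent rand() draws, which tend to clump and leave gaps:

```cpp
#include <cstdio>

// Radical inverse: write the index i in base b, then mirror its digits
// around the radix point to get a number in [0,1).
double halton(unsigned i, unsigned base) {
    double f = 1.0, result = 0.0;
    while (i > 0) {
        f /= base;
        result += f * (i % base);
        i /= base;
    }
    return result;
}

int main() {
    // halton(i, 2) paired with halton(i, 3) fills the unit square evenly.
    for (unsigned i = 1; i <= 8; ++i)
        std::printf("(%.4f, %.4f)\n", halton(i, 2), halton(i, 3));
}
```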
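
On the CPU side of point 2c: since every pixel is an independent estimate, one pragma over the image loop is all the parallelism takes. A minimal sketch, with trace_pixel a hypothetical stand-in for the real radiance estimator (build with -fopenmp or your compiler's equivalent switch):

```cpp
#include <vector>

struct Color { float r = 0, g = 0, b = 0; };

// Hypothetical stand-in for the real per-pixel estimator: a real renderer
// would shoot spp sample rays through pixel (x, y) and average the results.
Color trace_pixel(int x, int y, int spp) {
    (void)x; (void)y; (void)spp;
    return {};
}

void render(std::vector<Color>& framebuffer, int width, int height, int spp) {
    // Pixels are independent Monte Carlo estimates, so there is no shared
    // state to race on: each iteration writes only its own framebuffer slot.
    // schedule(dynamic) balances rows of very different cost.
    #pragma omp parallel for schedule(dynamic)
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            framebuffer[y * width + x] = trace_pixel(x, y, spp);
}
```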
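
And on demoting the recursion to a loop: once the BVH is flattened into an array whose nodes store child indices (written at build time, as described above), traversal needs nothing more than a small explicit stack. A minimal sketch under a hypothetical node layout, with the actual ray/box and ray/primitive tests stubbed out:

```cpp
#include <vector>

struct Ray  { /* origin + direction, omitted in this sketch */ };
struct Aabb { /* min/max corners, omitted in this sketch */ };

// Hypothetical stand-ins for the real intersection routines.
bool hit_aabb(const Aabb&, const Ray&)       { return true; }
bool hit_primitive(int /*prim*/, const Ray&) { return false; }

// Flattened BVH node: children are indices into one array, so traversal
// needs no pointers and no recursion. This layout is written once, while
// the BVH is built.
struct BvhNode {
    Aabb bounds;
    int  left       = -1;  // index of left child; -1 marks a leaf
    int  right      = -1;  // index of right child
    int  first_prim = 0;   // leaf only: first primitive in the leaf
    int  prim_count = 0;   // leaf only: number of primitives
};

// Iterative traversal: an explicit index stack replaces the call stack.
bool intersect(const std::vector<BvhNode>& nodes, const Ray& ray) {
    int stack[64];                 // 64 levels is ample for a balanced BVH
    int top = 0;
    stack[top++] = 0;              // start from the root node
    bool hit = false;
    while (top > 0) {
        const BvhNode& node = nodes[stack[--top]];
        if (!hit_aabb(node.bounds, ray))
            continue;              // prune this subtree
        if (node.left < 0) {       // leaf: test its primitives
            for (int i = 0; i < node.prim_count; ++i)
                hit |= hit_primitive(node.first_prim + i, ray);
        } else {                   // inner node: push both children
            stack[top++] = node.left;
            stack[top++] = node.right;
        }
    }
    return hit;
}
```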

Browsing the web recently, I also noticed that many people and companies lean toward rendering on the CPU; see http://stackoverflow.com/questions/38029698/why-do-we-use-cpus-for-ray-tracing-instead-of-gpus

One of the answers there is well worth reading:

I'm one of the rendering software architects at a large VFX and animated feature studio with a proprietary renderer (not Pixar, though I was once the rendering software architect there as well, long, long ago).

Almost all high-quality rendering for film (at all the big studios, with all the major renderers) is CPU only. There are a bunch of reasons why this is the case. In no particular order, some of the really compelling ones to give you the flavor of the issues:

  • GPUs only go fast when everything is in memory. The biggest GPU cards have, what, 12GB or so, and it has to hold everything. Well, we routinely render scenes with 30GB of geometry and that reference 1TB or more of texture. Can't load that into GPU memory, it's literally two orders of magnitude too big. So GPUs are simply unable to deal with our biggest (or even average) scenes. (With CPU renderers, we can page stuff from disk whenever we need. GPUs aren't good at that.)

  • Don't believe the hype, ray tracing with GPUs is not an obvious win over CPU. GPUs are great at highly coherent work (doing the same things to lots of data at once). Ray tracing is very incoherent (each ray can go a different direction, intersect different objects, shade different materials, access different textures), and so this access pattern degrades GPU performance very severely. It's only very recently that GPU ray tracing could match the best CPU-based ray tracing code, and even though it has surpassed it, it's not by much, not enough to throw out all the old code and start fresh with buggy fragile code for GPUs. And the biggest, most expensive scenes are the ones where GPUs are only marginally faster. Being lots faster on the easy scenes is not really important to us.

  • If you have 50 or 100 man years of production-hardened code in your CPU-based renderer, you just don't throw it out and start over in order to get a 2x speedup. Software engineering effort, stability, and so on, is more important and a bigger cost factor.

  • Similarly, if your studio has an investment in a data center holding 20,000 CPU cores, all in the smallest, most power and heat-efficient form factor you can, that's also a sunk cost investment you don't just throw away. Replacing them with new machines containing top of the line GPUs vastly increases the cost of your render farm, and they are bigger and produce more heat, so it literally might not fit in your building.

  • Amdahl's Law: The actual "rendering" per se is only one stage in generating the scenes, and GPUs don't help with it. Let's say that it takes 1 hour to fully generate and export the scene to the renderer, and 9 hours to "render", and out of that 9 hours, an hour is reading texture, volumes, and other data from disk. So out of the total 10 hours of how the user experiences rendering (push button until final image is ready), 8 hours is potentially sped up with GPUs. So, even if GPU was 10x as fast as CPU for that part, you go from 10 hours to 1+1+0.8 = nearly 3 hours. So 10x GPU speedup only translates to 3x actual gain. If GPU was 1,000,000x faster than CPU for ray tracing, you still have 1+1+tiny, which is only a 5x speedup.

But what's different about games? Why are GPUs good for games but not film?

First of all, when you make a game, remember that it's got to render in real time -- that means your most important constraint is the 60Hz (or whatever) frame rate, and you sacrifice quality or features where necessary to achieve that. In contrast, with film, the unbreakable constraint is making the director and VFX supervisor happy with the quality and look he or she wants, and how long it takes you to get that is (to a degree) secondary.

Also, with a game, you render frame after frame after frame, live in front of every user. But with film, you effectively are rendering ONCE, and what's delivered to theaters is a movie file -- so moviegoers will never know or care if it took you 10 hours per frame, but they will notice if it doesn't look good. So again, there is less of a penalty placed on those renders taking a long time, as long as they look fabulous.

With a game, you don't really know what frames you are going to render, since the player may wander all around the world, view from just about anywhere. You can't and shouldn't try to make it all perfect, you just want it to be good enough all the time. But for a film, the shots are all hand-crafted! A tremendous amount of human time goes into composing, animating, lighting, and compositing every shot, and then you only need to render it once. Think about the economics -- once 10 days of calendar (and salary) has gone into lighting and compositing the shot just right, the advantage of rendering it in an hour (or even a minute) versus overnight, is pretty small, and not worth any sacrifice of quality or achievable complexity of the image.
