对pathtracing的一些个人理解

本人水平有限，若有错误也请指正~

上面说到pathtracing（pt）的一些优点和缺点，优点即其实现很简单，这就是大概为什么当今市面上流行的很多渲染器如今都相继采用pathtracing算法为核心进行实现，但是pathtracing的最大缺点就是收敛速度很慢，其原因就在于全局光照的那个积分式要求在半球面上连续采样，这就需要每次发射n条采样光线，n的数量直接决定了最终图像的质量（对于pt来讲它的图像质量取决于噪点数量的多少，噪点多的话有点像用老式胶片拍出来的照片那样），一般n越大图像过渡越光滑。前人学者们也早已发现了pt的这个收敛慢的缺点，于是有了多种方法来降噪，具体来讲这些降噪手段有以下几种：

1）改进pt算法。如后续比较经典的bidirectional pathtracing（bdpt），metropolis pathtracing

2）保持pt算法不变，但对其内部的采样机制进行改进。比如：

　　a. 根据BRDF函数进行重要性采样；

　　b. 采样时所用到的均匀分布，一般都是用c语言自带的rand函数或者c++ random库里面的uniform distribution来完成。但是这种采样方式对于pt算法本身来讲，并不利于最终图像的收敛，

　　　　现在的很多渲染器内部都是利用一种叫“低差异序列（Low Discrepancy Sequence）”的采样方式（详见https://zhuanlan.zhihu.com/p/20197323?columnSlug=graphics）。

　　c. pt算法的一大优势就是屏幕每个像素的采样都是独立的，每次迭代也是互相独立的，这就可以利用并行算法来对算法进行加速。CPU端有OpenMP，而且在诸多流行编译器内都直接内置集成了，

　　　　GPU端，N卡有CUDA，A卡可以利用OpenCL，而且渲染速度与GPU数量成线性增长关系，这就有点像暴力解算的意思，GPU端实现的最大区别就是不能递归（虽然N卡早已支持递归，但是

　　　　在计算规模很大的情况下每个线程的栈就很浅，再加上本身pt算法的采样光线有可能采样到很深的深度，所以并不是很敢用），递归主要发生在光线与场景求交的过程中，由于对场景采用的

　　　　是基于层级的划分（Bounding Volume Hierarchy）本身就有递归的意思。不能递归这点只要自己构造合理数据结构，把递归所用的那些信息尽量在构造BVH的时候自然写入结构内部即可，

　　　　可以比较容易的将递归降级为循环。

　　　　（ps：也难怪现在很多步进式渲染器都采用pathtracing...实现简单，开发简单，而且要提速的话直接借助并行算法，只不过多配几个GPU就行了，这就节省了一堆人力，成本就低...）

而最近浏览网页也发现了，很多人和企业也倾向于利用CPU渲染，具体请参见http://stackoverflow.com/questions/38029698/why-do-we-use-cpus-for-ray-tracing-instead-of-gpus

里面有个人的回答也是不错：

I'm one of the rendering software architects at a large VFX and animated feature studio with a proprietary renderer (not Pixar, though I was once the rendering software architect there as well, long, long ago).

Almost all high-quality rendering for film (at all the big studios, with all the major renderers) is CPU only. There are a bunch of reasons why this is the case. In no particular order, some of the really compelling ones to give you the flavor of the issues:

GPUs only go fast when everything is in memory. The biggest GPU cards have, what, 12GB or so, and it has to hold everything. Well, we routinely render scenes with 30GB of geometry and that reference 1TB or more of texture. Can't load that into GPU memory, it's literally two orders of magnitude too big. So GPUs are simply unable to deal with our biggest (or even average) scenes. (With CPU renderers, we can page stuff from disk whenever we need. GPUs aren't good at that.)
Don't believe the hype, ray tracing with GPUs is not an obvious win over CPU. GPUs are great at highly coherent work (doing the same things to lots of data at once). Ray tracing is very incoherent (each ray can go a different direction, intersect different objects, shade different materials, access different textures), and so this access pattern degrades GPU performance very severely. It's only very recently that GPU ray tracing could match the best CPU-based ray tracing code, and even though it has surpassed it, it's not by much, not enough to throw out all the old code and start fresh with buggy fragile code for GPUs. And the biggest, most expensive scenes are the ones where GPUs are only marginally faster. Being lots faster on the easy scenes is not really important to us.
If you have 50 or 100 man years of production-hardened code in your CPU-based renderer, you just don't throw it out and start over in order to get a 2x speedup. Software engineering effort, stability, and so on, is more important and a bigger cost factor.
Similarly, if your studio has an investment in a data center holding 20,000 CPU cores, all in the smallest, most power and heat-efficient form factor you can, that's also a sunk cost investment you don't just throw away. Replacing them with new machines containing top of the line GPUs vastly increases the cost of your render farm, and they are bigger and produce more heat, so it literally might not fit in your building.
Amdahl's Law: The actual "rendering" per se is only one stage in generating the scenes, and GPUs don't help with it. Let's say that it takes 1 hour to fully generate and export the scene to the renderer, and 9 hours to "render", and out of that 9 hours, an hour is reading texture, volumes, and other data from disk. So out of the total 10 hours of how the user experiences rendering (push button until final image is ready), 8 hours is potentially sped up with GPUs. So, even if GPU was 10x as fast as CPU for that part, you go from 10 hours to 1+1+0.8 = nearly 3 hours. So 10x GPU speedup only translates to 3x actual gain. If GPU was 1,000,000x faster than CPU for ray tracing, you still have 1+1+tiny, which is only a 5x speedup.

But what's different about games? Why are GPUs good for games but not film?

First of all, when you make a game, remember that it's got to render in real time -- that means your most important constraint is the 60Hz (or whatever) frame rate, and you sacrifice quality or features where necessary to achieve that. In contrast, with film, the unbreakable constraint is making the director and VFX supervisor happy with the quality and look he or she wants, and how long it takes you to get that is (to a degree) secondary.

Also, with a game, you render frame after frame after frame, live in front of every user. But with film, you effectively are rendering ONCE, and what's delivered to theaters is a movie file -- so moviegoers will never know or care if it took you 10 hours per frame, but they will notice if it doesn't look good. So again, there is less of a penalty placed on those renders taking a long time, as long as they look fabulous.

With a game, you don't really know what frames you are going to render, since the player may wander all around the world, view from just about anywhere. You can't and shouldn't try to make it all perfect, you just want it to be good enough all the time. But for a film, the shots are all hand-crafted! A tremendous amount of human time goes into composing, animating, lighting, and compositing every shot, and then you only need to render it once. Think about the economics -- once 10 days of calendar (and salary) has gone into lighting and compositing the shot just right, the advantage of rendering it in an hour (or even a minute) versus overnight, is pretty small, and not worth any sacrifice of quality or achievable complexity of the image.

对pathtracing的一些个人理解的更多相关文章

理解CSS视觉格式化
前面的话 CSS视觉格式化这个词可能比较陌生,但说起盒模型可能就恍然大悟了.实际上,盒模型只是CSS视觉格式化的一部分.视觉格式化分为块级和行内两种处理方式.理解视觉格式化,可以确定得到的效果是应 ...
彻底理解AC多模式匹配算法
(本文尤其适合遍览网上的讲解而仍百思不得姐的同学) 一.原理 AC自动机首先将模式组记录为Trie字典树的形式,以节点表示不同状态,边上标以字母表中的字符,表示状态的转移.根节点状态记为0状态,表示起 ...
理解加密算法（三）——创建CA机构，签发证书并开始TLS通信
接理解加密算法(一)--加密算法分类.理解加密算法(二)--TLS/SSL 1 不安全的TCP通信普通的TCP通信数据是明文传输的,所以存在数据泄露和被篡改的风险,我们可以写一段测试代码试验一下. ...
node.js学习（三）简单的node程序&&模块简单使用&&commonJS规范&&深入理解模块原理
一.一个简单的node程序 1.新建一个txt文件 2.修改后缀修改之后会弹出这个,点击"是" 3.运行test.js 源文件使用node.js运行之后的. 如果该路径下没有该 ...
如何一步一步用DDD设计一个电商网站（一）—— 先理解核心概念
一.前言 DDD(领域驱动设计)的一些介绍网上资料很多,这里就不继续描述了.自己使用领域驱动设计摸滚打爬也有2年多的时间,出于对知识的总结和分享,也是对自我理解的一个公开检验,介于博客园这个平 ...
学习AOP之透过Spring的Ioc理解Advisor
花了几天时间来学习Spring,突然明白一个问题,就是看书不能让人理解Spring,一方面要结合使用场景,另一方面要阅读源代码,这种方式理解起来事半功倍.那看书有什么用呢?主要还是扩展视野,毕竟书是别 ...
ThreadLocal简单理解
在java开源项目的代码中看到一个类里ThreadLocal的属性: private static ThreadLocal<Boolean> clientMode = new Thread ...
JS核心系列：理解 new 的运行机制
和其他高级语言一样 javascript 中也有 new 运算符,我们知道 new 运算符是用来实例化一个类,从而在内存中分配一个实例对象. 但在 javascript 中,万物皆对象,为什么还要通过 ...
深入理解JS 执行细节
javascript从定义到执行,JS引擎在实现层做了很多初始化工作,因此在学习JS引擎工作机制之前,我们需要引入几个相关的概念:执行环境栈.全局对象.执行环境.变量对象.活动对象.作用域和作用域链等 ...

随机推荐

网络语音视频技术浅议（附多个demo源码下载）
我们在开发实践中常常会涉及到网络语音视频技术.诸如即时通讯.视频会议.远程医疗.远程教育.网络监控等等,这些网络多媒体应用系统都离不开网络语音视频技术.本人才疏学浅,对于网络语音视频技术也仅仅是略知皮 ...
【模板】二分图最大权完美匹配KM算法
hdu2255模板题 KM是什么意思,详见百度百科. 总之知道它可以求二分图最大权完美匹配就可以了,时间复杂度为O(n^3). 给张图. 二分图有了边权,求最大匹配下的最大权值. 所以该怎么做呢?对啊 ...
Keepalived高可用集群实践
(1)实践的硬件环境准备准备4台物理服务器或者4台VM虚拟机,其中两台用来做Keepalived服务器,两台做web测试站点 HOSTNAME I P 解释 lb01 10.0.0.7 K ...
List<T>对元素的查找。
要在List<T>中查找特定的元素,可以使用Contains() .IndexOf().LastIndexOf()和BinarySearch()方法.除了 LastIndexOf()是从最 ...
HTML、CSS、JS 样式
把一个数组(一维或二维等)的内容转化为对应的字符串.相当于把print_r($array)显示出来的内容赋值给一个变量.$data= array('hello',',','world','!'); $ ...
CSAcademy Beta Round #5 Force Graph
题目链接:https://csacademy.com/contest/arhiva/#task/force_graph/ 大意是有若干个节点,每个节点对应一个二维坐标,节点之间相互有斥力存在.同时有些 ...
【算法】RMQ LCA 讲课杂记
4月4日,应学弟要求去了次学校给小同学们讲了一堂课,其实讲的挺内容挺杂的,但是目的是引出LCA算法. 现在整理一下当天讲课的主要内容: 开始并没有直接引出LCA问题,而是讲了RMQ(Range Min ...
关于用jQuery的animate方法实现的动画在IE中失效的原因以及解决方法
这几天在学jQuery,本身还只是一个新手,写了一个简单的动画--圆形头像的缩放.本身是用Firefox进行调试的,一切进行的很顺利,缩放可以按照预期执行,结果拿到IE上去之后,发现缩放动画失效了.后 ...
Android 新一代多渠道打包神器
欢迎大家关注腾讯云技术社区-博客园官方主页,我们将持续在博客园为大家推荐技术精品文章哦~ 作者:李涛 ApkChannelPackage是一种快速多渠道打包工具,同时支持基于V1签名和V2签名进行多渠 ...
HTTP长连接、短连接使用及测试
概念 HTTP短连接(非持久连接)是指,客户端和服务端进行一次HTTP请求/响应之后,就关闭连接.所以,下一次的HTTP请求/响应操作就需要重新建立连接. HTTP长连接(持久连接)是指,客户端和服务 ...

对pathtracing的一些个人理解

对pathtracing的一些个人理解的更多相关文章

随机推荐

热门专题