对pathtracing的一些个人理解
本人水平有限,若有错误也请指正~
上面说到pathtracing(pt)的一些优点和缺点,优点即其实现很简单,这就是大概为什么当今市面上流行的很多渲染器如今都相继采用pathtracing算法为核心进行实现,但是pathtracing的最大缺点就是收敛速度很慢,其原因就在于全局光照的那个积分式要求在半球面上连续采样,这就需要每次发射n条采样光线,n的数量直接决定了最终图像的质量(对于pt来讲它的图像质量取决于噪点数量的多少,噪点多的话有点像用老式胶片拍出来的照片那样),一般n越大图像过渡越光滑。前人学者们也早已发现了pt的这个收敛慢的缺点,于是有了多种方法来降噪,具体来讲这些降噪手段有以下几种:
1)改进pt算法。如后续比较经典的bidirectional pathtracing(bdpt),metropolis pathtracing
2)保持pt算法不变,但对其内部的采样机制进行改进。比如:
a. 根据BRDF函数进行重要性采样;
b. 采样时所用到的均匀分布,一般都是用c语言自带的rand函数或者c++ random库里面的uniform distribution来完成。但是这种采样方式对于pt算法本身来讲,并不利于最终图像的收敛,
现在的很多渲染器内部都是利用一种叫“低差异序列(Low Discrepancy Sequence)”的采样方式(详见https://zhuanlan.zhihu.com/p/20197323?columnSlug=graphics)。
c. pt算法的一大优势就是屏幕每个像素的采样都是独立的,每次迭代也是互相独立的,这就可以利用并行算法来对算法进行加速。CPU端有OpenMP,而且在诸多流行编译器内都直接内置集成了,
GPU端,N卡有CUDA,A卡可以利用OpenCL,而且渲染速度与GPU数量成线性增长关系,这就有点像暴力解算的意思,GPU端实现的最大区别就是不能递归(虽然N卡早已支持递归,但是
在计算规模很大的情况下每个线程的栈就很浅,再加上本身pt算法的采样光线有可能采样到很深的深度,所以并不是很敢用),递归主要发生在光线与场景求交的过程中,由于对场景采用的
是基于层级的划分(Bounding Volume Hierarchy)本身就有递归的意思。不能递归这点只要自己构造合理数据结构,把递归所用的那些信息尽量在构造BVH的时候自然写入结构内部即可,
可以比较容易的将递归降级为循环。
(ps:也难怪现在很多步进式渲染器都采用pathtracing...实现简单,开发简单,而且要提速的话直接借助并行算法,只不过多配几个GPU就行了,这就节省了一堆人力,成本就低...)
而最近浏览网页也发现了,很多人和企业也倾向于利用CPU渲染,具体请参见http://stackoverflow.com/questions/38029698/why-do-we-use-cpus-for-ray-tracing-instead-of-gpus
里面有个人的回答也是不错:
I'm one of the rendering software architects at a large VFX and animated feature studio with a proprietary renderer (not Pixar, though I was once the rendering software architect there as well, long, long ago).
Almost all high-quality rendering for film (at all the big studios, with all the major renderers) is CPU only. There are a bunch of reasons why this is the case. In no particular order, some of the really compelling ones to give you the flavor of the issues:
GPUs only go fast when everything is in memory. The biggest GPU cards have, what, 12GB or so, and it has to hold everything. Well, we routinely render scenes with 30GB of geometry and that reference 1TB or more of texture. Can't load that into GPU memory, it's literally two orders of magnitude too big. So GPUs are simply unable to deal with our biggest (or even average) scenes. (With CPU renderers, we can page stuff from disk whenever we need. GPUs aren't good at that.)
Don't believe the hype, ray tracing with GPUs is not an obvious win over CPU. GPUs are great at highly coherent work (doing the same things to lots of data at once). Ray tracing is very incoherent (each ray can go a different direction, intersect different objects, shade different materials, access different textures), and so this access pattern degrades GPU performance very severely. It's only very recently that GPU ray tracing could match the best CPU-based ray tracing code, and even though it has surpassed it, it's not by much, not enough to throw out all the old code and start fresh with buggy fragile code for GPUs. And the biggest, most expensive scenes are the ones where GPUs are only marginally faster. Being lots faster on the easy scenes is not really important to us.
If you have 50 or 100 man years of production-hardened code in your CPU-based renderer, you just don't throw it out and start over in order to get a 2x speedup. Software engineering effort, stability, and so on, is more important and a bigger cost factor.
Similarly, if your studio has an investment in a data center holding 20,000 CPU cores, all in the smallest, most power and heat-efficient form factor you can, that's also a sunk cost investment you don't just throw away. Replacing them with new machines containing top of the line GPUs vastly increases the cost of your render farm, and they are bigger and produce more heat, so it literally might not fit in your building.
Amdahl's Law: The actual "rendering" per se is only one stage in generating the scenes, and GPUs don't help with it. Let's say that it takes 1 hour to fully generate and export the scene to the renderer, and 9 hours to "render", and out of that 9 hours, an hour is reading texture, volumes, and other data from disk. So out of the total 10 hours of how the user experiences rendering (push button until final image is ready), 8 hours is potentially sped up with GPUs. So, even if GPU was 10x as fast as CPU for that part, you go from 10 hours to 1+1+0.8 = nearly 3 hours. So 10x GPU speedup only translates to 3x actual gain. If GPU was 1,000,000x faster than CPU for ray tracing, you still have 1+1+tiny, which is only a 5x speedup.
But what's different about games? Why are GPUs good for games but not film?
First of all, when you make a game, remember that it's got to render in real time -- that means your most important constraint is the 60Hz (or whatever) frame rate, and you sacrifice quality or features where necessary to achieve that. In contrast, with film, the unbreakable constraint is making the director and VFX supervisor happy with the quality and look he or she wants, and how long it takes you to get that is (to a degree) secondary.
Also, with a game, you render frame after frame after frame, live in front of every user. But with film, you effectively are rendering ONCE, and what's delivered to theaters is a movie file -- so moviegoers will never know or care if it took you 10 hours per frame, but they will notice if it doesn't look good. So again, there is less of a penalty placed on those renders taking a long time, as long as they look fabulous.
With a game, you don't really know what frames you are going to render, since the player may wander all around the world, view from just about anywhere. You can't and shouldn't try to make it all perfect, you just want it to be good enough all the time. But for a film, the shots are all hand-crafted! A tremendous amount of human time goes into composing, animating, lighting, and compositing every shot, and then you only need to render it once. Think about the economics -- once 10 days of calendar (and salary) has gone into lighting and compositing the shot just right, the advantage of rendering it in an hour (or even a minute) versus overnight, is pretty small, and not worth any sacrifice of quality or achievable complexity of the image.
对pathtracing的一些个人理解的更多相关文章
- 理解CSS视觉格式化
前面的话 CSS视觉格式化这个词可能比较陌生,但说起盒模型可能就恍然大悟了.实际上,盒模型只是CSS视觉格式化的一部分.视觉格式化分为块级和行内两种处理方式.理解视觉格式化,可以确定得到的效果是应 ...
- 彻底理解AC多模式匹配算法
(本文尤其适合遍览网上的讲解而仍百思不得姐的同学) 一.原理 AC自动机首先将模式组记录为Trie字典树的形式,以节点表示不同状态,边上标以字母表中的字符,表示状态的转移.根节点状态记为0状态,表示起 ...
- 理解加密算法(三)——创建CA机构,签发证书并开始TLS通信
接理解加密算法(一)--加密算法分类.理解加密算法(二)--TLS/SSL 1 不安全的TCP通信 普通的TCP通信数据是明文传输的,所以存在数据泄露和被篡改的风险,我们可以写一段测试代码试验一下. ...
- node.js学习(三)简单的node程序&&模块简单使用&&commonJS规范&&深入理解模块原理
一.一个简单的node程序 1.新建一个txt文件 2.修改后缀 修改之后会弹出这个,点击"是" 3.运行test.js 源文件 使用node.js运行之后的. 如果该路径下没有该 ...
- 如何一步一步用DDD设计一个电商网站(一)—— 先理解核心概念
一.前言 DDD(领域驱动设计)的一些介绍网上资料很多,这里就不继续描述了.自己使用领域驱动设计摸滚打爬也有2年多的时间,出于对知识的总结和分享,也是对自我理解的一个公开检验,介于博客园这个平 ...
- 学习AOP之透过Spring的Ioc理解Advisor
花了几天时间来学习Spring,突然明白一个问题,就是看书不能让人理解Spring,一方面要结合使用场景,另一方面要阅读源代码,这种方式理解起来事半功倍.那看书有什么用呢?主要还是扩展视野,毕竟书是别 ...
- ThreadLocal简单理解
在java开源项目的代码中看到一个类里ThreadLocal的属性: private static ThreadLocal<Boolean> clientMode = new Thread ...
- JS核心系列:理解 new 的运行机制
和其他高级语言一样 javascript 中也有 new 运算符,我们知道 new 运算符是用来实例化一个类,从而在内存中分配一个实例对象. 但在 javascript 中,万物皆对象,为什么还要通过 ...
- 深入理解JS 执行细节
javascript从定义到执行,JS引擎在实现层做了很多初始化工作,因此在学习JS引擎工作机制之前,我们需要引入几个相关的概念:执行环境栈.全局对象.执行环境.变量对象.活动对象.作用域和作用域链等 ...
随机推荐
- 关于Git增、删、改源地址问题
在上篇博客中我们了解了Git的基本使用,如果你已经建立了一个远程代码库,并且遇到了远程代码库源地址修改的问题,那么这篇博客可能会帮到你. 1.如何查看当前远程Git库源地址呢? $git remote ...
- Android 代码库(自定义一套 Dialog通用提示框 )
做Android开发五年了,期间做做停停(去做后台开发,服务器管理),当回来做Android的时候,发现很生疏,好些控件以前写得很顺手,现在好像忘记些什么了,总要打开这个项目,打开那个项目 ...
- Java--定时器问题
定时器问题 定时器属于基本的基础组件,不管是用户空间的程序开发,还是内核空间的程序开发,很多时候都需要有定时器作为基础组件的支持.一个定时器的实现需要具备以下四种基本行为:添加定时器.取消定时器.定时 ...
- JS的块级作用域
今天带来的是 "对<你不知道的js>中块级作用域的总结" 分享: 1)用with从对象中创建出来的作用域只在with声明中而非外部作用域有效,同时可以访问已有对象的属性 ...
- java面试题—精选30道Java笔试题解答(二)
摘要: java面试题-精选30道Java笔试题解答(二) 19. 下面程序能正常运行吗() public class NULL { public static void haha(){ System ...
- .NET遇上Docker - Docker集成Cron定时运行.NETCore(ConsoleApp)程序.md
配置项目的Docker支持 对于VS中Docker的配置,依旧重复一些废话. 给项目添加Docker支持,VS2015可以直接使用Docker for VS插件,VS2017在安装时选择容器支持.VS ...
- codeforces 803C Maximal GCD(GCD数学)
Maximal GCD 题目链接:http://codeforces.com/contest/803/problem/C 题目大意: 给你n,k(1<=n,k<=1e10). 要你输出k个 ...
- 利刃 MVVMLight 8:DispatchHelper在多线程和调度中的使用
在应用程序中,线程可以被看做是应用程序的一个较小的执行单位.每个应用程序都至少拥有一个线程,我们称为主线程,这是在启动时调用应用程序的主方法时由操作系统分配启动的线程. 当调用和操作主线程的时候,该操 ...
- OAuth 2.0: Bearer Token Usage
Bearer Token (RFC 6750) 用于HTTP请求授权访问OAuth 2.0资源,任何Bearer持有者都可以无差别地用它来访问相关的资源,而无需证明持有加密key.一个Bearer代表 ...
- python基础——正则表达式
正则表达式 正则表达式为高级的文本模式匹配.抽取.与/或文本形式的搜索和替换功能提供了基础.简单的说,正则表达式是一些由字符和特殊符号组成的字符串,他们描述了模式的重复或者表述多个字符,于是正则表达式 ...