Nvidia's Pascal to use stacked memory, proprietary NVLink interconnect

by Scott Wasson — 6:50 PM on March 25, 2014

GTC — Today during his opening keynote at the Nvidia GPU Technology Conference, CEO Jen-Hsun Huang offered an update to Nvidia's GPU roadmap. The big reveal was about a GPU code-named Pascal, which will be a generation beyond the still-being-introduced Maxwell architecture in the firm's plans.

Pascal's primary innovation will be the integration of stacked "3D" memory situated on the same substrate with the GPU, providing substantially higher bandwidth than traditional DRAMs mounted on the same circuit board.

If all of this info sounds more than a little familiar, perhaps you'll recall that Nvidia also announced a future, post-Maxwell GPU at GTC 2013. It was code-named Volta and was also slated to feature stacked memory on package. So what happened?

Turns out Volta remains on the roadmap, but it comes after Pascal and will evidently include more extensive changes to Nvidia's core GPU architecture.

Nvidia has inserted Pascal into its plans in order to take advantage of stacked memory and other innovations sooner. (I'm not sure we can say that Volta has been delayed, since the firm never pinned down that GPU's projected release date.) That makes Pascal intriguing even though its SM will be based on a modified version of the one from Maxwell. Memory bandwidth has long been one of the primary constraints for GPU performance, and bringing DRAM onto the same substrate opens up the possibility of substantial performance gains.

The picture above includes a single benchmark result, as projected for Pascal, in the bandwidth-intensive SGEMM matrix multiplication test. As you can see, Pascal nearly triples the performance of today's Kepler GPUs and nearly doubles the throughput of the upcoming Maxwell chips. This comparison is made at the same power level for each GPU, so Pascal should also represent a nice increase in energy efficiency.

Compared to today's GPU memory subsystems, Huang claimed Pascal's 3D memory will offer "many times" the bandwidth, two and a half times the capacity, and four times the energy efficiency. The Pascal chip itself will not participate in the 3D stacking, but it will have DRAM stacks situated around it on the same package. Those DRAM stacks will be of the HBM type being developed at Hynix. You can see the DRAM stacks cuddled up next to the GPU in the picture of the Pascal test module below.

The other item of note in Pascal's feature set is a new, proprietary chip-to-chip interconnect known as NVLink. This interconnect is a higher-bandwidth alternative to PCI Express 3.0 that Nvidia claims will be substantially more power-efficient. In many ways, NVLink looks very similar to PCI Express. It uses differential signaling with an embedded clock, and it will support the PCI Express programming model, including "DMA+", so driver support should be straightforward. Nvidia expects NVLink to act as a GPU-to-GPU connection and, in some cases, as a GPU-to-CPU link. To that end, the second generation of NVLink will be capable of maintaining cache coherency between multiple chips.

NVLink was created chiefly for use in supercomputing clusters and other enterprise-class deployments where many GPUs may be installed into a single server. Interestingly, as part of today's announcements, IBM revealed that it will incorporate NVLink into future CPUs. We don't have any details yet about which CPUs or what proportion of the Power CPU lineup will use NVLink, though.

Huang claimed NVLink will offer five to 12 times the bandwidth of PCIe. That may be a bit of CEO math. The first generation of NVLink will feature eight lanes per block or "brick" of connectivity. Each of those lanes will be capable of transporting 20Gbps of data, so the aggregate bandwidth of a brick should be 20GB/s (eight lanes at 20Gbps is 160Gbps, or 20GB/s). By contrast, PCIe 3.0 transfers 8Gbps per lane, or 8GB/s across eight lanes, and the still-in-the-works PCIe 4.0 standard is targeting double that rate.
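For readers who want to check the arithmetic, here is a minimal sketch in Python using only the per-lane figures quoted above. The function name and variables are illustrative, not anything from Nvidia, and the numbers are raw signaling rates that ignore encoding and protocol overhead.

```python
# Raw link arithmetic from the figures quoted in the article.
# Assumption: encoding and protocol overhead are ignored.

def aggregate_gbytes_per_s(lanes: int, gbits_per_lane: float) -> float:
    """Return the raw aggregate bandwidth in GB/s for a link with `lanes` lanes."""
    return lanes * gbits_per_lane / 8  # 8 bits per byte

nvlink_brick = aggregate_gbytes_per_s(lanes=8, gbits_per_lane=20)  # one NVLink "brick"
pcie3_x8 = aggregate_gbytes_per_s(lanes=8, gbits_per_lane=8)       # PCIe 3.0, eight lanes
pcie4_x8 = 2 * pcie3_x8  # PCIe 4.0 is targeting double the PCIe 3.0 rate

print(f"NVLink brick: {nvlink_brick} GB/s")
print(f"PCIe 3.0 x8:  {pcie3_x8} GB/s")
print(f"PCIe 4.0 x8:  {pcie4_x8} GB/s")
print(f"Brick vs. PCIe 3.0 x8: {nvlink_brick / pcie3_x8}x")
```

On these raw numbers, a single brick works out to 2.5 times an eight-lane PCIe 3.0 link, so any larger multiple would have to come from using more than one brick or from comparing against a narrower PCIe link.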

NVLink apparently gets some of its added bandwidth by imposing stricter limits on trace lengths across the motherboard, and the company says it has made a "fundamental breakthrough" in energy efficiency, resulting from Nvidia's own research, that differentiates NVLink from PCIe. NVLink will not be an open standard, though, so we may not be seeing a public airing of the entire spec.

The module pictured above will be the basic building block of many solutions based on the Pascal GPU. Each module has two "bricks" of NVLink connectivity onboard, and the board will connect to the host system via a mezzanine-style NVLink connector. The combination of connector and NVLink protocol should allow for some nice, dense, and high-integrity server systems built around Nvidia GPUs—and it will also ensure that those systems can only play host to Nvidia silicon. This proprietary hook is surely another motivation for the creation of NVLink, at the end of the day.

Huang said he wants the Pascal module to be the future of not just supercomputers but all sorts of visual computing systems, including gaming PCs. Mezzanine-style modules do have size and signal integrity advantages over traditional expansion cards with edge-based connectors. Another benefit of the module is that it can deliver more power without auxiliary power cables. Nvidia's current Tesla GPUs draw between 225 and 300W, and the firm apparently expects to supply that much power to the module solely via the mezzanine connection. We'll have to work to tease out exactly what Huang's statement means for future consumer PCs, but Nvidia admits it doesn't expect PCIe cards to be going away any time soon.
