CUDA C Programming Guide 在线教程学习笔记 Part 13

▶ 纹理内存访问补充（见纹理内存博客 http://www.cnblogs.com/cuancuancuanhao/p/7809713.html）

▶ 计算能力

● 不同计算能力的硬件对计算特性的支持。

● 不同计算能力的硬件技术特性（重要）。

● 浮点运算技术标准描述（原文）

■ All compute devices follow the IEEE 754-2008 standard for binary floating-point arithmetic with the following deviations:

　　There is no dynamically configurable rounding mode; however, most of the operations support multiple IEEE rounding modes, exposed via device intrinsics;

　　There is no mechanism for detecting that a floating-point exception has occurred and all operations behave as if the IEEE-754 exceptions are always masked, and deliver the masked response as defined by IEEE-754 if there is an exceptional event; for the same reason, while SNaN encodings are supported, they are not signaling and are handled as quiet;

　　The result of a single-precision floating-point operation involving one or more input NaNs is the quiet NaN of bit pattern 0x7fffffff;

　　Double-precision floating-point absolute value and negation are not compliant with IEEE-754 with respect to NaNs; these are passed through unchanged;

Code must be compiled with -ftz=false, -prec-div=true, and -prec-sqrt=true to ensure IEEE compliance (this is the default setting; see the nvcc user manual for description of these compilation flags).

Regardless of the setting of the compiler flag -ftz,

Atomic single-precision floating-point adds on global memory always operate in flush-to-zero mode, i.e., behave equivalent to FADD.F32.FTZ.RN,
Atomic single-precision floating-point adds on shared memory always operate with denormal support, i.e., behave equivalent to FADD.F32.RN.

■ IEEE-754R 标准，CUDA 设备上函数 fminf()，fmin()，fmaxf()，fmax() 在有且仅有一个参数为NaN时，返回值将是另一个参数。

■ 浮点数转化为整数，且超出整形格式边界时，IEEE-754 称为未定义，而 CUDA 设备将该数映射到整形格式边界（改格式能表示的最大或最小的数）。

■ 整数除以零或整数溢出时，IEEE-754 称为未定义，而 CUDA 设备将抛出一个非特定的值。

● 各代 CC 设备的新特性介绍

■ 调整共享内存和 L1 缓存的平衡

 // driver_types.h

 enum __device_builtin__ cudaFuncCache

 {

     cudaFuncCachePreferNone = ,    // 默认缓存模式

     cudaFuncCachePreferShared = ,  // 扩大共享内存缩小 L1 缓存

     cudaFuncCachePreferL1 = ,      // 扩大 L1 缓存缩小共享内存

     cudaFuncCachePreferEqual =     // 等量大小的共享内存和 L1 缓存

 };

 // cuda_runtime_api.h

 extern __host__ cudaError_t CUDARTAPI cudaDeviceSetCacheConfig(enum cudaFuncCache cacheConfig);

CUDA C Programming Guide 在线教程学习笔记 Part 13的更多相关文章

CUDA C Programming Guide 在线教程学习笔记 Part 5
附录 A,CUDA计算设备附录 B,C语言扩展 ▶ 函数的标识符 ● __device__,__global__ 和 __host__ ● 宏 __CUDA_ARCH__ 可用于区分代码的运行位置. ...
CUDA C Programming Guide 在线教程学习笔记 Part 4
▶ 图形互操作性,OpenGL 与 Direct3D 相关.(没学过,等待填坑) ▶ 版本号与计算能力 ● 计算能力(Compute Capability)表征了硬件规格,CUDA版本号表征了驱动接口 ...
CUDA C Programming Guide 在线教程学习笔记 Part 2
▶ 纹理内存使用 ● 纹理内存使用有两套 API,称为 Object API 和 Reference API .纹理对象(texture object)在运行时被 Object API 创建,同时指定 ...
CUDA C Programming Guide 在线教程学习笔记 Part 10【坑】
▶ 动态并行. ● 动态并行直接从 GPU 上创建工作,可以减少主机和设备间数据传输,在设备线程中调整配置.有数据依赖的并行工作可以在内核运行时生成,并利用 GPU 的硬件调度和负载均衡.动态并行要求 ...
CUDA C Programming Guide 在线教程学习笔记 Part 9
▶ 协作组,要求 cuda ≥ 9.0,一个简单的例子见 http://www.cnblogs.com/cuancuancuanhao/p/7881093.html ● 灵活调节需要进行通讯的线程组合 ...
CUDA C Programming Guide 在线教程学习笔记 Part 8
▶ 线程束表决函数(Warp Vote Functions) ● 用于同一线程束内各线程通信和计算规约指标. // device_functions.h,cc < 9.0 __DEVICE_FU ...
CUDA C Programming Guide 在线教程学习笔记 Part 7
▶ 可缓存只读操作(Read-Only Data Cache Load Function),定义在 sm_32_intrinsics.hpp 中.从地址 adress 读取类型为 T 的函数返回,T ...
CUDA C Programming Guide 在线教程学习笔记 Part 3
▶ 表面内存使用 ● 创建 cuda 数组时使用标志 cudaArraySurfaceLoadStore 来创建表面内存,可以用表面对象(surface object)或表面引用(surface re ...
CUDA C Programming Guide 在线教程学习笔记 Part 1
1. 简介 2. 编程模型 ▶ SM version 指的是硬件构架和特性,CUDA version 指的是软件平台版本. 3. 编程接口.参考 http://chenrudan.github.io/ ...

随机推荐

出于迁移项目的考虑，GitHub 中 Fork 出来的项目，如何与原项目断开 Fork 关系？
如果需要为 GitHub 上的项目做贡献,我们通常会 Fork 到自己的名称空间下.在推送代码之后添加 pull request 时,GitHub 会自动为我们跨仓库建立 pull request 的 ...
原型设计 Axure8.1 软件注册码
用户名:Koshy 注册码: wTADPqxn3KChzJxLmUr5jTTitCgsfRkftQQ1yIG9HmK83MYSm7GPxLREGn+Ii6xY
dfs 与剪枝
http://blog.csdn.net/u010700335/article/details/44095171
vi文字处理器
http://blog.csdn.net/wangloveall/article/details/22649331 摘要:vi是类UNIX命令行接口的标准文字处理软件,也是进行shell脚本程序编写与 ...
创意：Soap一款新型的触摸式家用智能路由器
版权声明:本文为博主原创文章.未经博主同意不得转载. https://blog.csdn.net/iefreer/article/details/34808749 Soap简单介绍这里的Soap不是 ...
PHP中开启gzip压缩的2种方法
网页开启gzip压缩以后,其体积可以减小20%~90%,可以节省下大量的带宽,从而减少页面响应时间,提高用户体验. php配置改法: 复制代码代码如下: zlib.output_compression ...
sklearn, Numpy以及Pandas
pandas里面的对于数据操作比如where,drop以及dropna等都会有一个属性:inplace,这个单词意思是原地,如果inplace=true代表数据本身要执行该操作:如果inplace=f ...
转： jmeter分布式测试的坑
有关jmeter分布式测试的环境配置,大概就是那样,但是每次想要进行jmeter分布式测试的时候,总是会有各种奇怪的问题,下面整理了一些可能遇到的坑. 只要错误中出现:Error in rconfig ...
JQuery 实现 Tab 切换 index
$(function(){ var h_new=$('.new h4 a'); var new_dl=$('.new dl'); new_dl.first().show(); h_new.mousee ...
PHP 图片打水印
<?php /*----------------------------------------------------------------------------------- *函数名称 ...

CUDA C Programming Guide 在线教程学习笔记 Part 13

CUDA C Programming Guide 在线教程学习笔记 Part 13的更多相关文章

随机推荐

热门专题