https://github.com/prem30488/C2CUDATranslator

http://www.training.prace-ri.eu/uploads/tx_pracetmo/GPSMEToolkitIntro.pdf

gp-sme.co.uk

https://www.openacc.org/get-started

http://www.openmp.org/             好像只是多核编程, 不像上面几个,是c代码转gpu c 代码。

There are many high-level libraries dedicated to GPGPU programming. Since they rely on CUDA and/or OpenCL, they have to be chosen wisely (a CUDA-based program will not run on AMD's GPUs, unless it goes through a pre-processing step with projects such as gpuocelot).

CUDA

You can find some examples of CUDA libraries on the NVIDIA website.

  • Thrust: the official description speaks for itself

Thrust is a parallel algorithms library which resembles the C++ Standard Template Library (STL). Thrust's high-level interface greatly enhances programmer productivity while enabling performance portability between GPUs and multicore CPUs. Interoperability with established technologies (such as CUDA, TBB, and OpenMP) facilitates integration with existing software.

As @Ashwin pointed out, the STL-like syntax of Thrust makes it a widely chosen library when developing CUDA programs. A quick look at the examples shows the kind of code you will be writing if you decide to use this library. NVIDIA's website presents the key features of this library. A video presentation (from GTC 2012) is also available.

  • CUB: the official description tells us:

CUB provides state-of-the-art, reusable software components for every layer of the CUDA programming mode. It is a flexible library of cooperative threadblock primitives and other utilities for CUDA kernel programming.

It provides device-wide, block-wide and warp-wide parallel primitives such as parallel sort, prefix scan, reduction, histogram etc.

It is open-source and available on GitHub. It is not high-level from an implementation point of view (you develop in CUDA kernels), but provides high-level algorithms and routines.

  • mshadow: lightweight CPU/GPU matrix/tensor template library in C++/CUDA.

This library is mostly used for machine learning, and relies on expression templates.

Starting from Eigen 3.3, it is now possible to use Eigen's objects and algorithms within CUDA kernels. However, only a subset of features are supported to make sure that no dynamic allocation is triggered within a CUDA kernel.

OpenCL

Note that OpenCL does more than GPGPU computing, since it supports heterogeneous platforms (multi-core CPUs, GPUs etc.).

  • OpenACC: this project provides OpenMP-like support for GPGPU. A large part of the programming is done implicitly by the compiler and the run-time API. You can find a sample code on their website.

The OpenACC Application Program Interface describes a collection of compiler directives to specify loops and regions of code in standard C, C++ and Fortran to be offloaded from a host CPU to an attached accelerator, providing portability across operating systems, host CPUs and accelerators.

  • Bolt: open-source library with STL-like interface.

Bolt is a C++ template library optimized for heterogeneous computing. Bolt is designed to provide high-performance library implementations for common algorithms such as scan, reduce, transform, and sort. The Bolt interface was modeled on the C++ Standard Template Library (STL). Developers familiar with the STL will recognize many of the Bolt APIs and customization techniques.

  • Boost.Compute: as @Kyle Lutz said, Boost.Compute provides a STL-like interface for OpenCL. Note that this is not an official Boost library (yet).

  • SkelCL "is a library providing high-level abstractions for alleviated programming of modern parallel heterogeneous systems". This library relies on skeleton programming, and you can find more information in their research papers.

CUDA + OpenCL

  • ArrayFire is an open-source (used to be proprietary) GPGPU programming library. They first targeted CUDA, but now support OpenCL as well. You can check the examples available online. NVIDIA's website provides a good summary of its key features.

Complementary information

Although this is not really in the scope of this question, there is also the same kind of support for other programming languages:

If you need to do linear algebra (for instance) or other specific operations, dedicated math libraries are also available for CUDA and OpenCL (e.g. ViennaCL, CUBLAS, MAGMA etc.).

Also note that using these libraries does not prevent you from doing some low-level operations if you need to do some very specific computation.

Finally, we can mention the future of the C++ standard library. There has been extensive work to add parallelism support. This is still a technical specification, and GPUs are not explicitely mentioned AFAIK (although NVIDIA's Jared Hoberock, developer of Thrust, is directly involved), but the will to make this a reality is definitely there.

High level GPU programming in C++的更多相关文章

  1. 把书《CUDA By Example an Introduction to General Purpose GPU Programming》读薄

    鉴于自己的毕设需要使用GPU CUDA这项技术,想找一本入门的教材,选择了Jason Sanders等所著的书<CUDA By Example an Introduction to Genera ...

  2. R – GPU Programming for All with ‘gpuR’

    INTRODUCTION GPUs (Graphic Processing Units) have become much more popular in recent years for compu ...

  3. Udacity并行计算课程笔记-The GPU Programming Model

    一.传统的提高计算速度的方法 faster clocks (设置更快的时钟) more work over per clock cycle(每个时钟周期做更多的工作) more processors( ...

  4. 【Udacity并行计算课程笔记】- lesson 1 The GPU Programming Model

    一.传统的提高计算速度的方法 faster clocks (设置更快的时钟) more work over per clock cycle(每个时钟周期做更多的工作) more processors( ...

  5. [CUDA] 00 - GPU Driver Installation & Concurrency Programming

    前言 对,这是一个高大上的技术,终于要做老崔当年做过的事情了,生活很传奇. 一.主流 GPU 编程接口 1. CUDA 是英伟达公司推出的,专门针对 N 卡进行 GPU 编程的接口.文档资料很齐全,几 ...

  6. D3D9 GPU Hacks (转载)

    D3D9 GPU Hacks I’ve been trying to catch up what hacks GPU vendors have exposed in Direct3D9, and tu ...

  7. 深入剖析GPU Early Z优化

    最近在公司群里同事发了一个UE4关于Mask材质的优化,比如在场景中有大面积的草和树的时候,可以在很大程度上提高效率.这其中的原理就是利用了GPU的特性Early Z,但是它的做法跟我最开始的理解有些 ...

  8. PatentTips - Heterogeneous Parallel Primitives Programming Model

    BACKGROUND 1. Field of the Invention The present invention relates generally to a programming model ...

  9. GPU深度发掘(一)::GPGPU数学基础教程

    作者:Dominik Göddeke                 译者:华文广 Contents 介绍 准备条件 硬件设备要求 软件设备要求 两者选择 初始化OpenGL GLUT OpenGL ...

随机推荐

  1. 循环viewpager

    如果viewpager listadapter小于三个.用这个移除异常. for (View view : viewList) {             ViewGroup p = (ViewGro ...

  2. 解决FLASH遮住层的问题 IE,Firefox都适用!

    <object classid="clsid:D27CDB6E-AE6D-11cf-96B8-444553540000" codebase="http://down ...

  3. 精确度量Linux下进程占用多少内存的方法

    背景 在Linux中,要了解进程的信息,莫过于从 proc 文件系统中入手去看. proc的详细介绍,可以参考内核文档的解读,里面有很多内容 yum install -y kernel-doc cat ...

  4. Java 搜索引擎

    1.Java 全文搜索引擎框架 Lucene 毫无疑问,Lucene是目前最受欢迎的Java全文搜索框架,准确地说,它是一个全文检索引擎的架构,提供了完整的查询引擎和索引引擎,部分文本分析引擎.Luc ...

  5. Scala进阶之路-Scala高级语法之隐式(implicit)详解

    Scala进阶之路-Scala高级语法之隐式(implicit)详解 作者:尹正杰 版权声明:原创作品,谢绝转载!否则将追究法律责任. 我们调用别人的框架,发现少了一些方法,需要添加,但是让别人为你一 ...

  6. cdqz2017-test11-占卜的准备

    #include<cstdio> #include<iostream> #include<algorithm> using namespace std; #defi ...

  7. Docker 入门 第六部分:部署app

    目录 Docker 入门 第六部分:部署app 先决条件 介绍 选择一个选项 Docker CE(Cloud provider) Enterprise(Cloud provider)这里不做介绍 En ...

  8. C# 实现Bresenham算法(vs2010)

    using System; using System.Collections.Generic; using System.ComponentModel; using System.Data; usin ...

  9. ECSHOP /mobile/admin/edit_languages.php

    漏洞名称:ecshop代码注入漏洞 补丁编号:10017531 补丁文件:/mobile/admin/edit_languages.php 补丁来源:云盾自研 更新时间:2017-01-05 08:4 ...

  10. Java入门系列(十二)Java反射

    Why--指的是为什么做这件事,也既事物的本质. 反射之中包含了一个“反”的概念,所以要想解释反射就必须先从“正”开始解释,一般而言,当用户使用一个类的时候,应该先知道这个类,而后通过这个类产生实例化 ...