Matrix multiplication with BLAS

#include <stdlib.h>   // EXIT_SUCCESS

#include <iostream>

// Wrap the next include in extern "C" { ... } only if your cblas.h lacks its own C++ guards

#include <cblas.h>

using namespace std;

int main()

{

    const enum CBLAS_ORDER Order=CblasRowMajor;

    const enum CBLAS_TRANSPOSE TransA=CblasNoTrans;

    const enum CBLAS_TRANSPOSE TransB=CblasNoTrans;

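    const int M=4;// rows of A, rows of C

    const int N=2;// columns of B, columns of C

    const int K=3;// columns of A, rows of B

    const float alpha=1;

    const float beta=0;

    const int lda=K;// leading dimension of A: its column count in row-major storage

    const int ldb=N;// leading dimension of B: its column count in row-major storage

    const int ldc=N;// leading dimension of C: its column count in row-major storage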

    const float A[M*K]={1,2,3,4,5,6,7,8,9,8,7,6};

    const float B[K*N]={5,4,3,2,1,0};

    float C[M*N];

    cblas_sgemm(Order, TransA, TransB, M, N, K, alpha, A, lda, B, ldb, beta, C, ldc);

    for(int i=0;i<M;i++)

    {

       for(int j=0;j<N;j++)

       {

           cout<<C[i*ldc+j]<<" ";// print one row of C per output line

       }

       cout<<endl;

    }

    return EXIT_SUCCESS;

}

g++ testblas.c++ -lopenblas -o testout

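g++ testblas.c++ -lopenblas_piledriverp-r0.2.9 -o testout    (when linking against a locally built OpenBLAS)

Note: on the command line, a library must come after the source or object files that call its functions.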

cblas_sgemm

Multiplies two matrices (single-precision).

void cblas_sgemm (

const enum CBLAS_ORDER Order,  // Specifies row-major (C) or column-major (Fortran) data ordering.

//typedef enum CBLAS_ORDER     {CblasRowMajor=101, CblasColMajor=102} CBLAS_ORDER;

const enum CBLAS_TRANSPOSE TransA,//Specifies whether to transpose matrix A.

const enum CBLAS_TRANSPOSE TransB,

const int M,   //Number of rows in matrices A and C.

const int N,   //Number of columns in matrices B and C.

const int K,  //Number of columns in matrix A; number of rows in matrix B

const float alpha, //Scaling factor for the product of matrices A and B

const float *A,

const int lda, //Leading dimension (stride) of matrix A: for row-major data, the number of columns of the stored array; for column-major data, the number of rows.

lda, ldb and ldc (the strides) are not relevant to my problem after all, but here's an explanation of them:

The elements of a matrix (i.e. a 2D array) are stored contiguously in memory. However, they may be stored in either column-major or row-major fashion. The stride represents the distance in memory between elements in adjacent rows (if row-major) or in adjacent columns (if column-major). This means that the stride is usually equal to the number of columns in the matrix (row-major) or the number of rows (column-major).

Matrix A =

[1 2 3]

[4 5 6]

Row-major stores values as {1,2,3,4,5,6}

Stride here is 3

Col-major stores values as {1, 4, 2, 5, 3, 6}

Stride here is 2

Matrix B =

[1 2 3]

[4 5 6]

[7 8 9]

Col-major storage is {1, 4, 7, 2, 5, 8, 3, 6, 9}

Stride here is 3

Read more: http://www.physicsforums.com 

const float *B,

const int ldb,  //Leading dimension (stride) of matrix B: for row-major data, the number of columns of the stored array; for column-major data, the number of rows.

const float beta,  //Scaling factor for matrix C.

float *C,

const int ldc    //Leading dimension (stride) of matrix C: for row-major data, the number of columns of the stored array; for column-major data, the number of rows.

);

Thus, it calculates

C←α·op(A)·op(B) + βC

where op(X) is X itself, or the transpose of X when the corresponding TransA/TransB flag requests it.
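To make the formula and the leading-dimension arguments concrete, here is a plain C++ reference loop for the row-major, no-transpose case. It is only a sketch of the semantics, not how OpenBLAS implements the routine; the names ref_sgemm_rowmajor and demo_ref_sgemm are made up for this note.

```cpp
#include <cassert>
#include <vector>

// Hypothetical reference routine (not part of any BLAS library):
// computes C <- alpha*A*B + beta*C for row-major, non-transposed operands.
// A is M x K, B is K x N, C is M x N; lda/ldb/ldc are the row strides.
void ref_sgemm_rowmajor(int M, int N, int K, float alpha,
                        const float* A, int lda,
                        const float* B, int ldb,
                        float beta, float* C, int ldc)
{
    assert(lda >= K && ldb >= N && ldc >= N);  // a row must fit inside its stride
    for (int i = 0; i < M; ++i) {
        for (int j = 0; j < N; ++j) {
            float acc = 0.0f;
            for (int k = 0; k < K; ++k)
                acc += A[i * lda + k] * B[k * ldb + j];
            // When beta is zero the old contents of C are ignored.
            C[i * ldc + j] = (beta == 0.0f) ? alpha * acc
                                            : alpha * acc + beta * C[i * ldc + j];
        }
    }
}

// Runs the reference routine on the 4x3 and 3x2 matrices of the first program.
std::vector<float> demo_ref_sgemm()
{
    const float A[12] = {1,2,3, 4,5,6, 7,8,9, 8,7,6};
    const float B[6]  = {5,4, 3,2, 1,0};
    std::vector<float> C(8);
    ref_sgemm_rowmajor(4, 2, 3, 1.0f, A, 3, B, 2, 0.0f, C.data(), 2);
    return C;
}
```

For the matrices of the first program this yields the rows (14, 8), (41, 26), (68, 44), (67, 46), which is what the cblas_sgemm call with CblasRowMajor should produce as well.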

typedef enum CBLAS_ORDER     {CblasRowMajor=101, CblasColMajor=102} CBLAS_ORDER;

typedef enum CBLAS_TRANSPOSE {CblasNoTrans=111, CblasTrans=112, CblasConjTrans=113, CblasConjNoTrans=114} CBLAS_TRANSPOSE;

C = A∗B

Cᵀ = (A∗B)ᵀ = Bᵀ∗Aᵀ

Swapping the order of A and B directly yields the transposed matrix product, with no other transformation needed (the result C is transposed as well).
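This identity matters when row-major arrays are handed to a routine that only understands column-major storage (cuBLAS is column-major). The sketch below uses a hand-written stand-in, colmajor_gemm, which is a hypothetical helper rather than a library call, and shows that passing B first and swapping M and N gives the row-major product without copying or transposing anything.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a column-major-only GEMM (no transposes):
// C(m x n) <- A(m x k) * B(k x n), with element (i,j) of X stored at X[i + j*ldx].
void colmajor_gemm(int m, int n, int k,
                   const float* A, int lda,
                   const float* B, int ldb,
                   float* C, int ldc)
{
    assert(lda >= m && ldb >= k && ldc >= m);
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i) {
            float acc = 0.0f;
            for (int p = 0; p < k; ++p)
                acc += A[i + p * lda] * B[p + j * ldb];
            C[i + j * ldc] = acc;
        }
}

// Row-major C(M x N) = A(M x K) * B(K x N) using only the column-major routine.
// A row-major M x K buffer, read as column-major, is the K x M matrix A^T,
// so we request C^T = B^T * A^T by passing B first and swapping M and N.
std::vector<float> rowmajor_product_via_colmajor(int M, int N, int K,
                                                 const float* A, const float* B)
{
    std::vector<float> C(M * N);
    colmajor_gemm(N, M, K, B, N, A, K, C.data(), N);
    return C;
}
```

The buffer that comes back holds Cᵀ in column-major order, which is byte for byte the same as C in row-major order, so the caller can index it as C[i*N + j].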

cblas_sgemv

Multiplies a matrix by a vector (single precision):

Y←αAX + βY

void cblas_sgemv (

const enum CBLAS_ORDER Order,

const enum CBLAS_TRANSPOSE TransA,

const int M,

const int N,

const float alpha,

const float *A,

const int lda,

const float *X,

const int incX,

const float beta,

float *Y,

const int incY

);
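The incX and incY arguments are the distances between consecutive vector elements in memory. The reference loop below is a rough sketch of what the call means for a row-major, non-transposed matrix with positive increments; ref_sgemv_rowmajor and demo_ref_sgemv are hypothetical names, and negative increments and the transposed case are deliberately left out.

```cpp
#include <cassert>
#include <vector>

// Hypothetical reference routine: Y <- alpha*A*X + beta*Y
// for a row-major, non-transposed M x N matrix A and positive increments.
void ref_sgemv_rowmajor(int M, int N, float alpha,
                        const float* A, int lda,
                        const float* X, int incX,
                        float beta, float* Y, int incY)
{
    assert(lda >= N && incX > 0 && incY > 0);
    for (int i = 0; i < M; ++i) {
        float acc = 0.0f;
        for (int j = 0; j < N; ++j)
            acc += A[i * lda + j] * X[j * incX];
        Y[i * incY] = (beta == 0.0f) ? alpha * acc
                                     : alpha * acc + beta * Y[i * incY];
    }
}

// X is read with incX = 2, so the logical vector is {1, 0, -1};
// the 9s in between are padding that the routine never touches.
std::vector<float> demo_ref_sgemv()
{
    const float A[6] = {1,2,3, 4,5,6};        // 2 x 3, row-major
    const float X[5] = {1, 9, 0, 9, -1};
    std::vector<float> Y = {10, 20};
    ref_sgemv_rowmajor(2, 3, 2.0f, A, 3, X, 2, 1.0f, Y.data(), 1);
    return Y;
}
```

Here A·x = (-2, -2), so with alpha = 2 and beta = 1 the vector Y goes from (10, 20) to (6, 16).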

Version using STL containers

cblas_daxpy

Computes a constant times a vector plus a vector (double-precision).

On return, the contents of vector Y are replaced with the result. The value computed is (alpha * X[i]) + Y[i].

#include <OpenBlas/cblas.h>

#include <OpenBlas/common.h>

#include <algorithm>  // std::fill

#include <iostream>

#include <vector>

int main()

{

    blasint n = 10;

    blasint in_x =1;

    blasint in_y =1;

    std::vector<double> x(n);

    std::vector<double> y(n);

    double alpha = 10;

    std::fill(x.begin(),x.end(),1.0);

    std::fill(y.begin(),y.end(),2.0);

    cblas_daxpy( n, alpha, &x[0], in_x, &y[0], in_y);

    //Print y

    for(int j=0;j<n;j++)

        std::cout << y[j] << "\t";

    std::cout << std::endl;

}
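The program above uses increments of 1, so it prints 12 ten times. The increments become useful when the vector is embedded in a larger array. As a small sketch, the reference loop below (ref_daxpy is a hypothetical stand-in with the same argument order as cblas_daxpy, restricted to positive increments) updates one column of a row-major matrix by setting incY to the row length.

```cpp
#include <cassert>
#include <vector>

// Hypothetical reference routine:
// Y[i*incY] <- alpha * X[i*incX] + Y[i*incY] for i = 0 .. n-1.
void ref_daxpy(int n, double alpha, const double* X, int incX,
               double* Y, int incY)
{
    assert(incX > 0 && incY > 0);
    for (int i = 0; i < n; ++i)
        Y[i * incY] += alpha * X[i * incX];
}

// Adds 10 * x to column 1 of a 3 x 2 row-major matrix:
// start at element (0,1) and step by the row length, incY = 2.
std::vector<double> demo_strided_axpy()
{
    std::vector<double> m = {1, 2,
                             3, 4,
                             5, 6};
    const double x[3] = {1, 1, 1};
    ref_daxpy(3, 10.0, x, 1, m.data() + 1, 2);
    return m;
}
```

After the call the matrix is {1, 12, 3, 14, 5, 16}: column 0 is untouched and every entry of column 1 has grown by 10.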



cublas

cublasStatus_t cublasCreate(cublasHandle_t *handle)

Return Value                      Meaning
CUBLAS_STATUS_SUCCESS             the initialization succeeded
CUBLAS_STATUS_NOT_INITIALIZED     the CUDA™ Runtime initialization failed
CUBLAS_STATUS_ALLOC_FAILED        the resources could not be allocated

cublasStatus_t cublasDestroy(cublasHandle_t handle)

Return Value                      Meaning
CUBLAS_STATUS_SUCCESS             the shut down succeeded
CUBLAS_STATUS_NOT_INITIALIZED     the library was not initialized


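cublasStatus_t cublasSgemm(cublasHandle_t handle,  // extra first argument: handle to the cuBLAS library context
                           cublasOperation_t transa,  // there is no Order argument: cuBLAS assumes column-major storage
                           cublasOperation_t transb,
                           int m,
                           int n,
                           int k,
                           const float *alpha,  // passed by pointer, unlike cblas_sgemm
                           const float *A,
                           int lda,
                           const float *B,
                           int ldb,
                           const float *beta,   // passed by pointer, unlike cblas_sgemm
                           float *C,
                           int ldc
)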

void cblas_sgemm (
const enum CBLAS_ORDER Order,  // Specifies row-major (C) or column-major (Fortran) data ordering.
//typedef enum CBLAS_ORDER     {CblasRowMajor=101, CblasColMajor=102} CBLAS_ORDER;
const enum CBLAS_TRANSPOSE TransA,  //Specifies whether to transpose matrix A.
const enum CBLAS_TRANSPOSE TransB,  //Specifies whether to transpose matrix B.
const int M,   //Number of rows in matrices A and C.
const int N,   //Number of columns in matrices B and C.
const int K,   //Number of columns in matrix A; number of rows in matrix B
const float alpha, //Scaling factor for the product of matrices A and B
const float *A,
const int lda, //Leading dimension of A: number of columns of the stored array if row-major, number of rows if column-major.
const float *B,
const int ldb, //Leading dimension of B, defined the same way.
const float beta,  //Scaling factor for matrix C.
float *C,
const int ldc  //Leading dimension of C, defined the same way.
);
