Matrix multiplication with BLAS

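#include <cstdlib>
#include <iostream>

// If your cblas.h lacks C++ guards, wrap this include in extern "C" { ... }.
#include <cblas.h>

using namespace std;

int main()
{
    const enum CBLAS_ORDER Order = CblasRowMajor;
    const enum CBLAS_TRANSPOSE TransA = CblasNoTrans;
    const enum CBLAS_TRANSPOSE TransB = CblasNoTrans;

    // NOTE: the numeric literals of this listing were lost from the post.
    // The values below are assumptions chosen to match the surviving shape
    // (12 elements in A, 6 elements in B); substitute your own.
    const int M = 4; // rows of A, rows of C
    const int N = 2; // columns of B, columns of C
    const int K = 3; // columns of A, rows of B
    const float alpha = 1.0f; // assumed: plain product
    const float beta = 0.0f;  // assumed: C is overwritten, not accumulated

    // Row-major storage: the leading dimension is the number of columns.
    const int lda = K; // columns of A
    const int ldb = N; // columns of B
    const int ldc = N; // columns of C

    const float A[M*K] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 8, 7, 6}; // assumed values
    const float B[K*N] = {5, 4, 3, 2, 1, 0};                   // assumed values
    float C[M*N] = {0};

    // C <- alpha * A * B + beta * C
    cblas_sgemm(Order, TransA, TransB, M, N, K, alpha, A, lda, B, ldb, beta, C, ldc);

    for (int i = 0; i < M; i++)
    {
        for (int j = 0; j < N; j++)
        {
            cout << C[i*N+j] << "\n";
        }
        cout << endl;
    }

    return EXIT_SUCCESS;
}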

g++ testblas.c++ -lopenblas -o testout

g++ testblas.c++ -lopenblas_piledriverp-r0.2.9 -o testout   (for a locally compiled OpenBLAS build)

Note: on the command line, a library must come after the files that reference its functions.

cblas_sgemm

Multiplies two matrices (single-precision).

void cblas_sgemm (
const enum CBLAS_ORDER Order,      // Specifies row-major (C) or column-major (Fortran) data ordering.
                                   // typedef enum CBLAS_ORDER {CblasRowMajor=101, CblasColMajor=102} CBLAS_ORDER;
const enum CBLAS_TRANSPOSE TransA, // Specifies whether to transpose matrix A.
const enum CBLAS_TRANSPOSE TransB, // Specifies whether to transpose matrix B.
const int M,       // Number of rows in matrices A and C.
const int N,       // Number of columns in matrices B and C.
const int K,       // Number of columns in matrix A; number of rows in matrix B.
const float alpha, // Scaling factor for the product of matrices A and B.
const float *A,    // Matrix A.
const int lda,     // Leading dimension (stride) of A: the number of columns of the stored array in row-major order, the number of rows in column-major order.
const float *B,    // Matrix B.
const int ldb,     // Leading dimension (stride) of B, defined as for lda.
const float beta,  // Scaling factor for matrix C.
float *C,          // Matrix C; overwritten with the result.
const int ldc      // Leading dimension (stride) of C, defined as for lda.
);

lda, ldb and ldc are the strides. An explanation of them:

The elements of a matrix (i.e. a 2D array) are stored contiguously in memory. However, they may be stored in either column-major or row-major fashion. The stride is the distance in memory between elements in adjacent rows (if row-major) or in adjacent columns (if column-major). For a matrix stored without padding, the stride therefore equals the number of columns (row-major) or the number of rows (column-major).

Matrix A =
[1 2 3]
[4 5 6]
Row-major stores values as {1, 2, 3, 4, 5, 6}
Stride here is 3

Col-major stores values as {1, 4, 2, 5, 3, 6}
Stride here is 2

Matrix B =
[1 2 3]
[4 5 6]
[7 8 9]

Col-major storage is {1, 4, 7, 2, 5, 8, 3, 6, 9}
Stride here is 3

Read more: http://www.physicsforums.com

Thus, it calculates

C←α·op(A)·op(B) + βC

where op(X) is either X or its transpose, as selected by TransA and TransB.

typedef enum CBLAS_ORDER     {CblasRowMajor=101, CblasColMajor=102} CBLAS_ORDER;

typedef enum CBLAS_TRANSPOSE {CblasNoTrans=111, CblasTrans=112, CblasConjTrans=113, CblasConjNoTrans=114} CBLAS_TRANSPOSE;

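$C=A*B$

$C^T=(A*B)^T=B^T*A^T$. Swapping the order of A and B therefore yields the transposed product directly, with no other transformation (the result C is itself transposed). This matters because a row-major buffer read as column-major is exactly the transpose of the matrix.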

cblas_sgemv

Multiplies a matrix by a vector (single precision): Y←αAX + βY

void cblas_sgemv (

const enum CBLAS_ORDER Order,

const enum CBLAS_TRANSPOSE TransA,

const int M,

const int N,

const float alpha,

const float *A,

const int lda,

const float *X,

const int incX,

const float beta,

float *Y,

const int incY

);

STL version

cblas_daxpy
Computes a constant times a vector plus a vector (double-precision).

On return, the contents of vector Y are replaced with the result. The value computed is (alpha * X[i]) + Y[i].

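// Header paths as in the original post; adjust them to your OpenBLAS install.
#include <OpenBlas/cblas.h>
#include <OpenBlas/common.h>

#include <algorithm> // std::fill
#include <iostream>
#include <vector>

int main()
{
    // NOTE: the numeric literals of this listing were lost from the post.
    // The values below are assumptions; substitute your own.
    blasint n = 10;    // assumed vector length
    blasint in_x = 1;  // assumed stride of x (contiguous)
    blasint in_y = 1;  // assumed stride of y (contiguous)

    std::vector<double> x(n);
    std::vector<double> y(n);

    double alpha = 10; // assumed scaling factor

    std::fill(x.begin(), x.end(), 1.0);
    std::fill(y.begin(), y.end(), 2.0);

    // y <- alpha * x + y
    cblas_daxpy(n, alpha, &x[0], in_x, &y[0], in_y);

    // Print y
    for (int j = 0; j < n; j++)
        std::cout << y[j] << "\t";

    std::cout << std::endl;
}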


cuBLAS

cublasStatus_t cublasCreate(cublasHandle_t *handle)

Return value                     Meaning
CUBLAS_STATUS_SUCCESS            The initialization succeeded
CUBLAS_STATUS_NOT_INITIALIZED    The CUDA™ runtime initialization failed
CUBLAS_STATUS_ALLOC_FAILED       The resources could not be allocated

cublasStatus_t cublasDestroy(cublasHandle_t handle)

Return value                     Meaning
CUBLAS_STATUS_SUCCESS            The shutdown succeeded
CUBLAS_STATUS_NOT_INITIALIZED    The library was not initialized


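cublasStatus_t cublasSgemm(
cublasHandle_t handle,    // Handle to the cuBLAS library context; the main difference from cblas_sgemm.
cublasOperation_t transa,
cublasOperation_t transb,
int m,
int n,
int k,
const float *alpha,       // Passed by pointer rather than by value.
const float *A,
int lda,
const float *B,
int ldb,
const float *beta,        // Passed by pointer rather than by value.
float *C,
int ldc
)

There is also no Order parameter: cuBLAS assumes column-major storage.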

