Spark Distributed Matrix

RowMatrix (row matrix)

import org.apache.spark.rdd.RDD

import org.apache.spark.mllib.linalg.Vectors

import org.apache.spark.mllib.linalg.distributed.RowMatrix

val df1 = Seq(
  (1.0, 2.0, 3.0),
  (1.1, 2.1, 3.1),
  (1.2, 2.2, 3.2)).toDF("c1", "c2", "c3")

df1: org.apache.spark.sql.DataFrame = [c1: double, c2: double ... 1 more field]

df1.show

+---+---+---+
| c1| c2| c3|
+---+---+---+
|1.0|2.0|3.0|
|1.1|2.1|3.1|
|1.2|2.2|3.2|
+---+---+---+

// Convert the DataFrame to an RDD[Vector]
val rowsVector = df1.rdd.map { x =>
  // All three columns are doubles, so read them directly instead of going through toString
  Vectors.dense(x.getDouble(0), x.getDouble(1), x.getDouble(2))
}

rowsVector: org.apache.spark.rdd.RDD[org.apache.spark.mllib.linalg.Vector] = MapPartitionsRDD[4] at map

// Create a RowMatrix from an RDD[Vector].

val mat1: RowMatrix = new RowMatrix(rowsVector)

mat1: org.apache.spark.mllib.linalg.distributed.RowMatrix = org.apache.spark.mllib.linalg.distributed.RowMatrix@7ba821ef

// Get its size.

val m = mat1.numRows()

m: Long = 3                                                                     

val n = mat1.numCols()

n: Long = 3
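The conversion above hard-codes three columns. A minimal sketch of a variant that builds each vector from however many columns the DataFrame has, assuming every column is of type double. The names `wideDf`, `genericRows`, and `genericMat` are illustrative only, and the `SparkSession` line simply reuses the spark-shell session if one already exists:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.RowMatrix

// Reuses the spark-shell session if one exists, otherwise starts a local one.
val spark = SparkSession.builder().master("local[*]").appName("rowmatrix-sketch").getOrCreate()
import spark.implicits._

// Hypothetical input; assumption: every column is a double.
val wideDf = Seq(
  (1.0, 2.0, 3.0),
  (1.1, 2.1, 3.1),
  (1.2, 2.2, 3.2)).toDF("c1", "c2", "c3")

// Read all fields of each Row by position, so the column count is not hard-coded.
val genericRows = wideDf.rdd.map { r =>
  Vectors.dense(Array.tabulate(r.length)(i => r.getDouble(i)))
}

val genericMat = new RowMatrix(genericRows)
println(s"${genericMat.numRows()} x ${genericMat.numCols()}")
```

`getDouble` fails on non-double columns, so a DataFrame with integer or string columns would need a cast before this step.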

// Convert the RowMatrix back to a DataFrame
val resDF = mat1.rows.map { x =>
  // Indexing a Vector already yields a Double, so no further conversion is needed
  (x(0), x(1), x(2))
}.toDF("c1", "c2", "c3")

resDF: org.apache.spark.sql.DataFrame = [c1: double, c2: double ... 1 more field]

resDF.show

+---+---+---+
| c1| c2| c3|
+---+---+---+
|1.0|2.0|3.0|
|1.1|2.1|3.1|
|1.2|2.2|3.2|
+---+---+---+

mat1.rows.take(10)

res3: Array[org.apache.spark.mllib.linalg.Vector] = Array([1.0,2.0,3.0], [1.1,2.1,3.1], [1.2,2.2,3.2])

CoordinateMatrix (coordinate matrix)

import org.apache.spark.rdd.RDD

import org.apache.spark.mllib.linalg.distributed.{CoordinateMatrix, MatrixEntry}

// Column 1: row index; column 2: column index; column 3: value of the matrix entry
val df = Seq(
  (0, 0, 1.1), (0, 1, 1.2), (0, 2, 1.3),
  (1, 0, 2.1), (1, 1, 2.2), (1, 2, 2.3),
  (2, 0, 3.1), (2, 1, 3.2), (2, 2, 3.3),
  (3, 0, 4.1), (3, 1, 4.2), (3, 2, 4.3)).toDF("row", "col", "value")

df: org.apache.spark.sql.DataFrame = [row: int, col: int ... 1 more field]

df.show

+---+---+-----+
|row|col|value|
+---+---+-----+
|  0|  0|  1.1|
|  0|  1|  1.2|
|  0|  2|  1.3|
|  1|  0|  2.1|
|  1|  1|  2.2|
|  1|  2|  2.3|
|  2|  0|  3.1|
|  2|  1|  3.2|
|  2|  2|  3.3|
|  3|  0|  4.1|
|  3|  1|  4.2|
|  3|  2|  4.3|
+---+---+-----+

// Build an RDD of MatrixEntry (row index, column index, value)
val entr = df.rdd.map { x =>
  val a = x.getInt(0).toLong  // row index
  val b = x.getInt(1).toLong  // column index
  val c = x.getDouble(2)      // entry value
  MatrixEntry(a, b, c)
}

entr: org.apache.spark.rdd.RDD[org.apache.spark.mllib.linalg.distributed.MatrixEntry] = MapPartitionsRDD[20] at map

// Build the coordinate matrix

val mat: CoordinateMatrix = new CoordinateMatrix(entr)

mat: org.apache.spark.mllib.linalg.distributed.CoordinateMatrix = org.apache.spark.mllib.linalg.distributed.CoordinateMatrix@5381deec

mat.numRows()

res5: Long = 4                                                                  

mat.numCols()

res6: Long = 3
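When no dimensions are passed, `CoordinateMatrix` derives them from the entries as the largest row index plus one and the largest column index plus one, which is why the matrix above reports 4 x 3. A short sketch of that behaviour with only two entries, next to the constructor that takes explicit dimensions. The names `sparseEntries`, `inferred`, and `explicit` and the 5 x 4 size are made up for illustration:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.mllib.linalg.distributed.{CoordinateMatrix, MatrixEntry}

// Reuses the spark-shell session if one exists, otherwise starts a local one.
val spark = SparkSession.builder().master("local[*]").appName("coordinate-dims-sketch").getOrCreate()

// Only two stored entries; the largest indices are row 3 and column 2.
val sparseEntries = spark.sparkContext.parallelize(Seq(
  MatrixEntry(0L, 0L, 1.1),
  MatrixEntry(3L, 2L, 4.3)))

// Dimensions inferred from the entries: (max row index + 1) x (max column index + 1).
val inferred = new CoordinateMatrix(sparseEntries)
println(s"${inferred.numRows()} x ${inferred.numCols()}")

// Dimensions stated explicitly, useful when trailing rows or columns hold no entries.
val explicit = new CoordinateMatrix(sparseEntries, 5L, 4L)
println(s"${explicit.numRows()} x ${explicit.numCols()}")
```

The first matrix should report 4 x 3 and the second 5 x 4. Passing the size explicitly also avoids the extra pass over the entries that inference needs.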

mat.entries.take(10)

res7: Array[org.apache.spark.mllib.linalg.distributed.MatrixEntry] = Array(MatrixEntry(0,0,1.1), MatrixEntry(0,1,1.2), MatrixEntry(0,2,1.3), MatrixEntry(1,0,2.1), MatrixEntry(1,1,2.2), MatrixEntry(1,2,2.3), MatrixEntry(2,0,3.1), MatrixEntry(2,1,3.2), MatrixEntry(2,2,3.3), MatrixEntry(3,0,4.1))

// Convert the coordinate matrix to a DataFrame with a row index; the index is the row coordinate
val t = mat.toIndexedRowMatrix().rows.map { x =>
  val v = x.vector
  (x.index, v(0), v(1), v(2))
}

t: org.apache.spark.rdd.RDD[(Long, Double, Double, Double)] = MapPartitionsRDD[33] at map

t.toDF().show

+---+---+---+---+
| _1| _2| _3| _4|
+---+---+---+---+
|  0|1.1|1.2|1.3|
|  1|2.1|2.2|2.3|
|  2|3.1|3.2|3.3|
|  3|4.1|4.2|4.3|
+---+---+---+---+

// Convert the coordinate matrix to a DataFrame without an index
// (toRowMatrix drops the row indices, so the row order is not guaranteed)
val t1 = mat.toRowMatrix().rows.map { x =>
  (x(0), x(1), x(2))
}

t1: org.apache.spark.rdd.RDD[(Double, Double, Double)] = MapPartitionsRDD[26] at map

t1.toDF().show

+---+---+---+
| _1| _2| _3|
+---+---+---+
|1.1|1.2|1.3|
|3.1|3.2|3.3|
|2.1|2.2|2.3|
|4.1|4.2|4.3|
+---+---+---+
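In the output above the rows come back as 1.1, 3.1, 2.1, 4.1 rather than in row-index order, because `toRowMatrix` discards the indices. One possible way to keep the original order, sketched here under the assumption that sorting the indexed rows is affordable for the data size, is to sort by index before dropping it. The names `orderedEntries`, `orderedMat`, and `ordered` are illustrative:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.mllib.linalg.distributed.{CoordinateMatrix, MatrixEntry}

// Reuses the spark-shell session if one exists, otherwise starts a local one.
val spark = SparkSession.builder().master("local[*]").appName("ordered-rows-sketch").getOrCreate()
import spark.implicits._

// Same 4 x 3 matrix as in the walkthrough, given as (row, col, value) triples.
val orderedEntries = spark.sparkContext.parallelize(Seq(
  (0, 0, 1.1), (0, 1, 1.2), (0, 2, 1.3),
  (1, 0, 2.1), (1, 1, 2.2), (1, 2, 2.3),
  (2, 0, 3.1), (2, 1, 3.2), (2, 2, 3.3),
  (3, 0, 4.1), (3, 1, 4.2), (3, 2, 4.3)
)).map { case (i, j, v) => MatrixEntry(i.toLong, j.toLong, v) }

val orderedMat = new CoordinateMatrix(orderedEntries)

// Sort by the row index first, then drop it and keep only the values.
val ordered = orderedMat.toIndexedRowMatrix().rows
  .sortBy(_.index)
  .map { r =>
    val v = r.vector
    (v(0), v(1), v(2))
  }

ordered.toDF("c1", "c2", "c3").show()
```

If the index is needed downstream anyway, keeping it as a column (as in the `toIndexedRowMatrix` conversion earlier) and ordering in the DataFrame is an equally reasonable choice.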
