Spark MLlib PrefixSpan demo
Run the script with spark-submit:
./bin/spark-submit ~/src_test/prefix_span_test.py
Source code:
import os
import sys
from pyspark.mllib.fpm import PrefixSpan
from pyspark import SparkContext
from pyspark import SparkConf
# Local SparkContext with master "local" and application name "testing"
sc = SparkContext("local", "testing")
print(sc)
data = [
[['a'],["a", "b", "c"], ["a","c"],["d"],["c", "f"]],
[["a","d"], ["c"],["b", "c"], ["a", "e"]],
[["e", "f"], ["a", "b"], ["d","f"],["c"],["b"]],
[["e"], ["g"],["a", "f"],["c"],["b"],["c"]]
]
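# Distribute the four sequences over 2 partitions
rdd = sc.parallelize(data, 2)
# minSupport=0.5: keep patterns found in at least half of the sequences
# maxPatternLength=4: do not report patterns longer than 4 items
model = PrefixSpan.train(rdd, minSupport=0.5, maxPatternLength=4)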
result = sorted(model.freqSequences().collect())
print("*"*88)
print(result)
print("*"*88)
Output:
****************************************************************************************
[FreqSequence(sequence=[['a']], freq=4), FreqSequence(sequence=[['a'], ['a']], freq=2), FreqSequence(sequence=[['a'], ['b']], freq=4), FreqSequence(sequence=[['a'], ['b'], ['a']], freq=2), FreqSequence(sequence=[['a'], ['b'], ['c']], freq=2), FreqSequence(sequence=[['a'], ['b', 'c']], freq=2), FreqSequence(sequence=[['a'], ['b', 'c'], ['a']], freq=2), FreqSequence(sequence=[['a'], ['c']], freq=4), FreqSequence(sequence=[['a'], ['c'], ['a']], freq=2), FreqSequence(sequence=[['a'], ['c'], ['b']], freq=3), FreqSequence(sequence=[['a'], ['c'], ['c']], freq=3), FreqSequence(sequence=[['a'], ['d']], freq=2), FreqSequence(sequence=[['a'], ['d'], ['c']], freq=2), FreqSequence(sequence=[['a'], ['f']], freq=2), FreqSequence(sequence=[['b']], freq=4), FreqSequence(sequence=[['b'], ['a']], freq=2), FreqSequence(sequence=[['b'], ['c']], freq=3), FreqSequence(sequence=[['b'], ['d']], freq=2), FreqSequence(sequence=[['b'], ['d'], ['c']], freq=2), FreqSequence(sequence=[['b'], ['f']], freq=2), FreqSequence(sequence=[['b', 'a']], freq=2), FreqSequence(sequence=[['b', 'a'], ['c']], freq=2), FreqSequence(sequence=[['b', 'a'], ['d']], freq=2), FreqSequence(sequence=[['b', 'a'], ['d'], ['c']], freq=2), FreqSequence(sequence=[['b', 'a'], ['f']], freq=2), FreqSequence(sequence=[['b', 'c']], freq=2), FreqSequence(sequence=[['b', 'c'], ['a']], freq=2), FreqSequence(sequence=[['c']], freq=4), FreqSequence(sequence=[['c'], ['a']], freq=2), FreqSequence(sequence=[['c'], ['b']], freq=3), FreqSequence(sequence=[['c'], ['c']], freq=3), FreqSequence(sequence=[['d']], freq=3), FreqSequence(sequence=[['d'], ['b']], freq=2), FreqSequence(sequence=[['d'], ['c']], freq=3), FreqSequence(sequence=[['d'], ['c'], ['b']], freq=2), FreqSequence(sequence=[['e']], freq=3), FreqSequence(sequence=[['e'], ['a']], freq=2), FreqSequence(sequence=[['e'], ['a'], ['b']], freq=2), FreqSequence(sequence=[['e'], ['a'], ['c']], freq=2), FreqSequence(sequence=[['e'], ['a'], ['c'], ['b']], freq=2), 
FreqSequence(sequence=[['e'], ['b']], freq=2), FreqSequence(sequence=[['e'], ['b'], ['c']], freq=2), FreqSequence(sequence=[['e'], ['c']], freq=2), FreqSequence(sequence=[['e'], ['c'], ['b']], freq=2), FreqSequence(sequence=[['e'], ['f']], freq=2), FreqSequence(sequence=[['e'], ['f'], ['b']], freq=2), FreqSequence(sequence=[['e'], ['f'], ['c']], freq=2), FreqSequence(sequence=[['e'], ['f'], ['c'], ['b']], freq=2), FreqSequence(sequence=[['f']], freq=3), FreqSequence(sequence=[['f'], ['b']], freq=2), FreqSequence(sequence=[['f'], ['b'], ['c']], freq=2), FreqSequence(sequence=[['f'], ['c']], freq=2), FreqSequence(sequence=[['f'], ['c'], ['b']], freq=2)]
****************************************************************************************
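To make the freq values above easier to read, here is a minimal pure-Python sketch (no Spark needed) that counts how many input sequences contain a given pattern. The helper names `contains` and `support` are hypothetical and not part of the pyspark API. It assumes a pattern matches a sequence when each pattern itemset is a subset of some sequence itemset, in order, and that the minimum count is the ceiling of minSupport times the number of sequences.

```python
import math

# Same input as the demo above: 4 sequences, each a list of itemsets
data = [
    [['a'], ["a", "b", "c"], ["a", "c"], ["d"], ["c", "f"]],
    [["a", "d"], ["c"], ["b", "c"], ["a", "e"]],
    [["e", "f"], ["a", "b"], ["d", "f"], ["c"], ["b"]],
    [["e"], ["g"], ["a", "f"], ["c"], ["b"], ["c"]],
]


def contains(sequence, pattern):
    """Return True if every itemset of `pattern` is a subset of some
    itemset of `sequence`, matched in order (greedy earliest match)."""
    pos = 0
    for itemset in sequence:
        if pos < len(pattern) and set(pattern[pos]) <= set(itemset):
            pos += 1
    return pos == len(pattern)


def support(sequences, pattern):
    """Count the sequences that contain `pattern`."""
    return sum(1 for seq in sequences if contains(seq, pattern))


# Assumption: a pattern is frequent when its count reaches ceil(minSupport * n)
min_count = math.ceil(0.5 * len(data))

print(min_count)
print(support(data, [['a'], ['c'], ['b']]))
print(support(data, [['a'], ['b', 'c']]))
```

With minSupport 0.5 and four sequences, a pattern needs to appear in at least two of them, which is why every entry in the output has freq of 2 or more.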