spark mllib prefixspan demo
./bin/spark-submit ~/src_test/prefix_span_test.py
source code:
import os
import sys
from pyspark.mllib.fpm import PrefixSpan
from pyspark import SparkContext
from pyspark import SparkConf sc = SparkContext("local","testing")
print(sc)
data = [
[['a'],["a", "b", "c"], ["a","c"],["d"],["c", "f"]],
[["a","d"], ["c"],["b", "c"], ["a", "e"]],
[["e", "f"], ["a", "b"], ["d","f"],["c"],["b"]],
[["e"], ["g"],["a", "f"],["c"],["b"],["c"]]
]
rdd = sc.parallelize(data, 2)
model = PrefixSpan.train(rdd, 0.5,4)
result = sorted(model.freqSequences().collect())
print("*"*88)
print(result)
print("*"*88)
output:
****************************************************************************************
[FreqSequence(sequence=[['a']], freq=4), FreqSequence(sequence=[['a'], ['a']], freq=2), FreqSequence(sequence=[['a'], ['b']], freq=4), FreqSequence(sequence=[['a'], ['b'], ['a']], freq=2), FreqSequence(sequence=[['a'], ['b'], ['c']], freq=2), FreqSequence(sequence=[['a'], ['b', 'c']], freq=2), FreqSequence(sequence=[['a'], ['b', 'c'], ['a']], freq=2), FreqSequence(sequence=[['a'], ['c']], freq=4), FreqSequence(sequence=[['a'], ['c'], ['a']], freq=2), FreqSequence(sequence=[['a'], ['c'], ['b']], freq=3), FreqSequence(sequence=[['a'], ['c'], ['c']], freq=3), FreqSequence(sequence=[['a'], ['d']], freq=2), FreqSequence(sequence=[['a'], ['d'], ['c']], freq=2), FreqSequence(sequence=[['a'], ['f']], freq=2), FreqSequence(sequence=[['b']], freq=4), FreqSequence(sequence=[['b'], ['a']], freq=2), FreqSequence(sequence=[['b'], ['c']], freq=3), FreqSequence(sequence=[['b'], ['d']], freq=2), FreqSequence(sequence=[['b'], ['d'], ['c']], freq=2), FreqSequence(sequence=[['b'], ['f']], freq=2), FreqSequence(sequence=[['b', 'a']], freq=2), FreqSequence(sequence=[['b', 'a'], ['c']], freq=2), FreqSequence(sequence=[['b', 'a'], ['d']], freq=2), FreqSequence(sequence=[['b', 'a'], ['d'], ['c']], freq=2), FreqSequence(sequence=[['b', 'a'], ['f']], freq=2), FreqSequence(sequence=[['b', 'c']], freq=2), FreqSequence(sequence=[['b', 'c'], ['a']], freq=2), FreqSequence(sequence=[['c']], freq=4), FreqSequence(sequence=[['c'], ['a']], freq=2), FreqSequence(sequence=[['c'], ['b']], freq=3), FreqSequence(sequence=[['c'], ['c']], freq=3), FreqSequence(sequence=[['d']], freq=3), FreqSequence(sequence=[['d'], ['b']], freq=2), FreqSequence(sequence=[['d'], ['c']], freq=3), FreqSequence(sequence=[['d'], ['c'], ['b']], freq=2), FreqSequence(sequence=[['e']], freq=3), FreqSequence(sequence=[['e'], ['a']], freq=2), FreqSequence(sequence=[['e'], ['a'], ['b']], freq=2), FreqSequence(sequence=[['e'], ['a'], ['c']], freq=2), FreqSequence(sequence=[['e'], ['a'], ['c'], ['b']], freq=2), FreqSequence(sequence=[['e'], ['b']], freq=2), FreqSequence(sequence=[['e'], ['b'], ['c']], freq=2), FreqSequence(sequence=[['e'], ['c']], freq=2), FreqSequence(sequence=[['e'], ['c'], ['b']], freq=2), FreqSequence(sequence=[['e'], ['f']], freq=2), FreqSequence(sequence=[['e'], ['f'], ['b']], freq=2), FreqSequence(sequence=[['e'], ['f'], ['c']], freq=2), FreqSequence(sequence=[['e'], ['f'], ['c'], ['b']], freq=2), FreqSequence(sequence=[['f']], freq=3), FreqSequence(sequence=[['f'], ['b']], freq=2), FreqSequence(sequence=[['f'], ['b'], ['c']], freq=2), FreqSequence(sequence=[['f'], ['c']], freq=2), FreqSequence(sequence=[['f'], ['c'], ['b']], freq=2)]
****************************************************************************************
spark mllib prefixspan demo的更多相关文章
- 在Java Web中使用Spark MLlib训练的模型
PMML是一种通用的配置文件,只要遵循标准的配置文件,就可以在Spark中训练机器学习模型,然后再web接口端去使用.目前应用最广的就是基于Jpmml来加载模型在javaweb中应用,这样就可以实现跨 ...
- 十二、spark MLlib的scala示例
简介 spark MLlib官网:http://spark.apache.org/docs/latest/ml-guide.html mllib是spark core之上的算法库,包含了丰富的机器学习 ...
- Spark MLlib + maven + scala 试水~
使用SGD算法逻辑回归的垃圾邮件分类器 package com.oreilly.learningsparkexamples.scala import org.apache.spark.{SparkCo ...
- Spark MLlib之线性回归源代码分析
1.理论基础 线性回归(Linear Regression)问题属于监督学习(Supervised Learning)范畴,又称分类(Classification)或归纳学习(Inductive Le ...
- spark mllib docs,MLlib: RDD-based API
MLlib: RDD-based API This page documents sections of the MLlib guide for the RDD-based API (the spar ...
- spark mllib lda 中文分词、主题聚合基本样例
github https://github.com/cclient/spark-lda-example spark mllib lda example 官方示例较为精简 在官方lda示例的基础上,给合 ...
- Spark MLlib中KMeans聚类算法的解析和应用
聚类算法是机器学习中的一种无监督学习算法,它在数据科学领域应用场景很广泛,比如基于用户购买行为.兴趣等来构建推荐系统. 核心思想可以理解为,在给定的数据集中(数据集中的每个元素有可被观察的n个属性), ...
- Spark MLlib - LFW
val path = "/usr/data/lfw-a/*" val rdd = sc.wholeTextFiles(path) val first = rdd.first pri ...
- 《Spark MLlib机器学习实践》内容简介、目录
http://product.dangdang.com/23829918.html Spark作为新兴的.应用范围最为广泛的大数据处理开源框架引起了广泛的关注,它吸引了大量程序设计和开发人员进行相 ...
随机推荐
- 【转载】pycharm破解,可使用到2099年.pycharm版本 pycharm-professional-2016.3.1
1. Pycharm的安装方法,论坛很多,这里就不赘述了.参照:http://blog.csdn.net/qq_29883591/article/details/52664478 2. 下载Pycha ...
- 2018.6.10数据结构串讲_HugeGun
链接: https://pan.baidu.com/s/1uQwLZAT8gjENDWLDm7-Oig 密码: mk8p @echo off : ) shuju test test_ fc test. ...
- Lesson 02-Linux基础命令(一)
查看系统IP Linux:ifconfig/ip a Windows:ipconfig vi:创建文件并编辑 touch:创建空文件 mkdir ~/a 在用户家目录下创建名称为a的文件夹 -p 创建 ...
- 编译openwrt时报错:g++: internal compiler error: Killed (program cc1plus)
答: 这是内存不足导致的,增大内存或者减少运行的线程即可
- Privoxy教程
简介 Privoxy 是一个 代理软件 简单说,就是进出你电脑的流量守门人.借由 Privoxy,我们可以控制出去的请求,还可以改写返回的响应.不必要的请求 – 比如视频广告的地址.图片广告的地址,我 ...
- spring boot 2使用Mybatis多表关联查询
模拟业务关系:一个用户user有对应的一个公司company,每个用户有多个账户account. spring boot 2的环境搭建见上文:spring boot 2整合mybatis 一.mysq ...
- iso移动端input的bug解决(vue)
iso中input很奇怪,点击空白地方,键盘也不会消失,影响页面中其他功能 解决办法: 点击的元素不是input或者textarea,那么就让上一个获得焦点的输入框失去焦点. 涉及的代码: <i ...
- C C++互相调用注意
注意:直接调用会找不到函数定义 1. C 调用 C++封装好后的函数: 在C++中有一个函数 int main_cpp(): 首先构建头文件, #ifndef CPP_FILE_H #define ...
- 解决 Cannot uninstall 'pyparsing' 问题
参考 pyparsing 无法卸载导致安装 matplotlib 出错 解决 Cannot uninstall 'pyparsing' 问题 在安装 pydot 时遇到依赖 pyparsing 无法更 ...
- 启动一个SpringBoot的maven项目
最近拿到了一个maven项目,原先是使用.net开发的,虽然Java和C#的语法相近,但是难免还有一些差别,包括语言特性,IDE的使用方面,都需要一段时间的习惯和适应. 该项目总体上是前后端分 ...