./bin/spark-submit ~/src_test/prefix_span_test.py

source code:

import os
import sys
from pyspark.mllib.fpm import PrefixSpan
from pyspark import SparkContext
from pyspark import SparkConf sc = SparkContext("local","testing")
print(sc)
data = [
[['a'],["a", "b", "c"], ["a","c"],["d"],["c", "f"]],
[["a","d"], ["c"],["b", "c"], ["a", "e"]],
[["e", "f"], ["a", "b"], ["d","f"],["c"],["b"]],
[["e"], ["g"],["a", "f"],["c"],["b"],["c"]]
]
rdd = sc.parallelize(data, 2)
model = PrefixSpan.train(rdd, 0.5,4)
result = sorted(model.freqSequences().collect())
print("*"*88)
print(result)
print("*"*88)

output:

****************************************************************************************
[FreqSequence(sequence=[['a']], freq=4), FreqSequence(sequence=[['a'], ['a']], freq=2), FreqSequence(sequence=[['a'], ['b']], freq=4), FreqSequence(sequence=[['a'], ['b'], ['a']], freq=2), FreqSequence(sequence=[['a'], ['b'], ['c']], freq=2), FreqSequence(sequence=[['a'], ['b', 'c']], freq=2), FreqSequence(sequence=[['a'], ['b', 'c'], ['a']], freq=2), FreqSequence(sequence=[['a'], ['c']], freq=4), FreqSequence(sequence=[['a'], ['c'], ['a']], freq=2), FreqSequence(sequence=[['a'], ['c'], ['b']], freq=3), FreqSequence(sequence=[['a'], ['c'], ['c']], freq=3), FreqSequence(sequence=[['a'], ['d']], freq=2), FreqSequence(sequence=[['a'], ['d'], ['c']], freq=2), FreqSequence(sequence=[['a'], ['f']], freq=2), FreqSequence(sequence=[['b']], freq=4), FreqSequence(sequence=[['b'], ['a']], freq=2), FreqSequence(sequence=[['b'], ['c']], freq=3), FreqSequence(sequence=[['b'], ['d']], freq=2), FreqSequence(sequence=[['b'], ['d'], ['c']], freq=2), FreqSequence(sequence=[['b'], ['f']], freq=2), FreqSequence(sequence=[['b', 'a']], freq=2), FreqSequence(sequence=[['b', 'a'], ['c']], freq=2), FreqSequence(sequence=[['b', 'a'], ['d']], freq=2), FreqSequence(sequence=[['b', 'a'], ['d'], ['c']], freq=2), FreqSequence(sequence=[['b', 'a'], ['f']], freq=2), FreqSequence(sequence=[['b', 'c']], freq=2), FreqSequence(sequence=[['b', 'c'], ['a']], freq=2), FreqSequence(sequence=[['c']], freq=4), FreqSequence(sequence=[['c'], ['a']], freq=2), FreqSequence(sequence=[['c'], ['b']], freq=3), FreqSequence(sequence=[['c'], ['c']], freq=3), FreqSequence(sequence=[['d']], freq=3), FreqSequence(sequence=[['d'], ['b']], freq=2), FreqSequence(sequence=[['d'], ['c']], freq=3), FreqSequence(sequence=[['d'], ['c'], ['b']], freq=2), FreqSequence(sequence=[['e']], freq=3), FreqSequence(sequence=[['e'], ['a']], freq=2), FreqSequence(sequence=[['e'], ['a'], ['b']], freq=2), FreqSequence(sequence=[['e'], ['a'], ['c']], freq=2), FreqSequence(sequence=[['e'], ['a'], ['c'], ['b']], freq=2), FreqSequence(sequence=[['e'], ['b']], freq=2), FreqSequence(sequence=[['e'], ['b'], ['c']], freq=2), FreqSequence(sequence=[['e'], ['c']], freq=2), FreqSequence(sequence=[['e'], ['c'], ['b']], freq=2), FreqSequence(sequence=[['e'], ['f']], freq=2), FreqSequence(sequence=[['e'], ['f'], ['b']], freq=2), FreqSequence(sequence=[['e'], ['f'], ['c']], freq=2), FreqSequence(sequence=[['e'], ['f'], ['c'], ['b']], freq=2), FreqSequence(sequence=[['f']], freq=3), FreqSequence(sequence=[['f'], ['b']], freq=2), FreqSequence(sequence=[['f'], ['b'], ['c']], freq=2), FreqSequence(sequence=[['f'], ['c']], freq=2), FreqSequence(sequence=[['f'], ['c'], ['b']], freq=2)]
****************************************************************************************

spark mllib prefixspan demo的更多相关文章

  1. 在Java Web中使用Spark MLlib训练的模型

    PMML是一种通用的配置文件,只要遵循标准的配置文件,就可以在Spark中训练机器学习模型,然后再web接口端去使用.目前应用最广的就是基于Jpmml来加载模型在javaweb中应用,这样就可以实现跨 ...

  2. 十二、spark MLlib的scala示例

    简介 spark MLlib官网:http://spark.apache.org/docs/latest/ml-guide.html mllib是spark core之上的算法库,包含了丰富的机器学习 ...

  3. Spark MLlib + maven + scala 试水~

    使用SGD算法逻辑回归的垃圾邮件分类器 package com.oreilly.learningsparkexamples.scala import org.apache.spark.{SparkCo ...

  4. Spark MLlib之线性回归源代码分析

    1.理论基础 线性回归(Linear Regression)问题属于监督学习(Supervised Learning)范畴,又称分类(Classification)或归纳学习(Inductive Le ...

  5. spark mllib docs,MLlib: RDD-based API

    MLlib: RDD-based API This page documents sections of the MLlib guide for the RDD-based API (the spar ...

  6. spark mllib lda 中文分词、主题聚合基本样例

    github https://github.com/cclient/spark-lda-example spark mllib lda example 官方示例较为精简 在官方lda示例的基础上,给合 ...

  7. Spark MLlib中KMeans聚类算法的解析和应用

    聚类算法是机器学习中的一种无监督学习算法,它在数据科学领域应用场景很广泛,比如基于用户购买行为.兴趣等来构建推荐系统. 核心思想可以理解为,在给定的数据集中(数据集中的每个元素有可被观察的n个属性), ...

  8. Spark MLlib - LFW

    val path = "/usr/data/lfw-a/*" val rdd = sc.wholeTextFiles(path) val first = rdd.first pri ...

  9. 《Spark MLlib机器学习实践》内容简介、目录

      http://product.dangdang.com/23829918.html Spark作为新兴的.应用范围最为广泛的大数据处理开源框架引起了广泛的关注,它吸引了大量程序设计和开发人员进行相 ...

随机推荐

  1. sql语句中select……as的用法

  2. Python基础_列表 list

    列表是Python的一种基础数据类型,可以进行的操作包括索引,切片,加,乘,检查成员 列表定义: list(列表.数组) eg:stus=['lisi','jion','peter'] #下标:即角标 ...

  3. angular2 ----字符串、对象、base64 之间的转换

    1. JSON对象转化为字符串 let obj = { "name":Ayinger; "sex":"女"; } let str = JSO ...

  4. Windows下安装Redis服务

    说明:本文拷贝自https://jingyan.baidu.com/article/0f5fb099045b056d8334ea97.html Redis是有名的NoSql数据库,一般Linux都会默 ...

  5. 火狐开发----如何快速的安装火狐XPI文件

    第一步:火狐的自动安装扩展程序,https://addons.mozilla.org/zh-CN/firefox/addon/autoinstaller/ 第二步:安装wget工具,这个Linux应该 ...

  6. Linux网络属性管理

    Linux网络属性管理 局域网:以太网,令牌环网 Ethernet: CSMA/CD 冲突域 广播域 MAC:Media Access Control 48bits: 24bits: 24bits: ...

  7. bash 基础命令

    bash的基础特性(): () 命令历史 history 环境变量: HISTSIZE:命令历史记录的条数: HISTFILE:~/.bash_history: HISTFILESIZE:命令历史文件 ...

  8. 编程类-----matlab基础语法复习(2)

    2019年美赛准备:matlab基本题目运算 clear,clc %% 计算1/3 + 2/5 + ...3/7 +10/21 % i = 1; j = 3; ans = 0; % while i & ...

  9. JS(JavaScript)的初了解6(更新中···)

    Js数据类型具体分析 基础类型:  string  number   boolean   null  undefined 引用类型:  object ==>  json  array  等 复习 ...

  10. Qt setstylesheet指定窗口

    #窗口名称{ ...} 在窗口名称前加#号可以指定某个窗口设置stylesheet而不影响子窗口.子控件,可以用于设置边框,不影响子控件产生一样的边框.