./bin/spark-submit ~/src_test/prefix_span_test.py

source code:

import os
import sys
from pyspark.mllib.fpm import PrefixSpan
from pyspark import SparkContext
from pyspark import SparkConf sc = SparkContext("local","testing")
print(sc)
data = [
[['a'],["a", "b", "c"], ["a","c"],["d"],["c", "f"]],
[["a","d"], ["c"],["b", "c"], ["a", "e"]],
[["e", "f"], ["a", "b"], ["d","f"],["c"],["b"]],
[["e"], ["g"],["a", "f"],["c"],["b"],["c"]]
]
rdd = sc.parallelize(data, 2)
model = PrefixSpan.train(rdd, 0.5,4)
result = sorted(model.freqSequences().collect())
print("*"*88)
print(result)
print("*"*88)

output:

****************************************************************************************
[FreqSequence(sequence=[['a']], freq=4), FreqSequence(sequence=[['a'], ['a']], freq=2), FreqSequence(sequence=[['a'], ['b']], freq=4), FreqSequence(sequence=[['a'], ['b'], ['a']], freq=2), FreqSequence(sequence=[['a'], ['b'], ['c']], freq=2), FreqSequence(sequence=[['a'], ['b', 'c']], freq=2), FreqSequence(sequence=[['a'], ['b', 'c'], ['a']], freq=2), FreqSequence(sequence=[['a'], ['c']], freq=4), FreqSequence(sequence=[['a'], ['c'], ['a']], freq=2), FreqSequence(sequence=[['a'], ['c'], ['b']], freq=3), FreqSequence(sequence=[['a'], ['c'], ['c']], freq=3), FreqSequence(sequence=[['a'], ['d']], freq=2), FreqSequence(sequence=[['a'], ['d'], ['c']], freq=2), FreqSequence(sequence=[['a'], ['f']], freq=2), FreqSequence(sequence=[['b']], freq=4), FreqSequence(sequence=[['b'], ['a']], freq=2), FreqSequence(sequence=[['b'], ['c']], freq=3), FreqSequence(sequence=[['b'], ['d']], freq=2), FreqSequence(sequence=[['b'], ['d'], ['c']], freq=2), FreqSequence(sequence=[['b'], ['f']], freq=2), FreqSequence(sequence=[['b', 'a']], freq=2), FreqSequence(sequence=[['b', 'a'], ['c']], freq=2), FreqSequence(sequence=[['b', 'a'], ['d']], freq=2), FreqSequence(sequence=[['b', 'a'], ['d'], ['c']], freq=2), FreqSequence(sequence=[['b', 'a'], ['f']], freq=2), FreqSequence(sequence=[['b', 'c']], freq=2), FreqSequence(sequence=[['b', 'c'], ['a']], freq=2), FreqSequence(sequence=[['c']], freq=4), FreqSequence(sequence=[['c'], ['a']], freq=2), FreqSequence(sequence=[['c'], ['b']], freq=3), FreqSequence(sequence=[['c'], ['c']], freq=3), FreqSequence(sequence=[['d']], freq=3), FreqSequence(sequence=[['d'], ['b']], freq=2), FreqSequence(sequence=[['d'], ['c']], freq=3), FreqSequence(sequence=[['d'], ['c'], ['b']], freq=2), FreqSequence(sequence=[['e']], freq=3), FreqSequence(sequence=[['e'], ['a']], freq=2), FreqSequence(sequence=[['e'], ['a'], ['b']], freq=2), FreqSequence(sequence=[['e'], ['a'], ['c']], freq=2), FreqSequence(sequence=[['e'], ['a'], ['c'], ['b']], freq=2), FreqSequence(sequence=[['e'], ['b']], freq=2), FreqSequence(sequence=[['e'], ['b'], ['c']], freq=2), FreqSequence(sequence=[['e'], ['c']], freq=2), FreqSequence(sequence=[['e'], ['c'], ['b']], freq=2), FreqSequence(sequence=[['e'], ['f']], freq=2), FreqSequence(sequence=[['e'], ['f'], ['b']], freq=2), FreqSequence(sequence=[['e'], ['f'], ['c']], freq=2), FreqSequence(sequence=[['e'], ['f'], ['c'], ['b']], freq=2), FreqSequence(sequence=[['f']], freq=3), FreqSequence(sequence=[['f'], ['b']], freq=2), FreqSequence(sequence=[['f'], ['b'], ['c']], freq=2), FreqSequence(sequence=[['f'], ['c']], freq=2), FreqSequence(sequence=[['f'], ['c'], ['b']], freq=2)]
****************************************************************************************

spark mllib prefixspan demo的更多相关文章

  1. 在Java Web中使用Spark MLlib训练的模型

    PMML是一种通用的配置文件,只要遵循标准的配置文件,就可以在Spark中训练机器学习模型,然后再web接口端去使用.目前应用最广的就是基于Jpmml来加载模型在javaweb中应用,这样就可以实现跨 ...

  2. 十二、spark MLlib的scala示例

    简介 spark MLlib官网:http://spark.apache.org/docs/latest/ml-guide.html mllib是spark core之上的算法库,包含了丰富的机器学习 ...

  3. Spark MLlib + maven + scala 试水~

    使用SGD算法逻辑回归的垃圾邮件分类器 package com.oreilly.learningsparkexamples.scala import org.apache.spark.{SparkCo ...

  4. Spark MLlib之线性回归源代码分析

    1.理论基础 线性回归(Linear Regression)问题属于监督学习(Supervised Learning)范畴,又称分类(Classification)或归纳学习(Inductive Le ...

  5. spark mllib docs,MLlib: RDD-based API

    MLlib: RDD-based API This page documents sections of the MLlib guide for the RDD-based API (the spar ...

  6. spark mllib lda 中文分词、主题聚合基本样例

    github https://github.com/cclient/spark-lda-example spark mllib lda example 官方示例较为精简 在官方lda示例的基础上,给合 ...

  7. Spark MLlib中KMeans聚类算法的解析和应用

    聚类算法是机器学习中的一种无监督学习算法,它在数据科学领域应用场景很广泛,比如基于用户购买行为.兴趣等来构建推荐系统. 核心思想可以理解为,在给定的数据集中(数据集中的每个元素有可被观察的n个属性), ...

  8. Spark MLlib - LFW

    val path = "/usr/data/lfw-a/*" val rdd = sc.wholeTextFiles(path) val first = rdd.first pri ...

  9. 《Spark MLlib机器学习实践》内容简介、目录

      http://product.dangdang.com/23829918.html Spark作为新兴的.应用范围最为广泛的大数据处理开源框架引起了广泛的关注,它吸引了大量程序设计和开发人员进行相 ...

随机推荐

  1. SynchronousQueueDemo

    1.ArrayDeque, (数组双端队列) 2.PriorityQueue, (优先级队列) 3.ConcurrentLinkedQueue, (基于链表的并发队列) 4.DelayQueue, ( ...

  2. C博客作业03--函数

    1. 本章学习总结 1.1 思维导图 1.2 本章学习体会及代码量学习体会 1.2.1 学习体会 这几周学习了函数,题目还是原样只是多了种做题的方法.一开始看书感觉声明,定义啊,还有全局变量那些,文绉 ...

  3. vue路由详解

    自己看vue文档,难免有些地方不懂,自己整理一下,主要是vue-router具体实现的操作步骤. 安装 直接下载/CDN https://unpkg.com/vue-router/dist/vue-r ...

  4. Haystack全文检索

    1.什么是Haystack Haystack是django的开源全文搜索框架(全文检索不同于特定字段的模糊查询,使用全文检索的效率更高 ),该框架支持Solr,Elasticsearch(java写的 ...

  5. FL Studio里的常规设置介绍

    上期我们介绍了FL Studio中的项目设置,今天我们来介绍FL Studio中的常规设置.要打开常规设置,我们需要在主菜单中选择选项>常规选项,当然也可以直接按快捷键F10. “常规设置”页面 ...

  6. ARDUINO入门按键通信试验

    1.1按键实验 1.需要学习的知识: 1) Arduino 的输入口配置方法,配置函数的用法 通过pinMode()函数,可以将ADUINO的引脚配置(INPUT)输入模式 2) 搞懂什么是抖动 机械 ...

  7. 我应该如何在Pycharm中去运行别人的Django项目

    django数据库迁移,本地运行 前言: 从网络上下载好django项目后,在本地用pycharm导入后,并不能运行.此时我们需要添加库和创建数据库. 零:这里是一个基于django写的小项目,可以作 ...

  8. C# 利用反射动态给模型Model 赋值

    https://www.cnblogs.com/waitingfor/articles/2220669.html object ff = Activator.CreateInstance(tt, nu ...

  9. Linux 搭建批量网络装机

  10. oracle 游标分析与理解(基础)

    --------------坚持写一点 慢慢成长 希望对大家有所帮助(小白的理解)  也是自己学习后的理解(只是一小部分,需要更深沉的还需日后成长) 接下来就是我们的重点 --游标 提供了一种对从表中 ...