Pig: WordCount analysis

 grunt> cat /opt/dataset/input.txt

	keyword1 keyword2

	keyword2 keyword4

	keyword3 keyword1

	keyword4 keyword4

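 -- Load the file; the delimiter '\n' never occurs inside a line, so each whole line becomes one field.
 A = LOAD '/opt/dataset/input.txt' USING PigStorage('\n') AS (line:chararray);

 -- Split each line into a bag of words.
 B = FOREACH A GENERATE TOKENIZE(line);

 -- Flatten the bags so that every word becomes its own tuple.
 C = FOREACH B GENERATE FLATTEN($0) AS word;

 -- Collect identical words into one group per word.
 D = GROUP C BY word;

 -- Count the tuples in each group, emitting (count, word).
 E = FOREACH D GENERATE COUNT(C), group;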

 dump B;

({(keyword1),(keyword2)})

({(keyword2),(keyword4)})

({(keyword3),(keyword1)})

({(keyword4),(keyword4)})

 dump C;

(keyword1)

(keyword2)

(keyword2)

(keyword4)

(keyword3)

(keyword1)

(keyword4)

(keyword4)

 dump D;

(keyword1,{(keyword1),(keyword1)})

(keyword2,{(keyword2),(keyword2)})

(keyword3,{(keyword3)})

(keyword4,{(keyword4),(keyword4),(keyword4)})

 dump E;

(2,keyword1)

(2,keyword2)

(1,keyword3)

(3,keyword4)

 store E into './wordcount';
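For readers less familiar with Pig Latin, the data flow of relations B through E can be mirrored in plain Python. This is only an illustrative sketch, not how Pig executes the script: the function name `word_count` is hypothetical, whitespace splitting stands in for TOKENIZE, and the sort is there only to reproduce the order shown in the dump output above (Pig relations are unordered unless ORDER BY is used).

```python
from collections import defaultdict

# The four lines of /opt/dataset/input.txt from the session above.
LINES = [
    "keyword1 keyword2",
    "keyword2 keyword4",
    "keyword3 keyword1",
    "keyword4 keyword4",
]


def word_count(lines):
    """Hypothetical helper mirroring relations B..E of the Pig script."""
    # B: one bag (list) of words per input line; split() approximates TOKENIZE.
    bags = [line.split() for line in lines]
    # C: flatten the bags into one stream of words.
    words = [word for bag in bags for word in bag]
    # D: group identical words together.
    groups = defaultdict(list)
    for word in words:
        groups[word].append(word)
    # E: emit (count, word) pairs, sorted here only for stable display.
    return [(len(group), word) for word, group in sorted(groups.items())]


if __name__ == "__main__":
    for row in word_count(LINES):
        print(row)
```

Run on the sample input, the sketch yields the same four (count, word) pairs as `dump E`.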

TOKENIZE

Splits a string and outputs a bag of words.

Syntax

TOKENIZE(expression)       

Terms

expression

An expression with data type chararray.

Usage

Use the TOKENIZE function to split a string of words (all words in a single tuple) into a bag of words (each word in a single tuple). The following characters are treated as word separators: space, double quote ("), comma (,), parentheses (( and )), and star (*).

Example

In this example the strings in each row are split.

A  = LOAD 'data' AS (f1:chararray);

DUMP A;

(Here is the first string.)

(Here is the second string.)

(Here is the third string.)

X = FOREACH A GENERATE TOKENIZE(f1);

DUMP X;

({(Here),(is),(the),(first),(string.)})

({(Here),(is),(the),(second),(string.)})

({(Here),(is),(the),(third),(string.)})
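The separator list in the Usage section can be approximated outside Pig with a small Python sketch. The function name `tokenize` and the regular expression are assumptions made for illustration; they model only the separators listed above (space, double quote, comma, parentheses, star) and are not Pig's actual implementation.

```python
import re

# Separators listed in the Usage section: space, ", comma, ( ), *
SEPARATORS = re.compile(r'[ ",()*]+')


def tokenize(text):
    """Hypothetical stand-in for TOKENIZE: split text on the listed separators."""
    # Drop the empty strings that re.split yields at leading/trailing separators.
    return [token for token in SEPARATORS.split(text) if token]


if __name__ == "__main__":
    print(tokenize("Here is the first string."))
    print(tokenize('say "hi",(there)*now'))
```

Because the period is not in the separator list, "string." keeps its trailing dot, which matches the DUMP X output above.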
