pig flatten

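After a lot of trial and error today, I finally understand how FLATTEN works. Sometimes the key is to actually test things yourself; only then does an explanation really sink in. My colleague's description of it was slightly off and misled me, which is why I stayed confused for so long.

Here is the official explanation: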

The FLATTEN operator looks like a UDF syntactically, but it is actually an operator that changes the structure of tuples and bags in a way that a UDF cannot. Flatten un-nests tuples as well as bags. The idea is the same, but the operation and result is different for each type of structure.

For tuples, flatten substitutes the fields of a tuple in place of the tuple. For example, consider a relation that has a tuple of the form (a, (b, c)). The expression GENERATE $0, flatten($1), will cause that tuple to become (a, b, c).
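
The tuple case can be sketched in Pig Latin as follows. The input path 'tuples' and the field names f1, T, t1 and t2 are hypothetical, chosen only for illustration, and the sketch assumes a tab-separated file whose second column is a tuple literal such as (b,c).

```pig
-- Hypothetical input: each line looks like  a<TAB>(b,c)
a = LOAD 'tuples' AS (f1:chararray, T:tuple(t1:chararray, t2:chararray));

-- FLATTEN replaces the tuple T with its fields, so (a,(b,c)) becomes (a,b,c)
b = FOREACH a GENERATE f1, FLATTEN(T);

-- Check the resulting schema and data
DESCRIBE b;
DUMP b;
```

No new rows are created here: each input tuple still produces exactly one output tuple, only with one less level of nesting.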

For bags, the situation becomes more complicated. When we un-nest a bag, we create new tuples. If we have a relation that is made up of tuples of the form ({(b,c),(d,e)}) and we apply GENERATE flatten($0), we end up with two tuples (b,c) and (d,e). When we remove a level of nesting in a bag, sometimes we cause a cross product to happen. For example, consider a relation that has a tuple of the form (a, {(b,c), (d,e)}), commonly produced by the GROUP operator. If we apply the expression GENERATE $0, flatten($1) to this tuple, we will create new tuples: (a, b, c) and (a, d, e).
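
A small sketch of the bag case, using GROUP to produce the (key, bag) shape described above. The input path 'students' and the field names name, age and gpa are assumptions made for the example, not something from the official documentation.

```pig
-- Hypothetical input 'students': name<TAB>age<TAB>gpa
s = LOAD 'students' AS (name:chararray, age:int, gpa:float);

-- GROUP yields one tuple per key: (group, {bag of matching tuples})
g = GROUP s BY name;

-- Flattening the bag emits one output tuple per bag element,
-- each paired with the group key (the cross product from the text)
f = FOREACH g GENERATE group, FLATTEN(s);

DESCRIBE f;
DUMP f;
```

A group whose bag holds two tuples therefore contributes two rows to f, both carrying the same group key.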

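My own experiments confirm this. I tried both the tuple case and the bag case today, and even in the bag case a single FLATTEN is enough to get the flat schema. Take data like this: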

Joe {(Joe,18,3.8)}

Bill {(Bill,20,3.9)}

John {(John,18,4.0)}

Mary {(Mary,19,3.8),(Mary,19,5.0)}

a = load 'result' as (f1:chararray,B: bag {T: tuple(t1:chararray, t2:int, t3:float)});

b = foreach a GENERATE FLATTEN(B) as (t1:chararray,t2:int,t3:float);

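This data can be flattened in a single pass. I have not tested more deeply nested structures yet, but I expect those need two such operations. I really learned a lot about bags and tuples today. Tomorrow I will see whether this data can be passed into a UDF for processing.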
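
For the deeper nesting mentioned above, here is an untested sketch. It assumes that each FLATTEN removes exactly one level of nesting, as the documentation describes, so a bag inside a bag takes two passes. The path 'nested' and the names B1, T1, k, B2, T2 and x are all hypothetical.

```pig
-- Hypothetical schema: a bag whose tuples themselves contain a bag
a = LOAD 'nested' AS (f1:chararray,
                      B1:bag{T1:tuple(k:chararray,
                                      B2:bag{T2:tuple(x:int)})});

-- First pass: un-nest the outer bag; the inner bag is still a bag
b = FOREACH a GENERATE f1, FLATTEN(B1);
DESCRIBE b;

-- Second pass: un-nest the inner bag. Positional references are used
-- so the prefixed field names do not have to be spelled out.
c = FOREACH b GENERATE $0, $1, FLATTEN($2);
DESCRIBE c;
```

Running DESCRIBE after each pass is the quickest way to confirm whether one level or two have actually been removed.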

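To sum up: when in doubt, read the official documentation first, then test on a small data set. Use DESCRIBE to check what structure each step produces, and STORE the result to see whether the output matches what you expected. Overall, the behavior is quite clear.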
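
That workflow might look like this for the data used in this post. It is only a sketch: the output directory 'flatten_out' is a hypothetical name, and f1 is kept in the output purely to show the key next to the flattened fields.

```pig
a = LOAD 'result' AS (f1:chararray, B:bag{T:tuple(t1:chararray, t2:int, t3:float)});
DESCRIBE a;   -- schema before flattening

b = FOREACH a GENERATE f1, FLATTEN(B);
DESCRIBE b;   -- schema after flattening

-- 'flatten_out' is a hypothetical output directory; it must not exist yet
STORE b INTO 'flatten_out';
```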
