Overview:

Installing Pig is straightforward; keep the following points in mind:

1. Set the system environment variables:

export PIG_HOME=.../pig-x.y.z

export PATH=$PATH:$PIG_HOME/bin

After setting them, run pig -help to verify the installation.
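A quick sanity check of these two variables can be scripted. The sketch below is only an illustration: check_pig_env is a made-up helper name, and it assumes the Pig launcher lives at $PIG_HOME/bin/pig, as the PATH line above implies.

```shell
# Report whether the Pig environment variables look usable.
# Returns 0 when $PIG_HOME/bin/pig exists and is executable, 1 otherwise.
check_pig_env() {
    if [ -z "${PIG_HOME:-}" ]; then
        echo "PIG_HOME is not set"
        return 1
    fi
    if [ ! -x "$PIG_HOME/bin/pig" ]; then
        echo "no executable found at $PIG_HOME/bin/pig"
        return 1
    fi
    # Wrap PATH in colons so the first and last entries match as well.
    case ":$PATH:" in
        *":$PIG_HOME/bin:"*) echo "ok: $PIG_HOME/bin is on PATH" ;;
        *) echo "warning: $PIG_HOME/bin is not on PATH" ;;
    esac
    return 0
}
```

With the function defined in the current shell, something like check_pig_env && pig -help would run the verification step only when the variables look sane.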

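2. Two execution modes:

local mode: Pig accesses the local file system. Start the shell with: pig -x local

MapReduce mode: Pig translates queries into MapReduce jobs and runs them on a Hadoop cluster. In this mode, start the shell with pig -x mapreduce, or simply pig: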

hadoop@master:/usr/local/hadoop/conf$ pig -x mapreduce

Warning: $HADOOP_HOME is deprecated.

2013-08-16 16:18:52,388 [main] INFO  org.apache.pig.Main - Apache Pig version 0.11.1 (r1459641) compiled Mar 22 2013, 02:13:53

2013-08-16 16:18:52,389 [main] INFO  org.apache.pig.Main - Logging error messages to: /usr/local/hadoop/conf/pig_1376641132384.log

2013-08-16 16:18:52,470 [main] INFO  org.apache.pig.impl.util.Utils - Default bootup file /home/hadoop/.pigbootup not found

2013-08-16 16:18:52,760 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://master:9000

2013-08-16 16:18:53,174 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: master:9001

Note: to use MapReduce mode, add the following line to Hadoop's configuration file hadoop-env.sh:

export PIG_CLASSPATH=$HADOOP_HOME/conf
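One way to apply this setting without duplicating it on repeated runs is sketched below. The function name add_pig_classpath is made up for illustration, and the path to hadoop-env.sh has to be supplied by the caller, since its location depends on the installation.

```shell
# Append the PIG_CLASSPATH export to a hadoop-env.sh file, once only.
# $1: path to hadoop-env.sh (supplied by the caller; no default is assumed).
# Assumes the file already ends with a newline.
add_pig_classpath() {
    env_file=$1
    # Single quotes keep $HADOOP_HOME literal so it is expanded by Hadoop later.
    line='export PIG_CLASSPATH=$HADOOP_HOME/conf'
    if [ ! -f "$env_file" ]; then
        echo "not found: $env_file" >&2
        return 1
    fi
    # -x: whole-line match, -F: fixed string, so "$" is not a regex anchor.
    if grep -qxF "$line" "$env_file"; then
        echo "already present in $env_file"
    else
        printf '%s\n' "$line" >> "$env_file"
        echo "appended to $env_file"
    fi
}
```

On a layout like the one in the session above, the call would probably look like add_pig_classpath "$HADOOP_HOME/conf/hadoop-env.sh", but that location is an assumption and should be checked first.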

Example 1:

The input file is .../in/ncdc/micro-tab/sample.txt.

Run the following commands in the Pig shell:

grunt> -- max_temp.pig: Finds the maximum temperature by year

grunt> records = LOAD 'hdfs://master:9000/in/ncdc/micro-tab/sample.txt' -- use the full HDFS path when unsure what your default path is

>>   AS (year:chararray, temperature:int, quality:int);

grunt> filtered_records = FILTER records BY temperature != 9999 AND

>>   (quality == 0 OR quality == 1 OR quality == 4 OR quality == 5 OR quality == 9);

grunt> grouped_records = GROUP filtered_records BY year;

grunt> max_temp = FOREACH grouped_records GENERATE group,

>>   MAX(filtered_records.temperature);

grunt> DUMP max_temp;
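The same statements can also be kept in a script file instead of being typed at the grunt prompt. The sketch below writes them to max_temp.pig (the name used in the script's own first comment). write_max_temp_script is a hypothetical helper, and the HDFS URL is the one from the session above, so it would need adjusting for another cluster.

```shell
# Write the max_temp.pig script from the session above into directory $1
# and print the path of the file that was written.
write_max_temp_script() {
    script="$1/max_temp.pig"
    # The quoted 'EOF' delimiter stops the shell from expanding anything inside.
    cat > "$script" <<'EOF'
-- max_temp.pig: Finds the maximum temperature by year
records = LOAD 'hdfs://master:9000/in/ncdc/micro-tab/sample.txt'
  AS (year:chararray, temperature:int, quality:int);
filtered_records = FILTER records BY temperature != 9999 AND
  (quality == 0 OR quality == 1 OR quality == 4 OR quality == 5 OR quality == 9);
grouped_records = GROUP filtered_records BY year;
max_temp = FOREACH grouped_records GENERATE group,
  MAX(filtered_records.temperature);
DUMP max_temp;
EOF
    echo "$script"
}
```

The resulting file could then be run in batch mode, for example with pig "$(write_max_temp_script .)", assuming pig is on the PATH as set up earlier.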

Pig also provides the ILLUSTRATE operator, which generates a concise, readable sample dataset.

grunt> ILLUSTRATE max_temp;

The output is:

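Example 2:

This is the guide's example about comments, slightly modified here to add a schema. Note that chararry in the two LOAD statements is a misspelling of chararray, so Pig appears to treat it as a field name with the default type bytearray, while the bare int is taken as a type and given the generated name val_0, as the DESCRIBE output shows: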

grunt> B = LOAD 'input/pig/join/B' AS (chararry,int);

grunt> A = LOAD 'input/pig/join/A' AS (int,chararry);

grunt> C = JOIN A BY $0, /* ignored */ B BY $1;

grunt> DESCRIBE C

C: {A::val_0: int,A::chararry: bytearray,B::chararry: bytearray,B::val_0: int}

grunt> ILLUSTRATE C

The output is:

----------------------------------------------------

| A     | val_0:int      | chararry:bytearray      |

----------------------------------------------------

|       | 3              | Hat                     |

|       | 3              | Hat                     |

----------------------------------------------------

----------------------------------------------------

| B     | chararry:bytearray      | val_0:int      |

----------------------------------------------------

|       | Eve                     | 3              |

|       | Eve                     | 3              |

----------------------------------------------------

-----------------------------------------------------------------------------------------------------------

| C     | A::val_0:int      | A::chararry:bytearray      | B::chararry:bytearray      | B::val_0:int      |

-----------------------------------------------------------------------------------------------------------

|       | 3                 | Hat                        | Eve                        | 3                 |

|       | 3                 | Hat                        | Eve                        | 3                 |

|       | 3                 | Hat                        | Eve                        | 3                 |

|       | 3                 | Hat                        | Eve                        | 3                 |

-----------------------------------------------------------------------------------------------------------

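Note: Pig Latin has mixed rules on case sensitivity:

Operators and commands are case-insensitive;

Aliases and function names are case-sensitive.

For example, continuing the example above: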

grunt> describe c

2013-08-16 17:14:49,397 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1005: No plan for c to describe

Details at logfile: /usr/local/hadoop/conf/pig_1376641755235.log

grunt> describe C

C: {A::val_0: int,A::chararry: bytearray,B::chararry: bytearray,B::val_0: int}

grunt> DESCRIBE C

C: {A::val_0: int,A::chararry: bytearray,B::chararry: bytearray,B::val_0: int}

Hadoop: The Definitive Guide, 3rd Edition: Supplementary Notes on Chapter 10, Pig
