实现Hadoop的Writable接口Implementing Writable interface of Hadoop
As we saw in the previous posts, Hadoop makes an heavy use of network transmissions for executing its jobs. As Doug Cutting (the creator of Hadoop) explaines in this post on the Lucene mailing list, java.io.Serializable is too heavy for Hadoop's needs and so a new interface has been developed: Writable. Every object you need to emit from mapper to reducers or as an output has to implement this interface in order to make Hadoop trasmit the data from/to the nodes in the cluster.
Hadoop comes with several wrappers around primitive types and widely used classes in Java:
| Java primitive | Writable implementation |
|---|---|
| boolean | BooleanWritable |
| byte | ByteWritable |
| short | ShortWritable |
| int | IntWritable VIntWritable |
| float | FloatWritable |
| long | LongWritable VLongWritable |
| double | DoubleWritable |
| Java class | Writable implementation |
|---|---|
| String | Text |
| byte[] | BytesWritable |
| Object | ObjectWritable |
| null | NullWritable |
| Java collection | Writable implementation |
|---|---|
| array | ArrayWritable ArrayPrimitiveWritable TwoDArrayWritable |
| Map | MapWritable |
| SortedMap | SortedMapWritable |
| enum | EnumWritable |
For example, if we need a mapper to emit a String, we need to use a Text object wrapped around the string we want to emit.
The interface Writable defines two methods:
- public void write(DataOutput dataOutput) throws IOException
- public void readFields(DataInput dataInput) throws IOException
The first method, write() is used for writing the data onto the stream, while the second method, readFields(), is used for reading data from the stream. The wrappers we saw above just send and receive their binary representation over a stream.
Since Hadoop needs also to sort data while in the shuffle-and-sort phase, it needs also the Comparable interface to be implemented, so it defines the WritableComparable interface which is an interface that implements both Writable and Comparable.
If we need to emit a custom object which has no default wrapper, we need to create a class that implements the WritableComparable interface. In the mean example we saw on this post, we used the SumCount class, which is a class that implements WritableComparable (the source code is available on github):
public class SumCount implements WritableComparable<SumCount> {
DoubleWritable sum;
IntWritable count;
public SumCount() {
set(new DoubleWritable(0), new IntWritable(0));
}
public SumCount(Double sum, Integer count) {
set(new DoubleWritable(sum), new IntWritable(count));
}
public void set(DoubleWritable sum, IntWritable count) {
this.sum = sum;
this.count = count;
}
public DoubleWritable getSum() {
return sum;
}
public IntWritable getCount() {
return count;
}
public void addSumCount(SumCount sumCount) {
set(new DoubleWritable(this.sum.get() + sumCount.getSum().get()), new IntWritable(this.count.get() + sumCount.getCount().get()));
}
@Override
public void write(DataOutput dataOutput) throws IOException {
sum.write(dataOutput);
count.write(dataOutput);
}
@Override
public void readFields(DataInput dataInput) throws IOException {
sum.readFields(dataInput);
count.readFields(dataInput);
}
@Override
public int compareTo(SumCount sumCount) {
// compares the first of the two values
int comparison = sum.compareTo(sumCount.sum);
// if they're not equal, return the value of compareTo between the "sum" value
if (comparison != 0) {
return comparison;
}
// else return the value of compareTo between the "count" value
return count.compareTo(sumCount.count);
}
@Override
public boolean equals(Object o) {
if (this == o) return true;
if (o == null || getClass() != o.getClass()) return false;
SumCount sumCount = (SumCount) o;
return count.equals(sumCount.count) && sum.equals(sumCount.sum);
}
@Override
public int hashCode() {
int result = sum.hashCode();
result = 31 * result + count.hashCode();
return result;
}
}
As we can see, it's very easy to code the two methods defined by the Writable interface: they just call the write() and readFields() method of the primitive types of the properties of SumCount class; it's important setting the properties in the same order in both read() and writeFields(), otherwise it will not work. The other methods of this class are the getters, setters and the methods needed by the Comparable interface, which should be nothing new to a Java developer.
from: http://andreaiacono.blogspot.com/2014/05/implementing-writable-interface-of.html
实现Hadoop的Writable接口Implementing Writable interface of Hadoop的更多相关文章
- 如何实现多个接口Implementing Multiple Interface
4.实现多个接口Implementing Multiple Interface 接口的优势:马克-to-win:类可以实现多个接口.与之相反,类只能继承一个超类(抽象类或其他类). A class c ...
- Hadoop中序列化与Writable接口
学习笔记,整理自<Hadoop权威指南 第3版> 一.序列化 序列化:序列化是将 内存 中的结构化数据 转化为 能在网络上传输 或 磁盘中进行永久保存的二进制流的过程:反序列化:序列化的逆 ...
- hadoop中的序列化与Writable接口
本文地址:http://www.cnblogs.com/archimedes/p/hadoop-writable-interface.html,转载请注明源地址. 简介 序列化和反序列化就是结构化对象 ...
- Hadoop序列化与Writable接口(一)
Hadoop序列化与Writable接口(一) 序列化 序列化(serialization)是指将结构化的对象转化为字节流,以便在网络上传输或者写入到硬盘进行永久存储:相对的反序列化(deserial ...
- Hadoop Serialization hadoop序列化详解(最新版) (1)【java和hadoop序列化比较和writable接口】
初学java的人肯定对java序列化记忆犹新.最开始很多人并不会一下子理解序列化的意义所在.这样子是因为很多人还是对java最底层的特性不是特别理解,当你经验丰富,对java理解更加深刻之后,你就会发 ...
- eclipse 提交作业到JobTracker Hadoop的数据类型要求必须实现Writable接口
问:在eclipse中的写的代码如何提交作业到JobTracker中的哪?答:(1)在eclipse中调用的job.waitForCompletion(true)实际上执行如下方法 connect() ...
- Hadoop基础-序列化与反序列化(实现Writable接口)
Hadoop基础-序列化与反序列化(实现Writable接口) 作者:尹正杰 版权声明:原创作品,谢绝转载!否则将追究法律责任. 一.序列化简介 1>.什么是序列化 序列化也称串行化,是将结构化 ...
- Hadoop序列化与Writable接口(二)
Hadoop序列化与Writable接口(二) 上一篇文章Hadoop序列化与Writable接口(一)介绍了Hadoop序列化,Hadoop Writable接口以及如何定制自己的Writable类 ...
- hadoop学习第四天-Writable和WritableComparable序列化接口的使用&&MapReduce中传递javaBean的简单例子
一. 为什么javaBean要继承Writable和WritableComparable接口? 1. 如果一个javaBean想要作为MapReduce的key或者value,就一定要实现序列化,因为 ...
随机推荐
- Linux 的软件安装目录
Linux 的软件安装目录是也是有讲究的,理解这一点,在对系统管理是有益的 /usr:系统级的目录,可以理解为C:/Windows/,/usr/lib理解为C:/Windows/System32. / ...
- chrome浏览器视频插件
以前安装chrome浏览器flash插件是将libflashplayer.so拷贝到chrome浏览器的plugins目录下.但最近好像不行了. 于是换了另一插件:pepperflashplugin- ...
- poj2492 A Bug's Life(带权并查集)
题目链接 http://poj.org/problem?id=2492 题意 虫子有两种性别,有n只虫子,编号1~n,输入m组数据,每组数据包含a.b两只虫子,表示a.b为不同性别的虫子,根据输入的m ...
- STM32 串口通信
1. 中断说明 TXE(Tansmit Data Register empty interrupt) - 发送数据寄存器空,产生中断.当使能TXE后,只要Tx DR空了,就会产生中断.---写寄存器D ...
- Python中 *args 和 **kwargs 的区别
先来看个例子: def foo(*args, **kwargs): print 'args = ', args print 'kwargs = ', kwargs print '----------- ...
- [BZOJ3560]DZY Loves Math V(欧拉函数)
https://www.cnblogs.com/zwfymqz/p/9332753.html 由于欧拉函数是积性函数,可以用乘法分配律变成对每个质因子分开算最后乘起来.再由欧拉函数公式和分配律发现就是 ...
- BZOJ2673 [Wf2011]Chips Challenge 费用流 zkw费用流 网络流
https://darkbzoj.cf/problem/2673 有一个芯片,芯片上有N*N(1≤N≤40)个插槽,可以在里面装零件. 有些插槽不能装零件,有些插槽必须装零件,剩下的插槽随意. 要求装 ...
- hdu 4745 区间dp
题意:求一个环的最长回文序列,是序列不是串 链接:点我 起点是可以任意的, 所以只要求出每个区间的最长回文序列之后取max(dp[1][i]+dp[i+1][n]),即可得最终答案 本来是想扩展两倍的 ...
- zookeeper【5】分布式锁
我们常说的锁是单进程多线程锁,在多线程并发编程中,用于线程之间的数据同步,保护共享资源的访问.而分布式锁,指在分布式环境下,保护跨进程.跨主机.跨网络的共享资源,实现互斥访问,保证一致性. 架构图: ...
- bzoj 1176: [Balkan2007]Mokia&&2683: 简单题 -- cdq分治
2683: 简单题 Time Limit: 50 Sec Memory Limit: 128 MB Description 你有一个N*N的棋盘,每个格子内有一个整数,初始时的时候全部为0,现在需要 ...