Implementing the Writable interface of Hadoop

As we saw in the previous posts, Hadoop makes heavy use of network transmission when executing its jobs. As Doug Cutting (the creator of Hadoop) explains in a post on the Lucene mailing list, java.io.Serializable is too heavy for Hadoop's needs, so a new interface was developed: Writable. Every object you emit from a mapper to the reducers, or as output, has to implement this interface so that Hadoop can transmit the data between the nodes in the cluster.

Hadoop comes with several wrappers around primitive types and widely used classes in Java:

Java primitive	Writable implementation
boolean	BooleanWritable
byte	ByteWritable
short	ShortWritable
int	IntWritable, VIntWritable
float	FloatWritable
long	LongWritable, VLongWritable
double	DoubleWritable

Java class	Writable implementation
String	Text
byte[]	BytesWritable
Object	ObjectWritable
null	NullWritable

Java collection	Writable implementation
array	ArrayWritable, ArrayPrimitiveWritable, TwoDArrayWritable
Map	MapWritable
SortedMap	SortedMapWritable
EnumSet	EnumSetWritable

For example, if we need a mapper to emit a String, we need to use a Text object wrapped around the string we want to emit.

The interface Writable defines two methods:

public void write(DataOutput dataOutput) throws IOException
public void readFields(DataInput dataInput) throws IOException

The first method, write(), is used for writing the data onto the stream, while the second method, readFields(), is used for reading data from the stream. The wrappers we saw above just send and receive their binary representation over a stream.
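To make the write()/readFields() pairing concrete, here is a minimal sketch of a round trip through a byte array. It uses only java.io so that it runs without Hadoop on the classpath: SimpleWritable is a local stand-in that mirrors the two methods of Writable, and IntBox is a hypothetical value type, not Hadoop's IntWritable.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

public class WritableRoundTrip {

    // Local stand-in with the same two methods as Hadoop's Writable,
    // so the sketch compiles without Hadoop on the classpath.
    interface SimpleWritable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    // Hypothetical value type holding a single int.
    static class IntBox implements SimpleWritable {
        int value;

        IntBox() { }

        IntBox(int value) {
            this.value = value;
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeInt(value); // always 4 bytes
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            value = in.readInt();
        }
    }

    // Serializes a value into a byte array.
    static byte[] serialize(SimpleWritable w) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            w.write(out);
        }
        return bytes.toByteArray();
    }

    // Fills an existing instance from a byte array.
    static void deserialize(SimpleWritable w, byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            w.readFields(in);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = serialize(new IntBox(42));
        IntBox copy = new IntBox();
        deserialize(copy, data);
        System.out.println(data.length + " bytes, value = " + copy.value);
        // prints "4 bytes, value = 42"
    }
}
```

Note that readFields() fills an already constructed instance instead of returning a new one, which is why a class like this needs a no-argument constructor.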
Hadoop also has to sort keys during the shuffle-and-sort phase, so key classes must implement the Comparable interface as well. For this it defines the WritableComparable interface, which extends both Writable and Comparable.
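As a rough illustration of what the Comparable half provides, the sketch below sorts a few two-field keys with plain Java collections. PairKey is a hypothetical class made up for this example; a real Hadoop key would implement WritableComparable and therefore also carry write() and readFields().

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class KeyOrderingDemo {

    // Hypothetical two-field key; only the Comparable part is shown here.
    static class PairKey implements Comparable<PairKey> {
        final int first;
        final int second;

        PairKey(int first, int second) {
            this.first = first;
            this.second = second;
        }

        @Override
        public int compareTo(PairKey other) {
            // order by the first field, and by the second one only on ties
            int byFirst = Integer.compare(first, other.first);
            return byFirst != 0 ? byFirst : Integer.compare(second, other.second);
        }

        @Override
        public String toString() {
            return "(" + first + "," + second + ")";
        }
    }

    // Returns a sorted copy, leaving the input list untouched.
    static List<PairKey> sorted(List<PairKey> keys) {
        List<PairKey> copy = new ArrayList<>(keys);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        List<PairKey> keys = new ArrayList<>();
        keys.add(new PairKey(2, 1));
        keys.add(new PairKey(1, 9));
        keys.add(new PairKey(1, 3));
        System.out.println(sorted(keys)); // prints [(1,3), (1,9), (2,1)]
    }
}
```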
If we need to emit a custom object that has no default wrapper, we need to create a class that implements Writable, or WritableComparable if the object will be used as a key. In the mean example from an earlier post, we used the SumCount class, which implements WritableComparable (the source code is available on github):

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.WritableComparable;

public class SumCount implements WritableComparable<SumCount> {

    private DoubleWritable sum;
    private IntWritable count;

    // Hadoop needs a no-argument constructor to create the instance before calling readFields()
    public SumCount() {
        set(new DoubleWritable(0), new IntWritable(0));
    }

    public SumCount(Double sum, Integer count) {
        set(new DoubleWritable(sum), new IntWritable(count));
    }

    public void set(DoubleWritable sum, IntWritable count) {
        this.sum = sum;
        this.count = count;
    }

    public DoubleWritable getSum() {
        return sum;
    }

    public IntWritable getCount() {
        return count;
    }

    // adds the sum and the count of another SumCount to this one
    public void addSumCount(SumCount sumCount) {
        set(new DoubleWritable(this.sum.get() + sumCount.getSum().get()),
            new IntWritable(this.count.get() + sumCount.getCount().get()));
    }

    @Override
    public void write(DataOutput dataOutput) throws IOException {
        // the fields are written in the same order in which readFields() reads them
        sum.write(dataOutput);
        count.write(dataOutput);
    }

    @Override
    public void readFields(DataInput dataInput) throws IOException {
        sum.readFields(dataInput);
        count.readFields(dataInput);
    }

    @Override
    public int compareTo(SumCount sumCount) {
        // compares the first of the two values
        int comparison = sum.compareTo(sumCount.sum);

        // if they're not equal, return the value of compareTo between the "sum" values
        if (comparison != 0) {
            return comparison;
        }

        // else return the value of compareTo between the "count" values
        return count.compareTo(sumCount.count);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;

        SumCount sumCount = (SumCount) o;
        return count.equals(sumCount.count) && sum.equals(sumCount.sum);
    }

    @Override
    public int hashCode() {
        int result = sum.hashCode();
        result = 31 * result + count.hashCode();
        return result;
    }
}

As we can see, it's very easy to code the two methods defined by the Writable interface: they just call the write() and readFields() methods of the Writable fields of the SumCount class. It's important to handle the fields in the same order in both write() and readFields(), otherwise deserialization will not work. The other methods of this class are the getters, the setter and the methods needed by the Comparable interface, which should be nothing new to a Java developer.
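The ordering requirement is easy to see in a small experiment. The sketch below is plain java.io, not Hadoop code, and FieldOrderDemo with its methods is a hypothetical name chosen for this example. It writes a double followed by an int, then reads the bytes back once in the same order and once swapped.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class FieldOrderDemo {

    // Writes sum (double, 8 bytes) first, then count (int, 4 bytes).
    static byte[] write(double sum, int count) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeDouble(sum);
            out.writeInt(count);
        }
        return bytes.toByteArray();
    }

    // Correct: reads the fields in the order used by write().
    static String readSameOrder(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            double sum = in.readDouble();
            int count = in.readInt();
            return sum + "/" + count;
        }
    }

    // Wrong: reads count before sum, so the same bytes are misinterpreted.
    static String readSwappedOrder(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int count = in.readInt();
            double sum = in.readDouble();
            return sum + "/" + count;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = write(10.5, 3);
        System.out.println(readSameOrder(data));    // prints 10.5/3
        System.out.println(readSwappedOrder(data)); // prints unrelated values
    }
}
```

In this case the swapped read does not even throw an exception, because both orders consume 12 bytes; it just silently produces wrong values, which makes this kind of bug hard to spot.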

from: http://andreaiacono.blogspot.com/2014/05/implementing-writable-interface-of.html
