Hadoop Development with Scala: An Example

import org.apache.hadoop.conf.{Configuration, Configured}
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{IntWritable, LongWritable, Text}
import org.apache.hadoop.mapreduce.{Job, Mapper, Reducer}
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
import org.apache.hadoop.util.{Tool, ToolRunner}

/**
 * Word-count MapReduce job written in Scala.
 * User: riley
 * Date: 8/26/13
 */
object WordCount extends Configured with Tool {

    // Mapper: emits (word, 1) for every token of each input line.
    class Map extends Mapper[LongWritable, Text, Text, IntWritable] {
        private val one = new IntWritable(1)
        // Must be initialized: an uninitialized `var word: Text` is an abstract
        // member and does not compile in a concrete class.
        private val word = new Text()

        override def map(key: LongWritable, rowLine: Text,
                         context: Mapper[LongWritable, Text, Text, IntWritable]#Context): Unit = {
            val line = rowLine.toString
            // Split on runs of whitespace and skip empty tokens, so blank lines
            // and repeated spaces do not produce an empty "word".
            for (item <- line.split("\\s+") if item.nonEmpty) {
                word.set(item)
                context.write(word, one)
            }
        }
    }

    // Reducer (also used as the combiner): sums the counts for each word.
    class Reduce extends Reducer[Text, IntWritable, Text, IntWritable] {
        private val count = new IntWritable()

        // Hadoop passes a java.lang.Iterable. Declaring the parameter as Scala's
        // Iterable would not override Reducer.reduce.
        override def reduce(key: Text, values: java.lang.Iterable[IntWritable],
                            context: Reducer[Text, IntWritable, Text, IntWritable]#Context): Unit = {
            var sum = 0
            val it = values.iterator()
            while (it.hasNext) sum += it.next().get()
            count.set(sum)
            context.write(key, count)
        }
    }

    // Configures and runs the job: args(0) is the input path, args(1) the output path.
    override def run(args: Array[String]): Int = {
        // Job.getInstance replaces the deprecated `new Job(conf, name)` constructor.
        val job = Job.getInstance(getConf, "WordCount")
        job.setJarByClass(this.getClass)
        job.setOutputKeyClass(classOf[Text])
        job.setOutputValueClass(classOf[IntWritable])
        job.setMapperClass(classOf[Map])
        job.setReducerClass(classOf[Reduce])
        job.setCombinerClass(classOf[Reduce])
        FileInputFormat.addInputPath(job, new Path(args(0)))
        FileOutputFormat.setOutputPath(job, new Path(args(1)))
        // Exit code 0 on success, 1 on failure.
        if (job.waitForCompletion(true)) 0 else 1
    }

    def main(args: Array[String]): Unit = {
        // ToolRunner parses the generic Hadoop options before calling run().
        System.exit(ToolRunner.run(new Configuration(), this, args))
    }
}
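
To check the counting logic without a cluster, the same map and reduce steps can be sketched with plain Scala collections. This is only an illustration of what the WordCount job above computes, not part of the original listing: `LocalWordCount`, `mapLine`, `reduceAll` and `wordCount` are hypothetical names, and the sketch assumes tokens are separated by whitespace.

```scala
object LocalWordCount {
  // Map phase: one (word, 1) pair per whitespace-separated token.
  def mapLine(line: String): Seq[(String, Int)] =
    line.split("\\s+").filter(_.nonEmpty).map(w => (w, 1)).toSeq

  // Shuffle and reduce phase: group the pairs by word and sum the counts.
  def reduceAll(pairs: Seq[(String, Int)]): Map[String, Int] =
    pairs.groupBy(_._1).map { case (w, ps) => (w, ps.map(_._2).sum) }

  // Full pipeline over an in-memory list of lines.
  def wordCount(lines: Seq[String]): Map[String, Int] =
    reduceAll(lines.flatMap(mapLine))

  def main(args: Array[String]): Unit = {
    val counts = wordCount(Seq("hello hadoop", "hello scala"))
    // Print in key order, one "word<TAB>count" pair per line.
    counts.toSeq.sortBy(_._1).foreach { case (w, n) => println(s"$w\t$n") }
  }
}
```

Running `LocalWordCount.main` on the two sample lines prints `hadoop 1`, `hello 2` and `scala 1`, each pair separated by a tab.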
