1、POJO方式

public class WordCountPojo {
public static class Word{
private String word;
private int frequency; public Word() {
} public Word(String word, int frequency) {
this.word = word;
this.frequency = frequency;
} public String getWord() {
return word;
} public void setWord(String word) {
this.word = word;
} public int getFrequency() {
return frequency;
} public void setFrequency(int frequency) {
this.frequency = frequency;
} @Override
public String toString() {
return "Word=" + word + " freq=" + frequency;
}
} /**
* Implements the string tokenizer that splits sentences into words as a user-defined
* FlatMapFunction. The function takes a line (String) and splits it into
* multiple Word objects.
*/
public static final class Tokenizer implements FlatMapFunction<String, Word> { @Override
public void flatMap(String value, Collector<Word> out) {
// normalize and split the line
String[] tokens = value.toLowerCase().split("\\W+"); // emit the pairs
for (String token : tokens) {
if (token.length() > 0) {
out.collect(new Word(token, 1));
}
}
}
} public static void main(String args[]) throws Exception {
final ParameterTool params = ParameterTool.fromArgs(args); // set up the execution environment
final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); // make parameters available in the web interface
env.getConfig().setGlobalJobParameters(params); // get input data
DataSet<String> text;
if (params.has("input")) {
// read the text file from given input path
text = env.readTextFile(params.get("input"));
} else {
// get default test text data
System.out.println("Executing WordCount example with default input data set.");
System.out.println("Use --input to specify file input.");
text = WordCountData.getDefaultTextLineDataSet(env);
} DataSet<Word> counts = text
// split up the lines into Word objects (with frequency = 1)
.flatMap(new Tokenizer())
// group by the field word and sum up the frequency
.groupBy("word")
.reduce(new ReduceFunction<Word>() {
@Override
public Word reduce(Word value1, Word value2) throws Exception {
return new Word(value1.word, value1.frequency + value2.frequency);
}
});
if (params.has("output")) {
counts.writeAsText(params.get("output"), FileSystem.WriteMode.OVERWRITE);
// execute program
env.execute("WordCount-Pojo Example");
} else {
System.out.println("Printing result to stdout. Use --output to specify output path.");
counts.print();
}
} }

2、元组方式

public class WordCount {

    /**
* Implements the string tokenizer that splits sentences into words as a user-defined
* FlatMapFunction. The function takes a line (String) and splits it into
* multiple pairs in the form of "(word,1)" ({@code Tuple2<String, Integer>}).
*/
public static final class Tokenizer implements FlatMapFunction<String, Tuple2<String, Integer>> {
@Override
public void flatMap(String value, Collector<Tuple2<String, Integer>> out) throws Exception {
// normalize and split the line
String[] tokens = value.toLowerCase().split("\\W+"); // emit the pairs
for (String token : tokens) {
if (token.length() > 0) {
out.collect(new Tuple2<>(token, 1));
}
}
}
} public static void main(String args[]) throws Exception {
final ParameterTool params = ParameterTool.fromArgs(args); // set up the execution environment
final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); // make parameters available in the web interface
env.getConfig().setGlobalJobParameters(params); // get input data
DataSet<String> text;
if (params.has("input")) {
// read the text file from given input path
text = env.readTextFile(params.get("input"));
} else {
// get default test text data
System.out.println("Executing WordCount example with default input data set.");
System.out.println("Use --input to specify file input.");
text = WordCountData.getDefaultTextLineDataSet(env);
} DataSet<Tuple2<String,Integer>> counts = text
// split up the lines in pairs (2-tuples) containing: (word,1)
.flatMap(new Tokenizer())
// group by the tuple field "0" and sum up tuple field "1"
.groupBy(0)
.reduce(new ReduceFunction<Tuple2<String, Integer>>() {
@Override
public Tuple2<String, Integer> reduce(Tuple2<String, Integer> value1, Tuple2<String, Integer> value2) throws Exception {
return new Tuple2<>(value1.f0,value1.f1+value2.f1);
}
}); //等效于sum(1)
// .sum(1);
// emit result
if(params.has("output")){
counts.writeAsCsv(params.get("output"),"\n"," ");
// execute program
env.execute("WordCount batch");
}else {
System.out.println("Printing result to stdout. Use --output to specify output path.");
counts.print();
} }
}

flink batch wordcount的更多相关文章

  1. Flink Batch SQL 1.10 实践

    Flink作为流批统一的计算框架,在1.10中完成了大量batch相关的增强与改进.1.10可以说是第一个成熟的生产可用的Flink Batch SQL版本,它一扫之前Dataset的羸弱,从功能和性 ...

  2. Flink实例-Wordcount详细步骤

    link实例之Wordcount详细步骤 1.我的IDE是IntelliJ IDEA.在官网上https://www.jetbrains.com/idea/下载最新版2018.2的IDEA,如下图.破 ...

  3. Apache Flink - Batch(DataSet API)

    Flink DataSet API编程指南: Flink中的DataSet程序是实现数据集转换的常规程序(例如,过滤,映射,连接,分组).数据集最初是从某些来源创建的(例如,通过读取文件或从本地集合创 ...

  4. [Flink]Flink1.6三种运行模式安装部署以及实现WordCount

    前言 Flink三种运行方式:Local.Standalone.On Yarn.成功部署后分别用Scala和Java实现wordcount 环境 版本:Flink 1.6.2 集群环境:Hadoop2 ...

  5. 【Flink】Flink基础之WordCount实例(Java与Scala版本)

    简述 WordCount(单词计数)作为大数据体系的标准示例,一直是入门的经典案例,下面用java和scala实现Flink的WordCount代码: 采用IDEA + Maven + Flink 环 ...

  6. hadoop记录-[Flink]Flink三种运行模式安装部署以及实现WordCount(转载)

    [Flink]Flink三种运行模式安装部署以及实现WordCount 前言 Flink三种运行方式:Local.Standalone.On Yarn.成功部署后分别用Scala和Java实现word ...

  7. Apache Flink Quickstart

    Apache Flink 是新一代的基于 Kappa 架构的流处理框架,近期底层部署结构基于 FLIP-6 做了大规模的调整,我们来看一下在新的版本(1.6-SNAPSHOT)下怎样从源码快速编译执行 ...

  8. Apache 流框架 Flink,Spark Streaming,Storm对比分析(一)

    本文由  网易云发布. 1.Flink架构及特性分析 Flink是个相当早的项目,开始于2008年,但只在最近才得到注意.Flink是原生的流处理系统,提供high level的API.Flink也提 ...

  9. Flink的高可用集群环境

    Flink的高可用集群环境 Flink简介 Flink核心是一个流式的数据流执行引擎,其针对数据流的分布式计算提供了数据分布,数据通信以及容错机制等功能. 因现在主要Flink这一块做先关方面的学习, ...

随机推荐

  1. 开放接口的安全验证方案(AES+RSA)

    http://wubaoguo.com/2015/08/21/%E5%BC%80%E6%94%BE%E6%8E%A5%E5%8F%A3%E7%9A%84%E5%AE%89%E5%85%A8%E9%AA ...

  2. opencv使用cv::Mat_和push_back

    cv::Mat left_image; right_image.push_back(cv::Mat((cv::Mat_<float>(1, 3) << ori.x, ori.y ...

  3. Access-Control-Allow-Origin 响应一个携带身份信息(Credential)的HTTP请求时,必需指定具体的域,不能用通配符

    https://www.cnblogs.com/raind/p/10771778.html Access-Control-Allow-Origin.HTTP响应头,指定服务器端允许进行跨域资源访问的来 ...

  4. 史上最全NOIP初赛知识点

    CSP-J/S 第一轮知识点选讲 \(NOIP\)(全国青少年信息学奥林匹克竞赛)于2019年取消.取而代之的是由\(CCF\)推出的非专业级软件能力认证,也就是现在的\(CSP-J/S\).作为一名 ...

  5. numpy cookbook

    1.第一章 import numpy as np import matplotlib.pyplot as plt import scipy import PIL import scipy.misc l ...

  6. Attention篇(二)

    主要是对<Attention is all you need>的分析 结合:http://www.cnblogs.com/robert-dlut/p/8638283.html  以及自己的 ...

  7. 浅析RPO漏洞攻击原理

    RPO的全称为Relative Path Overwrite,也就是相对路径覆盖,利用客户端和服务端的差异,通过相对路径来引入我们想要的js或者css文件,从而实现某种攻击. 就目前来看此攻击方法依赖 ...

  8. cordova生成签名的APK

    所有的Android应用程序在发布之前都要求用一个证书进行数字签名,anroid系统是不会安装没有进行签名的程序(安全考虑,可以查找相关文档) 签名过程详情见:https://www.cnblogs. ...

  9. loj 2719 「NOI2018」冒泡排序 - 组合数学

    题目传送门 传送门 题目大意 (相信大家都知道) 显然要考虑一个排列$p$合法的充要条件. 考虑这样一个构造$p$的过程.设排列$p^{-1}_{i}$满足$p_{p^{-1}_i} = i$. 初始 ...

  10. 【操作系统之十四】iptables扩展模块

    1.iprange 使用iprange扩展模块可以指定"一段连续的IP地址范围",用于匹配报文的源地址或者目标地址.--src-range:匹配报文的源地址所在范围--dst-ra ...