The First Flink Application
Official reference: https://ci.apache.org/projects/flink/flink-docs-release-1.10/#api-references
Import the Maven dependencies
Note that if you write the program in Scala, the dependencies to import differ from the Java ones.
Maven Dependencies
You can add the following dependencies to your pom.xml to include Apache Flink in your project. These dependencies include a local execution environment and thus support local testing. Scala API: To use the Scala API, replace the flink-java artifact id with flink-scala_2.11 and flink-streaming-java_2.11 with flink-streaming-scala_2.11.
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-java</artifactId>
    <version>1.8.0</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-java_2.11</artifactId>
    <version>1.8.0</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-clients_2.11</artifactId>
    <version>1.8.0</version>
</dependency>
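Following the substitution rule quoted above, a Scala-API pom would swap the artifact ids as sketched below (this assumes Scala 2.11 and the same Flink version as the Java dependencies; adjust the suffix to your Scala version):

```xml
<!-- Sketch: Scala-API equivalents of the Java dependencies above -->
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-scala_2.11</artifactId>
    <version>1.8.0</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-scala_2.11</artifactId>
    <version>1.8.0</version>
</dependency>
```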
Batch WordCount example (DataSet API)
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

// Batch processing example
public class WordCount {

    public static void main(String[] args) throws Exception {
        String inputPath = "E:\\flink\\words.txt";
        String outputPath = "E:\\flink\\result";
        // Get the execution environment
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        // Read the input file
        DataSet<String> text = env.readTextFile(inputPath);
        DataSet<Tuple2<String, Integer>> counts =
                // split up the lines in pairs (2-tuples) containing: (word, 1)
                text.flatMap(new Tokenizer())
                        // group by tuple field "0" and sum up tuple field "1"
                        .groupBy(0)
                        .sum(1);
        // setParallelism sets the parallelism, as in Spark. If it is not set,
        // the sink runs multi-threaded and produces multiple output files.
        counts.writeAsCsv(outputPath, "\n", " ").setParallelism(1);
        env.execute("Batch WordCount Example");
    }

    // A user-defined function; it could also be written inline in the flatMap() call above.
    public static class Tokenizer implements FlatMapFunction<String, Tuple2<String, Integer>> {
        @Override
        public void flatMap(String value, Collector<Tuple2<String, Integer>> out) {
            // normalize and split the line
            String[] tokens = value.toLowerCase().split(",");
            for (String token : tokens) {
                if (token.length() > 0) {
                    // wrap as a Tuple2
                    out.collect(new Tuple2<String, Integer>(token, 1));
                }
            }
        }
    }
}
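The core of the Tokenizer can be exercised outside Flink: lowercase the line, split on commas, drop empty tokens, and pair each word with a count of 1. A minimal plain-Java sketch (the class and method names here are illustrative, not part of the Flink API):

```java
import java.util.ArrayList;
import java.util.List;

public class TokenizeSketch {
    // Plain-Java version of the Tokenizer logic: lowercase, split on commas,
    // drop empty tokens, and emit each (word, 1) pair as a "word=1" string.
    static List<String> tokenize(String line) {
        List<String> out = new ArrayList<>();
        for (String token : line.toLowerCase().split(",")) {
            if (token.length() > 0) {
                out.add(token + "=1");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Hello,,Flink")); // prints [hello=1, flink=1]
    }
}
```

Note that the empty token produced by the double comma is filtered out, which is exactly what the `token.length() > 0` guard in the Flink job is for.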
Streaming WordCount example (DataStream API)
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

/**
 * Sliding-window computation:
 * a socket produces the word data,
 * and Flink computes the counts over it.
 */
public class SocketWindowWordCount {

    public static void main(String[] args) throws Exception {
        // Get the socket port number
        int port;
        try {
            ParameterTool parameterTool = ParameterTool.fromArgs(args);
            port = parameterTool.getInt("port");
        } catch (Exception e) {
            System.out.println("No port set. Using default port 9999");
            port = 9999;
        }
        // Get the execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        String hostname = "master01.hadoop.mobile.cn";
        String delimiter = "\n";
        DataStreamSource<String> text = env.socketTextStream(hostname, port, delimiter);
        // As in Spark, use a flatMap operator.
        // Input is a String; output is a custom WordWithCount object.
        DataStream<WordWithCount> windowCounts = text.flatMap(new FlatMapFunction<String, WordWithCount>() {
            public void flatMap(String value, Collector<WordWithCount> out) throws Exception {
                String[] splits = value.split(" ");
                for (String word : splits) {
                    out.collect(new WordWithCount(word, 1L));
                }
            }
        }).keyBy("word")
                // window size 10 seconds, sliding every 5 seconds:
                // every 5 seconds, count the data of the last 10 seconds
                .timeWindow(Time.seconds(10), Time.seconds(5))
                .sum("count");
        // Print the result to the console, with parallelism 1
        windowCounts.print().setParallelism(1);
        System.out.println(System.currentTimeMillis());
        env.execute("Socket window count");
    }

    public static class WordWithCount {
        public String word;
        public long count;

        public WordWithCount() {}

        public WordWithCount(String word, long count) {
            this.word = word;
            this.count = count;
        }

        @Override
        public String toString() {
            return "WordWithCount{" +
                    "word='" + word + '\'' +
                    ", count=" + count +
                    '}';
        }
    }
}
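With `timeWindow(Time.seconds(10), Time.seconds(5))`, each element belongs to every 10-second window whose range covers its timestamp, so with a 5-second slide each element lands in two windows. A plain-Java sketch of that assignment rule (simplified: windows aligned to the epoch, no offset; the class is illustrative, not Flink's actual assigner):

```java
import java.util.ArrayList;
import java.util.List;

public class SlidingWindows {
    // For an element at time t, return the [start, end) ranges of every
    // sliding window (given size and slide, same time unit) containing it.
    static List<long[]> assignWindows(long t, long size, long slide) {
        List<long[]> windows = new ArrayList<>();
        long lastStart = t - (t % slide); // latest window starting at or before t
        for (long start = lastStart; start > t - size; start -= slide) {
            windows.add(new long[] { start, start + size });
        }
        return windows;
    }

    public static void main(String[] args) {
        // An element at t = 12 s with size 10 s / slide 5 s belongs to [10,20) and [5,15).
        for (long[] w : assignWindows(12, 10, 5)) {
            System.out.println(w[0] + ".." + w[1]);
        }
    }
}
```

Each window therefore fires with counts covering the previous 10 seconds, refreshed every 5 seconds, which is what the comment on `timeWindow` describes.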
About the keyBy operator:
/**
 * Partitions the operator state of a {@link DataStream} using field expressions.
 * A field expression is either the name of a public field or a getter method with parentheses
 * of the {@link DataStream}'s underlying type. A dot can be used to drill
 * down into objects, as in {@code "field1.getInnerField2()" }.
 *
 * @param fields
 *            One or more field expressions on which the state of the {@link DataStream} operators will be
 *            partitioned.
 * @return The {@link DataStream} with partitioned state (i.e. KeyedStream)
 */
public KeyedStream<T, Tuple> keyBy(String... fields) {
    return keyBy(new Keys.ExpressionKeys<>(fields, getType()));
}
keyBy is used for grouping. It takes varargs, so the key can name one or more fields.
A field can be referenced directly by name, but it must be public; otherwise you get an error such as:
Exception in thread "main" org.apache.flink.api.common.InvalidProgramException: This type (GenericType<SocketWindowWordCount.WordWithCount>) cannot be used as key.
    at org.apache.flink.api.common.operators.Keys$ExpressionKeys.<init>(Keys.java:330)
    at org.apache.flink.streaming.api.datastream.DataStream.keyBy(DataStream.java:337)
    at SocketWindowWordCount.main(SocketWindowWordCount.java:41)
A field can also be referenced through its getter method.
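The "must be public" requirement can be illustrated with plain reflection: a field expression like `keyBy("word")` has to resolve to a public field (or getter) on the element type. A loose sketch of that check (illustrative only; Flink's real validation in `Keys.ExpressionKeys` is more involved and also accepts getters):

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

public class KeyFieldCheck {
    // Loosely mimics why keyBy("word") needs a public field: the field must
    // be resolvable by name and public on the stream's element type.
    static boolean isUsableAsKey(Class<?> type, String fieldName) {
        try {
            Field f = type.getDeclaredField(fieldName);
            return Modifier.isPublic(f.getModifiers());
        } catch (NoSuchFieldException e) {
            return false;
        }
    }

    public static class WordWithCount {
        public String word;
        long count; // not public: would fail this check under the name "count"
    }

    public static void main(String[] args) {
        System.out.println(isUsableAsKey(WordWithCount.class, "word"));  // prints true
        System.out.println(isUsableAsKey(WordWithCount.class, "count")); // prints false
    }
}
```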
Flink Table & SQL processing
package com.kong.flink;

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.java.BatchTableEnvironment;

import java.util.ArrayList;

public class FlinkSqlWordCount {

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        // Create a TableEnvironment
        BatchTableEnvironment tableEnv = BatchTableEnvironment.create(env);
        // Wrap each word in an object
        String words = "hello,flink,hello,ksw";
        ArrayList<WordCount> list = new ArrayList<>();
        String[] split = words.split(",");
        for (String word : split) {
            list.add(new WordCount(word, 1L));
        }
        // Build a DataSet, similar to parallelizing a collection into an RDD in Spark
        DataSet<WordCount> inputDataSet = env.fromCollection(list);
        // Convert the DataSet into a Table:
        // * @param dataSet The {@link DataSet} to be converted.
        // * @param fields The field names of the resulting {@link Table}.
        // i.e. the first argument is the DataSet to convert; the second is the field names of the Table.
        Table table = tableEnv.fromDataSet(inputDataSet, "word,frequency");
        table.printSchema();
        tableEnv.createTemporaryView("WordCount", table);
        // tableEnv.createTemporaryView("wordCount", inputDataSet, "word,count");
        Table table1 = tableEnv.sqlQuery("select word as word, sum(frequency) as frequency from WordCount GROUP BY word");
        DataSet<WordCount> resultDataSet = tableEnv.toDataSet(table1, WordCount.class);
        resultDataSet.printToErr();
    }

    public static class WordCount {
        public String word;
        // "count" cannot be used as the field name here: it is a Flink SQL reserved keyword, see
        // https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/table/sql/index.html#reserved-keywords
        public long frequency;

        // The no-arg constructor is required (Flink POJO rules), see
        // https://ci.apache.org/projects/flink/flink-docs-release-1.10/zh/dev/api_concepts.html#pojo
        // Without it: org.apache.flink.table.api.ValidationException: Too many fields referenced from an atomic type.
        public WordCount() {
        }

        public WordCount(String word, long frequency) {
            this.word = word;
            this.frequency = frequency;
        }

        @Override
        public String toString() {
            return word + ", " + frequency;
        }
    }
}
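What the query `select word, sum(frequency) from WordCount GROUP BY word` computes over this input can be written out as a plain map-side aggregation, which makes the expected result easy to see (the class and method names are illustrative, not part of Flink):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class GroupBySketch {
    // Plain-Java equivalent of "GROUP BY word, sum(frequency)" over a
    // comma-separated word list where each occurrence has frequency 1.
    static Map<String, Long> wordCounts(String words) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String word : words.split(",")) {
            counts.merge(word, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(wordCounts("hello,flink,hello,ksw")); // prints {hello=2, flink=1, ksw=1}
    }
}
```

So for the input "hello,flink,hello,ksw" the Flink SQL job should emit hello with frequency 2 and flink and ksw with frequency 1 each.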