1. Create the environment for stream computing

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.getConfig().disableSysoutLogging();
    env.getConfig().setRestartStrategy(RestartStrategies.fixedDelayRestart(4, 10000)); // retry the job up to 4 times, 10 s apart
    env.enableCheckpointing(5000); // create a checkpoint every 5 seconds
    env.getConfig().setGlobalJobParameters(parameterTool); // make parameters available in the web interface
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
Under the hood, getExecutionEnvironment() picks the environment that matches the context the program runs in:

    public static StreamExecutionEnvironment getExecutionEnvironment() {
        if (contextEnvironmentFactory != null) {
            return contextEnvironmentFactory.createExecutionEnvironment();
        }

        // because the streaming project depends on "flink-clients" (and not the other way around)
        // we currently need to intercept the data set environment and create a dependent stream env.
        // this should be fixed once we rework the project dependencies
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        if (env instanceof ContextEnvironment) {
            return new StreamContextEnvironment((ContextEnvironment) env);
        } else if (env instanceof OptimizerPlanEnvironment || env instanceof PreviewPlanEnvironment) {
            return new StreamPlanEnvironment(env);
        } else {
            return createLocalEnvironment();
        }
    }
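In short: when the program has been submitted to a cluster, a context environment is already installed and is reused; when it runs from an IDE or a plain main method, createLocalEnvironment() starts an embedded mini cluster instead.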

2. Now we need to add the data source that feeds the computation

    DataStream<KafkaEvent> input = env
        .addSource(
            new FlinkKafkaConsumer010<>(
                parameterTool.getRequired("input-topic"),
                new KafkaEventSchema(),
                parameterTool.getProperties())
            .assignTimestampsAndWatermarks(new CustomWatermarkExtractor()))
        .keyBy("word")
        .map(new RollingAdditionMapper());
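KafkaEventSchema and CustomWatermarkExtractor belong to the example code rather than the Flink API and are not reproduced in this trace. As a minimal sketch, assuming KafkaEvent carries the event-time value returned by getTimestamp() (the same accessor the mapper below relies on), a periodic watermark assigner could look like this:

    // Sketch only -- the real example's extractor may differ in detail.
    // Needs org.apache.flink.streaming.api.functions.AssignerWithPeriodicWatermarks
    // and org.apache.flink.streaming.api.watermark.Watermark.
    private static class CustomWatermarkExtractor implements AssignerWithPeriodicWatermarks<KafkaEvent> {

        private static final long serialVersionUID = 1L;

        private long currentTimestamp = Long.MIN_VALUE;

        @Override
        public long extractTimestamp(KafkaEvent event, long previousElementTimestamp) {
            // use the event's own timestamp and remember it for watermark generation
            this.currentTimestamp = event.getTimestamp();
            return event.getTimestamp();
        }

        @Override
        public Watermark getCurrentWatermark() {
            // emit a watermark that trails the latest seen timestamp by one unit
            return new Watermark(currentTimestamp == Long.MIN_VALUE ? Long.MIN_VALUE : currentTimestamp - 1);
        }
    }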
Tracing into the runtime: the single-argument addSource() delegates to the full overload with a default operator name:

    public <OUT> DataStreamSource<OUT> addSource(SourceFunction<OUT> function) {
        return addSource(function, "Custom Source");
    }
    @SuppressWarnings("unchecked")
    public <OUT> DataStreamSource<OUT> addSource(SourceFunction<OUT> function, String sourceName, TypeInformation<OUT> typeInfo) {

        if (typeInfo == null) {
            if (function instanceof ResultTypeQueryable) {
                typeInfo = ((ResultTypeQueryable<OUT>) function).getProducedType();
            } else {
                try {
                    typeInfo = TypeExtractor.createTypeInfo(
                            SourceFunction.class,
                            function.getClass(), 0, null, null);
                } catch (final InvalidTypesException e) {
                    typeInfo = (TypeInformation<OUT>) new MissingTypeInfo(sourceName, e);
                }
            }
        }

        boolean isParallel = function instanceof ParallelSourceFunction;

        clean(function);
        StreamSource<OUT, ?> sourceOperator;
        if (function instanceof StoppableFunction) {
            sourceOperator = new StoppableStreamSource<>(cast2StoppableSourceFunction(function));
        } else {
            sourceOperator = new StreamSource<>(function);
        }

        return new DataStreamSource<>(this, typeInfo, sourceOperator, isParallel, sourceName);
    }
map() determines the operator's output type and wraps the user function in a StreamMap operator:

    public <R> SingleOutputStreamOperator<R> map(MapFunction<T, R> mapper) {

        TypeInformation<R> outType = TypeExtractor.getMapReturnTypes(clean(mapper), getType(),
                Utils.getCallLocationName(), true);

        return transform("Map", outType, new StreamMap<>(clean(mapper)));
    }
    public <R> SingleOutputStreamOperator<R> transform(String operatorName, TypeInformation<R> outTypeInfo, OneInputStreamOperator<T, R> operator) {

        // read the output type of the input Transform to coax out errors about MissingTypeInfo
        transformation.getOutputType();

        OneInputTransformation<T, R> resultTransform = new OneInputTransformation<>(
                this.transformation,
                operatorName,
                operator,
                outTypeInfo,
                environment.getParallelism());

        @SuppressWarnings({ "unchecked", "rawtypes" })
        SingleOutputStreamOperator<R> returnStream = new SingleOutputStreamOperator(environment, resultTransform);

        getExecutionEnvironment().addOperator(resultTransform);

        return returnStream;
    }
    @Internal
    public void addOperator(StreamTransformation<?> transformation) {
        Preconditions.checkNotNull(transformation, "transformation must not be null.");
        this.transformations.add(transformation);
    }

    protected final List<StreamTransformation<?>> transformations = new ArrayList<>();
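Note that addOperator() only records the transformation in this list; nothing actually runs until env.execute() builds the stream graph from the accumulated transformations.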
keyBy() turns the field expression into a KeySelector and wraps the stream in a KeyedStream:

    public KeyedStream<T, Tuple> keyBy(String... fields) {
        return keyBy(new Keys.ExpressionKeys<>(fields, getType()));
    }

    private KeyedStream<T, Tuple> keyBy(Keys<T> keys) {
        return new KeyedStream<>(this, clean(KeySelectorUtil.getSelectorForKeys(keys,
                getType(), getExecutionConfig())));
    }
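For reference, the string-based .keyBy("word") above can be replaced by an explicit, type-safe KeySelector; a sketch assuming the KafkaEvent.getWord() accessor used elsewhere in the example:

    // drop-in replacement for .keyBy("word") in the pipeline of step 2;
    // yields a KeyedStream<KafkaEvent, String> instead of KeyedStream<KafkaEvent, Tuple>
    .keyBy(new KeySelector<KafkaEvent, String>() {
        @Override
        public String getKey(KafkaEvent event) {
            return event.getWord();
        }
    })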

3. The data from the source is streamed through the Flink distributed computing runtime, and the computed result is delivered to the data sink.

    input.addSink(new FlinkKafkaProducer010<>(
        parameterTool.getRequired("output-topic"),
        new KafkaEventSchema(),
        parameterTool.getProperties()));
addSink() mirrors the source path: it wraps the user function in a StreamSink operator and registers the resulting transformation:

    public DataStreamSink<T> addSink(SinkFunction<T> sinkFunction) {

        // read the output type of the input Transform to coax out errors about MissingTypeInfo
        transformation.getOutputType();

        // configure the type if needed
        if (sinkFunction instanceof InputTypeConfigurable) {
            ((InputTypeConfigurable) sinkFunction).setInputType(getType(), getExecutionConfig());
        }

        StreamSink<T> sinkOperator = new StreamSink<>(clean(sinkFunction));

        DataStreamSink<T> sink = new DataStreamSink<>(this, sinkOperator);

        getExecutionEnvironment().addOperator(sink.getTransformation());
        return sink;
    }
The sink transformation is then registered through the same addOperator() method and transformations list shown in step 2, so sources, intermediate operators, and sinks all accumulate in one list.

4. The last step is to start the execution.

    env.execute("Kafka 0.10 Example");
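For completeness, a minimal main method stitching the four steps together might look like the sketch below; the ParameterTool.fromArgs() call and the exact parameter names (input-topic, output-topic, plus Kafka connection properties) follow the snippets above but are otherwise assumptions.

    public static void main(String[] args) throws Exception {
        // parse --input-topic, --output-topic and the Kafka properties from the command line
        final ParameterTool parameterTool = ParameterTool.fromArgs(args);

        // step 1: create and configure the environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.getConfig().setRestartStrategy(RestartStrategies.fixedDelayRestart(4, 10000));
        env.enableCheckpointing(5000);
        env.getConfig().setGlobalJobParameters(parameterTool);
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

        // step 2: source, keying, and per-word rolling addition
        DataStream<KafkaEvent> input = env
            .addSource(
                new FlinkKafkaConsumer010<>(
                    parameterTool.getRequired("input-topic"),
                    new KafkaEventSchema(),
                    parameterTool.getProperties())
                .assignTimestampsAndWatermarks(new CustomWatermarkExtractor()))
            .keyBy("word")
            .map(new RollingAdditionMapper());

        // step 3: sink the results back to Kafka
        input.addSink(new FlinkKafkaProducer010<>(
            parameterTool.getRequired("output-topic"),
            new KafkaEventSchema(),
            parameterTool.getProperties()));

        // step 4: trigger the actual execution
        env.execute("Kafka 0.10 Example");
    }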

The RollingAdditionMapper computing template used in step 2 is defined below.

    private static class RollingAdditionMapper extends RichMapFunction<KafkaEvent, KafkaEvent> {

        private static final long serialVersionUID = 1180234853172462378L;

        private transient ValueState<Integer> currentTotalCount;

        @Override
        public KafkaEvent map(KafkaEvent event) throws Exception {
            Integer totalCount = currentTotalCount.value();

            if (totalCount == null) {
                totalCount = 0;
            }
            totalCount += event.getFrequency();

            currentTotalCount.update(totalCount);

            return new KafkaEvent(event.getWord(), totalCount, event.getTimestamp());
        }

        @Override
        public void open(Configuration parameters) throws Exception {
            currentTotalCount = getRuntimeContext().getState(new ValueStateDescriptor<>("currentTotalCount", Integer.class));
        }
    }
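Because the stream is keyed by "word", the ValueState above holds one running total per distinct word. The KafkaEvent type is likewise part of the example code; a minimal POJO sketch matching the accessors used in this article (the field names are assumptions) would be:

    // Sketch of the event type; the real example's class may differ.
    // An empty constructor plus getters/setters make it a Flink POJO type.
    public static class KafkaEvent {

        private String word;
        private int frequency;
        private long timestamp;

        public KafkaEvent() {}

        public KafkaEvent(String word, int frequency, long timestamp) {
            this.word = word;
            this.frequency = frequency;
            this.timestamp = timestamp;
        }

        public String getWord() { return word; }
        public int getFrequency() { return frequency; }
        public long getTimestamp() { return timestamp; }

        public void setWord(String word) { this.word = word; }
        public void setFrequency(int frequency) { this.frequency = frequency; }
        public void setTimestamp(long timestamp) { this.timestamp = timestamp; }
    }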

http://www.debugrun.com/a/LjK8Nni.html
