The First Flink Application
Official reference: https://ci.apache.org/projects/flink/flink-docs-release-1.10/#api-references
Import the Maven dependencies
Note that if you write the program in Scala, the dependencies to import differ from the Java ones.
Maven Dependencies
You can add the following dependencies to your pom.xml to include Apache Flink in your project. These dependencies include a local execution environment and thus support local testing. Scala API: To use the Scala API, replace the flink-java artifact id with flink-scala_2.11 and flink-streaming-java_2.11 with flink-streaming-scala_2.11.
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-java</artifactId>
    <version>1.8.0</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-java_2.11</artifactId>
    <version>1.8.0</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-clients_2.11</artifactId>
    <version>1.8.0</version>
</dependency>
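Following the substitution rule quoted above, a Scala-API pom would swap the artifact ids as sketched below (this assumes Scala 2.11 and the same Flink version as the Java dependencies; adjust the suffix to your Scala version):

```xml
<!-- Sketch: Scala-API equivalents of the Java dependencies above -->
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-scala_2.11</artifactId>
    <version>1.8.0</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-scala_2.11</artifactId>
    <version>1.8.0</version>
</dependency>
```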
Batch WordCount example (DataSet API)
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

// Batch processing example
public class WordCount {

    public static void main(String[] args) throws Exception {
        String inputPath = "E:\\flink\\words.txt";
        String outputPath = "E:\\flink\\result";
        // Get the execution environment
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        // Read the input file
        DataSet<String> text = env.readTextFile(inputPath);
        DataSet<Tuple2<String, Integer>> counts =
                // split up the lines in pairs (2-tuples) containing: (word, 1)
                text.flatMap(new Tokenizer())
                        // group by tuple field "0" and sum up tuple field "1"
                        .groupBy(0)
                        .sum(1);
        // setParallelism sets the parallelism, as in Spark. If it is not set,
        // the sink runs multi-threaded and produces multiple output files.
        counts.writeAsCsv(outputPath, "\n", " ").setParallelism(1);
        env.execute("Batch WordCount Example");
    }

    // A user-defined function; it could also be written inline in the flatMap() call above.
    public static class Tokenizer implements FlatMapFunction<String, Tuple2<String, Integer>> {
        @Override
        public void flatMap(String value, Collector<Tuple2<String, Integer>> out) {
            // normalize and split the line
            String[] tokens = value.toLowerCase().split(",");
            for (String token : tokens) {
                if (token.length() > 0) {
                    // wrap as a Tuple2
                    out.collect(new Tuple2<String, Integer>(token, 1));
                }
            }
        }
    }
}
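The core of the Tokenizer can be exercised outside Flink: lowercase the line, split on commas, drop empty tokens, and pair each word with a count of 1. A minimal plain-Java sketch (the class and method names here are illustrative, not part of the Flink API):

```java
import java.util.ArrayList;
import java.util.List;

public class TokenizeSketch {
    // Plain-Java version of the Tokenizer logic: lowercase, split on commas,
    // drop empty tokens, and emit each (word, 1) pair as a "word=1" string.
    static List<String> tokenize(String line) {
        List<String> out = new ArrayList<>();
        for (String token : line.toLowerCase().split(",")) {
            if (token.length() > 0) {
                out.add(token + "=1");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Hello,,Flink")); // prints [hello=1, flink=1]
    }
}
```

Note that the empty token produced by the double comma is filtered out, which is exactly what the `token.length() > 0` guard in the Flink job is for.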
Streaming WordCount example (DataStream API)
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

/**
 * Sliding-window computation:
 * a socket produces the word data,
 * and Flink computes the counts over it.
 */
public class SocketWindowWordCount {

    public static void main(String[] args) throws Exception {
        // Get the socket port number
        int port;
        try {
            ParameterTool parameterTool = ParameterTool.fromArgs(args);
            port = parameterTool.getInt("port");
        } catch (Exception e) {
            System.out.println("No port set. Using default port 9999");
            port = 9999;
        }
        // Get the execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        String hostname = "master01.hadoop.mobile.cn";
        String delimiter = "\n";
        DataStreamSource<String> text = env.socketTextStream(hostname, port, delimiter);
        // As in Spark, use a flatMap operator.
        // Input is a String; output is a custom WordWithCount object.
        DataStream<WordWithCount> windowCounts = text.flatMap(new FlatMapFunction<String, WordWithCount>() {
            public void flatMap(String value, Collector<WordWithCount> out) throws Exception {
                String[] splits = value.split(" ");
                for (String word : splits) {
                    out.collect(new WordWithCount(word, 1L));
                }
            }
        }).keyBy("word")
                // window size 10 seconds, sliding every 5 seconds:
                // every 5 seconds, count the data of the last 10 seconds
                .timeWindow(Time.seconds(10), Time.seconds(5))
                .sum("count");
        // Print the result to the console, with parallelism 1
        windowCounts.print().setParallelism(1);
        System.out.println(System.currentTimeMillis());
        env.execute("Socket window count");
    }

    public static class WordWithCount {
        public String word;
        public long count;

        public WordWithCount() {}

        public WordWithCount(String word, long count) {
            this.word = word;
            this.count = count;
        }

        @Override
        public String toString() {
            return "WordWithCount{" +
                    "word='" + word + '\'' +
                    ", count=" + count +
                    '}';
        }
    }
}
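With `timeWindow(Time.seconds(10), Time.seconds(5))`, each element belongs to every 10-second window whose range covers its timestamp, so with a 5-second slide each element lands in two windows. A plain-Java sketch of that assignment rule (simplified: windows aligned to the epoch, no offset; the class is illustrative, not Flink's actual assigner):

```java
import java.util.ArrayList;
import java.util.List;

public class SlidingWindows {
    // For an element at time t, return the [start, end) ranges of every
    // sliding window (given size and slide, same time unit) containing it.
    static List<long[]> assignWindows(long t, long size, long slide) {
        List<long[]> windows = new ArrayList<>();
        long lastStart = t - (t % slide); // latest window starting at or before t
        for (long start = lastStart; start > t - size; start -= slide) {
            windows.add(new long[] { start, start + size });
        }
        return windows;
    }

    public static void main(String[] args) {
        // An element at t = 12 s with size 10 s / slide 5 s belongs to [10,20) and [5,15).
        for (long[] w : assignWindows(12, 10, 5)) {
            System.out.println(w[0] + ".." + w[1]);
        }
    }
}
```

Each window therefore fires with counts covering the previous 10 seconds, refreshed every 5 seconds, which is what the comment on `timeWindow` describes.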
About the keyBy operator:
/**
 * Partitions the operator state of a {@link DataStream} using field expressions.
 * A field expression is either the name of a public field or a getter method with parentheses
 * of the {@link DataStream}'s underlying type. A dot can be used to drill
 * down into objects, as in {@code "field1.getInnerField2()" }.
 *
 * @param fields
 *            One or more field expressions on which the state of the {@link DataStream} operators will be
 *            partitioned.
 * @return The {@link DataStream} with partitioned state (i.e. KeyedStream)
 */
public KeyedStream<T, Tuple> keyBy(String... fields) {
    return keyBy(new Keys.ExpressionKeys<>(fields, getType()));
}
keyBy is used for grouping. It takes varargs, so the key can name one or more fields.
A field can be referenced directly by name, but it must be public; otherwise you get an error such as:
Exception in thread "main" org.apache.flink.api.common.InvalidProgramException: This type (GenericType<SocketWindowWordCount.WordWithCount>) cannot be used as key.
    at org.apache.flink.api.common.operators.Keys$ExpressionKeys.<init>(Keys.java:330)
    at org.apache.flink.streaming.api.datastream.DataStream.keyBy(DataStream.java:337)
    at SocketWindowWordCount.main(SocketWindowWordCount.java:41)
A field can also be referenced through its getter method.
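The "must be public" requirement can be illustrated with plain reflection: a field expression like `keyBy("word")` has to resolve to a public field (or getter) on the element type. A loose sketch of that check (illustrative only; Flink's real validation in `Keys.ExpressionKeys` is more involved and also accepts getters):

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

public class KeyFieldCheck {
    // Loosely mimics why keyBy("word") needs a public field: the field must
    // be resolvable by name and public on the stream's element type.
    static boolean isUsableAsKey(Class<?> type, String fieldName) {
        try {
            Field f = type.getDeclaredField(fieldName);
            return Modifier.isPublic(f.getModifiers());
        } catch (NoSuchFieldException e) {
            return false;
        }
    }

    public static class WordWithCount {
        public String word;
        long count; // not public: would fail this check under the name "count"
    }

    public static void main(String[] args) {
        System.out.println(isUsableAsKey(WordWithCount.class, "word"));  // prints true
        System.out.println(isUsableAsKey(WordWithCount.class, "count")); // prints false
    }
}
```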
Flink Table & SQL processing
package com.kong.flink;

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.java.BatchTableEnvironment;

import java.util.ArrayList;

public class FlinkSqlWordCount {

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        // Create a TableEnvironment
        BatchTableEnvironment tableEnv = BatchTableEnvironment.create(env);
        // Wrap each word in an object
        String words = "hello,flink,hello,ksw";
        ArrayList<WordCount> list = new ArrayList<>();
        String[] split = words.split(",");
        for (String word : split) {
            list.add(new WordCount(word, 1L));
        }
        // Build a DataSet, similar to parallelizing a collection into an RDD in Spark
        DataSet<WordCount> inputDataSet = env.fromCollection(list);
        // Convert the DataSet into a Table:
        // * @param dataSet The {@link DataSet} to be converted.
        // * @param fields The field names of the resulting {@link Table}.
        // i.e. the first argument is the DataSet to convert; the second is the field names of the Table.
        Table table = tableEnv.fromDataSet(inputDataSet, "word,frequency");
        table.printSchema();
        tableEnv.createTemporaryView("WordCount", table);
        // tableEnv.createTemporaryView("wordCount", inputDataSet, "word,count");
        Table table1 = tableEnv.sqlQuery("select word as word, sum(frequency) as frequency from WordCount GROUP BY word");
        DataSet<WordCount> resultDataSet = tableEnv.toDataSet(table1, WordCount.class);
        resultDataSet.printToErr();
    }

    public static class WordCount {
        public String word;
        // "count" cannot be used as the field name here: it is a Flink SQL reserved keyword, see
        // https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/table/sql/index.html#reserved-keywords
        public long frequency;

        // The no-arg constructor is required (Flink POJO rules), see
        // https://ci.apache.org/projects/flink/flink-docs-release-1.10/zh/dev/api_concepts.html#pojo
        // Without it: org.apache.flink.table.api.ValidationException: Too many fields referenced from an atomic type.
        public WordCount() {
        }

        public WordCount(String word, long frequency) {
            this.word = word;
            this.frequency = frequency;
        }

        @Override
        public String toString() {
            return word + ", " + frequency;
        }
    }
}
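What the query `select word, sum(frequency) from WordCount GROUP BY word` computes over this input can be written out as a plain map-side aggregation, which makes the expected result easy to see (the class and method names are illustrative, not part of Flink):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class GroupBySketch {
    // Plain-Java equivalent of "GROUP BY word, sum(frequency)" over a
    // comma-separated word list where each occurrence has frequency 1.
    static Map<String, Long> wordCounts(String words) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String word : words.split(",")) {
            counts.merge(word, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(wordCounts("hello,flink,hello,ksw")); // prints {hello=2, flink=1, ksw=1}
    }
}
```

So for the input "hello,flink,hello,ksw" the Flink SQL job should emit hello with frequency 2 and flink and ksw with frequency 1 each.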