A Look at Flink's CsvTableSource

Preface

This article takes a look at Flink's CsvTableSource.
　　
TableSource

flink-table_2.11-1.7.1-sources.jar!/org/apache/flink/table/sources/TableSource.scala

trait TableSource[T] {

  /** Returns the [[TypeInformation]] for the return type of the [[TableSource]].
    * The fields of the return type are mapped to the table schema based on their name.
    *
    * @return The type of the returned [[DataSet]] or [[DataStream]].
    */
  def getReturnType: TypeInformation[T]

  /**
    * Returns the schema of the produced table.
    *
    * @return The [[TableSchema]] of the produced table.
    */
  def getTableSchema: TableSchema

  /**
    * Describes the table source.
    *
    * @return A String explaining the [[TableSource]].
    */
  def explainSource(): String =
    TableConnectorUtil.generateRuntimeName(getClass, getTableSchema.getFieldNames)
}
　　
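TableSource defines three methods: getReturnType, getTableSchema, and explainSource. The first two are abstract, while explainSource has a default implementation that delegates to TableConnectorUtil.generateRuntimeName.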
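The pattern of an interface with abstract methods plus one overridable default can be sketched without any Flink dependency. In the minimal sketch below, MiniTableSource, PlainSource, VerboseSource, and sourceName are hypothetical names invented for illustration, and the default description format (name followed by the field names) is an assumption, not a statement of what TableConnectorUtil.generateRuntimeName actually produces.

```scala
object ExplainSourceSketch {
  // Hypothetical, simplified stand-in for TableSource; not Flink's real trait.
  trait MiniTableSource {
    def sourceName: String
    def fieldNames: Seq[String]

    // Default description, overridable by implementations.
    // Assumed format: the source name followed by all schema field names.
    def explainSource(): String = s"$sourceName(${fieldNames.mkString(", ")})"
  }

  // Inherits the default description.
  class PlainSource(val fieldNames: Seq[String]) extends MiniTableSource {
    val sourceName = "PlainSource"
  }

  // Overrides the description with its own format.
  class VerboseSource(val fieldNames: Seq[String]) extends MiniTableSource {
    val sourceName = "VerboseSource"
    override def explainSource(): String =
      s"$sourceName(read fields: ${fieldNames.mkString(", ")})"
  }

  def main(args: Array[String]): Unit = {
    println(new PlainSource(Seq("id", "name")).explainSource())
    println(new VerboseSource(Seq("id", "name")).explainSource())
  }
}
```

An implementation only needs to override explainSource when the default description is not informative enough.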
　　
BatchTableSource

flink-table_2.11-1.7.1-sources.jar!/org/apache/flink/table/sources/BatchTableSource.scala

trait BatchTableSource[T] extends TableSource[T] {

  /**
    * Returns the data of the table as a [[DataSet]].
    *
    * NOTE: This method is for internal use only for defining a [[TableSource]].
    *       Do not use it in Table API programs.
    */
  def getDataSet(execEnv: ExecutionEnvironment): DataSet[T]
}
　　
BatchTableSource extends TableSource and adds the getDataSet method.
　　
StreamTableSource

flink-table_2.11-1.7.1-sources.jar!/org/apache/flink/table/sources/StreamTableSource.scala

trait StreamTableSource[T] extends TableSource[T] {

  /**
    * Returns the data of the table as a [[DataStream]].
    *
    * NOTE: This method is for internal use only for defining a [[TableSource]].
    *       Do not use it in Table API programs.
    */
  def getDataStream(execEnv: StreamExecutionEnvironment): DataStream[T]
}
　　
StreamTableSource extends TableSource and adds the getDataStream method.
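Because the batch and stream traits share one parent, a single class can mix in both and serve either execution mode. The sketch below illustrates that idea with plain Scala collections only. All names (MiniTableSource, MiniBatchSource, MiniStreamSource, InMemorySource) are hypothetical, a List stands in for a bounded DataSet, and an Iterator stands in for a DataStream; none of this is Flink's API.

```scala
object DualModeSourceSketch {
  // Hypothetical stand-ins for the three Flink traits shown above.
  trait MiniTableSource[T] {
    def fieldNames: Seq[String]
  }

  // Batch flavour: a List models a bounded data set.
  trait MiniBatchSource[T] extends MiniTableSource[T] {
    def getDataSet(): List[T]
  }

  // Stream flavour: an Iterator models a stream of records.
  trait MiniStreamSource[T] extends MiniTableSource[T] {
    def getDataStream(): Iterator[T]
  }

  // One class mixes in both traits and reuses the same underlying rows.
  class InMemorySource(rows: List[String])
      extends MiniBatchSource[String]
      with MiniStreamSource[String] {
    override val fieldNames: Seq[String] = Seq("line")
    override def getDataSet(): List[String] = rows
    override def getDataStream(): Iterator[String] = rows.iterator
  }

  def main(args: Array[String]): Unit = {
    val source = new InMemorySource(List("a,1", "b,2"))
    println(source.getDataSet())
    println(source.getDataStream().toList)
  }
}
```

Both accessors read from the same rows, so the batch and stream views cannot drift apart.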
　　
CsvTableSource

flink-table_2.11-1.7.1-sources.jar!/org/apache/flink/table/sources/CsvTableSource.scala

class CsvTableSource private (
    private val path: String,
    private val fieldNames: Array[String],
    private val fieldTypes: Array[TypeInformation[_]],
    private val selectedFields: Array[Int],
    private val fieldDelim: String,
    private val rowDelim: String,
    private val quoteCharacter: Character,
    private val ignoreFirstLine: Boolean,
    private val ignoreComments: String,
    private val lenient: Boolean)
  extends BatchTableSource[Row]
  with StreamTableSource[Row]
  with ProjectableTableSource[Row] {

  def this(
    path: String,
    fieldNames: Array[String],
    fieldTypes: Array[TypeInformation[_]],
    fieldDelim: String = CsvInputFormat.DEFAULT_FIELD_DELIMITER,
    rowDelim: String = CsvInputFormat.DEFAULT_LINE_DELIMITER,
    quoteCharacter: Character = null,
    ignoreFirstLine: Boolean = false,
    ignoreComments: String = null,
    lenient: Boolean = false) = {

    this(
      path,
      fieldNames,
      fieldTypes,
      fieldTypes.indices.toArray, // initially, all fields are returned
      fieldDelim,
      rowDelim,
      quoteCharacter,
      ignoreFirstLine,
      ignoreComments,
      lenient)
  }

  def this(path: String, fieldNames: Array[String], fieldTypes: Array[TypeInformation[_]]) = {
    this(path, fieldNames, fieldTypes, CsvInputFormat.DEFAULT_FIELD_DELIMITER,
      CsvInputFormat.DEFAULT_LINE_DELIMITER, null, false, null, false)
  }

  if (fieldNames.length != fieldTypes.length) {
    throw new TableException("Number of field names and field types must be equal.")
  }

  private val selectedFieldTypes = selectedFields.map(fieldTypes(_))
  private val selectedFieldNames = selectedFields.map(fieldNames(_))

  private val returnType: RowTypeInfo = new RowTypeInfo(selectedFieldTypes, selectedFieldNames)

  override def getDataSet(execEnv: ExecutionEnvironment): DataSet[Row] = {
    execEnv.createInput(createCsvInput(), returnType).name(explainSource())
  }

  /** Returns the [[RowTypeInfo]] for the return type of the [[CsvTableSource]]. */
  override def getReturnType: RowTypeInfo = returnType

  override def getDataStream(streamExecEnv: StreamExecutionEnvironment): DataStream[Row] = {
    streamExecEnv.createInput(createCsvInput(), returnType).name(explainSource())
  }

  /** Returns the schema of the produced table. */
  override def getTableSchema = new TableSchema(fieldNames, fieldTypes)

  /** Returns a copy of [[TableSource]] with ability to project fields */
  override def projectFields(fields: Array[Int]): CsvTableSource = {

    val selectedFields = if (fields.isEmpty) Array(0) else fields

    new CsvTableSource(
      path,
      fieldNames,
      fieldTypes,
      selectedFields,
      fieldDelim,
      rowDelim,
      quoteCharacter,
      ignoreFirstLine,
      ignoreComments,
      lenient)
  }

  private def createCsvInput(): RowCsvInputFormat = {
    val inputFormat = new RowCsvInputFormat(
      new Path(path),
      selectedFieldTypes,
      rowDelim,
      fieldDelim,
      selectedFields)

    inputFormat.setSkipFirstLineAsHeader(ignoreFirstLine)
    inputFormat.setLenient(lenient)
    if (quoteCharacter != null) {
      inputFormat.enableQuotedStringParsing(quoteCharacter)
    }
    if (ignoreComments != null) {
      inputFormat.setCommentPrefix(ignoreComments)
    }

    inputFormat
  }

  override def equals(other: Any): Boolean = other match {
    case that: CsvTableSource => returnType == that.returnType &&
      path == that.path &&
      fieldDelim == that.fieldDelim &&
      rowDelim == that.rowDelim &&
      quoteCharacter == that.quoteCharacter &&
      ignoreFirstLine == that.ignoreFirstLine &&
      ignoreComments == that.ignoreComments &&
      lenient == that.lenient
    case _ => false
  }

  override def hashCode(): Int = {
    returnType.hashCode()
  }

  override def explainSource(): String = {
    s"CsvTableSource(" +
      s"read fields: ${getReturnType.getFieldNames.mkString(", ")})"
  }
}
　　
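CsvTableSource implements both the BatchTableSource and StreamTableSource interfaces (and also ProjectableTableSource). getDataSet creates a DataSet via ExecutionEnvironment.createInput, and getDataStream creates a DataStream via StreamExecutionEnvironment.createInput.

Both ExecutionEnvironment.createInput and StreamExecutionEnvironment.createInput receive a RowCsvInputFormat as their InputFormat, which is built by createCsvInput.

getTableSchema returns a TableSchema built from fieldNames and fieldTypes, so it always describes all fields. getReturnType returns a RowTypeInfo built from selectedFieldTypes and selectedFieldNames, so it describes only the projected fields. explainSource is overridden here to return a string that starts with CsvTableSource.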
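The difference between the table schema (all fields) and the return type (selected fields only) is easiest to see in a small model. The sketch below mirrors the bookkeeping in the listing above: selected indices are mapped to names and types, projectFields returns a copy, and an empty projection falls back to field 0. MiniCsvSource and its members are hypothetical names, and field types are represented as plain strings rather than TypeInformation, so this is an illustration of the logic and not Flink code.

```scala
object ProjectionSketch {
  // Hypothetical, simplified model of CsvTableSource's projection bookkeeping.
  final case class MiniCsvSource(
      fieldNames: Vector[String],
      fieldTypes: Vector[String],
      selectedFields: Vector[Int]) {

    require(
      fieldNames.length == fieldTypes.length,
      "Number of field names and field types must be equal.")

    // Table schema: always the full list of fields.
    def tableSchema: Vector[(String, String)] = fieldNames.zip(fieldTypes)

    // Return type: only the selected fields, in selection order.
    def returnType: Vector[(String, String)] =
      selectedFields.map(i => (fieldNames(i), fieldTypes(i)))

    // Returns a projected copy; an empty projection falls back to field 0.
    def projectFields(fields: Vector[Int]): MiniCsvSource =
      copy(selectedFields = if (fields.isEmpty) Vector(0) else fields)
  }

  // Initially, all fields are selected.
  val source: MiniCsvSource = MiniCsvSource(
    Vector("id", "name", "score"),
    Vector("INT", "STRING", "DOUBLE"),
    Vector(0, 1, 2))

  def main(args: Array[String]): Unit = {
    val projected = source.projectFields(Vector(2, 0))
    println(projected.returnType)             // only score and id
    println(projected.tableSchema.map(_._1))  // still id, name, score
    println(source.projectFields(Vector.empty).returnType)
  }
}
```

Since projectFields returns a new instance, the original source stays unchanged and can be projected again with a different field list.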
　　
Summary

TableSource defines three methods: getReturnType, getTableSchema, and explainSource. BatchTableSource extends TableSource and adds getDataSet; StreamTableSource extends TableSource and adds getDataStream.

CsvTableSource implements both the BatchTableSource and StreamTableSource interfaces. getDataSet creates a DataSet via ExecutionEnvironment.createInput, and getDataStream creates a DataStream via StreamExecutionEnvironment.createInput.

In both cases the InputFormat passed to createInput is a RowCsvInputFormat built by createCsvInput. getTableSchema returns a TableSchema built from fieldNames and fieldTypes; getReturnType returns a RowTypeInfo built from selectedFieldTypes and selectedFieldNames; explainSource returns a string that starts with CsvTableSource.
