A Look at Flink's CsvTableSource

　　Preface
　　
　　This post takes a look at Flink's CsvTableSource.
　　
　　TableSource
　　
　　flink-table_2.11-1.7.1-sources.jar!/org/apache/flink/table/sources/TableSource.scala
　　
　　trait TableSource[T] {
　　
　　  /** Returns the [[TypeInformation]] for the return type of the [[TableSource]].
　　    * The fields of the return type are mapped to the table schema based on their name.
　　    *
　　    * @return The type of the returned [[DataSet]] or [[DataStream]].
　　    */
　　  def getReturnType: TypeInformation[T]
　　
　　  /**
　　    * Returns the schema of the produced table.
　　    *
　　    * @return The [[TableSchema]] of the produced table.
　　    */
　　  def getTableSchema: TableSchema
　　
　　  /**
　　    * Describes the table source.
　　    *
　　    * @return A String explaining the [[TableSource]].
　　    */
　　  def explainSource(): String =
　　    TableConnectorUtil.generateRuntimeName(getClass, getTableSchema.getFieldNames)
　　
　　}
　　
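　　TableSource defines three methods: getReturnType, getTableSchema, and explainSource. The first two are abstract, while explainSource has a default implementation that derives a name from the class and the schema's field names.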
　　
　　BatchTableSource
　　
　　flink-table_2.11-1.7.1-sources.jar!/org/apache/flink/table/sources/BatchTableSource.scala
　　
　　trait BatchTableSource[T] extends TableSource[T] {
　　
　　  /**
　　    * Returns the data of the table as a [[DataSet]].
　　    *
　　    * NOTE: This method is for internal use only for defining a [[TableSource]].
　　    * Do not use it in Table API programs.
　　    */
　　  def getDataSet(execEnv: ExecutionEnvironment): DataSet[T]
　　
　　}
　　
　　BatchTableSource extends TableSource and adds the getDataSet method, which returns the table's data as a DataSet.
　　
　　StreamTableSource
　　
　　flink-table_2.11-1.7.1-sources.jar!/org/apache/flink/table/sources/StreamTableSource.scala
　　
　　trait StreamTableSource[T] extends TableSource[T] {
　　
　　  /**
　　    * Returns the data of the table as a [[DataStream]].
　　    *
　　    * NOTE: This method is for internal use only for defining a [[TableSource]].
　　    * Do not use it in Table API programs.
　　    */
　　  def getDataStream(execEnv: StreamExecutionEnvironment): DataStream[T]
　　
　　}
　　
　　StreamTableSource extends TableSource and adds the getDataStream method, which returns the table's data as a DataStream.
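To make the two traits above concrete, here is a minimal sketch of a source that implements both of them, the same way CsvTableSource does further down. It assumes the Flink 1.7 Table API is on the classpath; `InMemoryPeopleSource`, its two fields, and its sample rows are hypothetical and exist only for illustration.

```scala
import java.util.{Arrays => JArrays, List => JList}

import org.apache.flink.api.common.typeinfo.{TypeInformation, Types}
import org.apache.flink.api.java.{DataSet, ExecutionEnvironment}
import org.apache.flink.api.java.typeutils.RowTypeInfo
import org.apache.flink.streaming.api.datastream.DataStream
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment
import org.apache.flink.table.api.TableSchema
import org.apache.flink.table.sources.{BatchTableSource, StreamTableSource}
import org.apache.flink.types.Row

// Hypothetical example class, not part of Flink.
class InMemoryPeopleSource extends BatchTableSource[Row] with StreamTableSource[Row] {

  private val fieldNames = Array("id", "name")
  private val fieldTypes = Array[TypeInformation[_]](Types.INT, Types.STRING)
  private val rowType = new RowTypeInfo(fieldTypes, fieldNames)

  // Two hard-coded sample rows stand in for real input data.
  private def rows: JList[Row] = JArrays.asList(
    Row.of(Int.box(1), "alice"),
    Row.of(Int.box(2), "bob"))

  // The two abstract methods inherited from TableSource.
  override def getReturnType: TypeInformation[Row] = rowType
  override def getTableSchema: TableSchema = new TableSchema(fieldNames, fieldTypes)

  // BatchTableSource: expose the rows as a DataSet.
  override def getDataSet(execEnv: ExecutionEnvironment): DataSet[Row] =
    execEnv.fromCollection(rows, rowType)

  // StreamTableSource: expose the same rows as a DataStream.
  override def getDataStream(execEnv: StreamExecutionEnvironment): DataStream[Row] =
    execEnv.fromCollection(rows, rowType)
}
```

Note that both methods receive the Java execution environments, which matches the signatures in the trait definitions above.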
　　
　　CsvTableSource
　　
　　flink-table_2.11-1.7.1-sources.jar!/org/apache/flink/table/sources/CsvTableSource.scala
　　
　　class CsvTableSource private (
　　    private val path: String,
　　    private val fieldNames: Array[String],
　　    private val fieldTypes: Array[TypeInformation[_]],
　　    private val selectedFields: Array[Int],
　　    private val fieldDelim: String,
　　    private val rowDelim: String,
　　    private val quoteCharacter: Character,
　　    private val ignoreFirstLine: Boolean,
　　    private val ignoreComments: String,
　　    private val lenient: Boolean)
　　  extends BatchTableSource[Row]
　　  with StreamTableSource[Row]
　　  with ProjectableTableSource[Row] {
　　
　　  def this(
　　      path: String,
　　      fieldNames: Array[String],
　　      fieldTypes: Array[TypeInformation[_]],
　　      fieldDelim: String = CsvInputFormat.DEFAULT_FIELD_DELIMITER,
　　      rowDelim: String = CsvInputFormat.DEFAULT_LINE_DELIMITER,
　　      quoteCharacter: Character = null,
　　      ignoreFirstLine: Boolean = false,
　　      ignoreComments: String = null,
　　      lenient: Boolean = false) = {
　　    this(
　　      path,
　　      fieldNames,
　　      fieldTypes,
　　      fieldTypes.indices.toArray, // initially, all fields are returned
　　      fieldDelim,
　　      rowDelim,
　　      quoteCharacter,
　　      ignoreFirstLine,
　　      ignoreComments,
　　      lenient)
　　  }
　　
　　  def this(path: String, fieldNames: Array[String], fieldTypes: Array[TypeInformation[_]]) = {
　　    this(path, fieldNames, fieldTypes, CsvInputFormat.DEFAULT_FIELD_DELIMITER,
　　      CsvInputFormat.DEFAULT_LINE_DELIMITER, null, false, null, false)
　　  }
　　
　　  if (fieldNames.length != fieldTypes.length) {
　　    throw new TableException("Number of field names and field types must be equal.")
　　  }
　　
　　  private val selectedFieldTypes = selectedFields.map(fieldTypes(_))
　　  private val selectedFieldNames = selectedFields.map(fieldNames(_))
　　
　　  private val returnType: RowTypeInfo = new RowTypeInfo(selectedFieldTypes, selectedFieldNames)
　　
　　  override def getDataSet(execEnv: ExecutionEnvironment): DataSet[Row] = {
　　    execEnv.createInput(createCsvInput(), returnType).name(explainSource())
　　  }
　　
　　  /** Returns the [[RowTypeInfo]] for the return type of the [[CsvTableSource]]. */
　　  override def getReturnType: RowTypeInfo = returnType
　　
　　  override def getDataStream(streamExecEnv: StreamExecutionEnvironment): DataStream[Row] = {
　　    streamExecEnv.createInput(createCsvInput(), returnType).name(explainSource())
　　  }
　　
　　  /** Returns the schema of the produced table. */
　　  override def getTableSchema = new TableSchema(fieldNames, fieldTypes)
　　
　　  /** Returns a copy of [[TableSource]] with ability to project fields */
　　  override def projectFields(fields: Array[Int]): CsvTableSource = {
　　    val selectedFields = if (fields.isEmpty) Array(0) else fields
　　    new CsvTableSource(
　　      path,
　　      fieldNames,
　　      fieldTypes,
　　      selectedFields,
　　      fieldDelim,
　　      rowDelim,
　　      quoteCharacter,
　　      ignoreFirstLine,
　　      ignoreComments,
　　      lenient)
　　  }
　　
　　  private def createCsvInput(): RowCsvInputFormat = {
　　    val inputFormat = new RowCsvInputFormat(
　　      new Path(path),
　　      selectedFieldTypes,
　　      rowDelim,
　　      fieldDelim,
　　      selectedFields)
　　
　　    inputFormat.setSkipFirstLineAsHeader(ignoreFirstLine)
　　    inputFormat.setLenient(lenient)
　　    if (quoteCharacter != null) {
　　      inputFormat.enableQuotedStringParsing(quoteCharacter)
　　    }
　　    if (ignoreComments != null) {
　　      inputFormat.setCommentPrefix(ignoreComments)
　　    }
　　
　　    inputFormat
　　  }
　　
　　  override def equals(other: Any): Boolean = other match {
　　    case that: CsvTableSource => returnType == that.returnType &&
　　      path == that.path &&
　　      fieldDelim == that.fieldDelim &&
　　      rowDelim == that.rowDelim &&
　　      quoteCharacter == that.quoteCharacter &&
　　      ignoreFirstLine == that.ignoreFirstLine &&
　　      ignoreComments == that.ignoreComments &&
　　      lenient == that.lenient
　　    case _ => false
　　  }
　　
　　  override def hashCode(): Int = {
　　    returnType.hashCode()
　　  }
　　
　　  override def explainSource(): String = {
　　    s"CsvTableSource(" +
　　      s"read fields: ${getReturnType.getFieldNames.mkString(", ")})"
　　  }
　　
　　}
　　
　　CsvTableSource implements both BatchTableSource and StreamTableSource (and, in addition, ProjectableTableSource). getDataSet creates the DataSet with ExecutionEnvironment.createInput, and getDataStream creates the DataStream with StreamExecutionEnvironment.createInput.
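Because the class implements both traits, one source definition can be registered with either kind of table environment. The sketch below assumes the Flink 1.7 Scala Table API (`TableEnvironment.getTableEnvironment`, `registerTableSource`, `scan`); the file path, table name, and two-column schema are hypothetical.

```scala
import org.apache.flink.api.common.typeinfo.{TypeInformation, Types}
import org.apache.flink.api.scala.ExecutionEnvironment
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
import org.apache.flink.table.api.TableEnvironment
import org.apache.flink.table.sources.CsvTableSource

object CsvSourceRegistration {

  // Hypothetical path and schema, chosen only for illustration.
  def newSource(): CsvTableSource = new CsvTableSource(
    "/tmp/people.csv",
    Array("id", "name"),
    Array[TypeInformation[_]](Types.INT, Types.STRING))

  def main(args: Array[String]): Unit = {
    // Batch: when the query is translated, the source is read through getDataSet.
    val batchEnv = ExecutionEnvironment.getExecutionEnvironment
    val batchTableEnv = TableEnvironment.getTableEnvironment(batchEnv)
    batchTableEnv.registerTableSource("people", newSource())
    val batchTable = batchTableEnv.scan("people").select("id, name")

    // Streaming: the same kind of source is read through getDataStream.
    val streamEnv = StreamExecutionEnvironment.getExecutionEnvironment
    val streamTableEnv = TableEnvironment.getTableEnvironment(streamEnv)
    streamTableEnv.registerTableSource("people", newSource())
    val streamTable = streamTableEnv.scan("people").select("id, name")
  }
}
```

Nothing is executed here; the tables are only defined. A job would still need a sink or a conversion to DataSet/DataStream before calling execute.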
　　
　　In both cases the InputFormat passed to createInput is a RowCsvInputFormat, which is built by the private createCsvInput method.
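The options that createCsvInput forwards to RowCsvInputFormat are all set when the source is constructed. As a rough sketch, and assuming the builder that the CsvTableSource companion object provides in Flink 1.7 (it is not part of the listing above), they could be set like this; the path, fields, and delimiter values are hypothetical.

```scala
import org.apache.flink.api.common.typeinfo.Types
import org.apache.flink.table.sources.CsvTableSource

object CsvSourceOptions {

  def build(): CsvTableSource = CsvTableSource.builder()
    .path("/tmp/people.csv")    // hypothetical input file
    .field("id", Types.INT)     // fields are kept in declaration order
    .field("name", Types.STRING)
    .fieldDelimiter(";")        // becomes fieldDelim
    .quoteCharacter('"')        // non-null, so enableQuotedStringParsing is called
    .commentPrefix("#")         // non-null, so setCommentPrefix is called
    .ignoreFirstLine()          // passed to setSkipFirstLineAsHeader
    .ignoreParseErrors()        // passed to setLenient
    .build()
}
```

The same values can also be passed positionally through the public constructor with default arguments shown in the listing.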
　　
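　　getTableSchema returns a TableSchema built from fieldNames and fieldTypes, so it always lists every field. getReturnType returns a RowTypeInfo built from selectedFieldTypes and selectedFieldNames, so it reflects only the projected fields. explainSource is overridden to return a string that starts with CsvTableSource.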
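The difference between the full schema and the projected return type is easy to miss, so here is a small dependency-free model of that bookkeeping. `CsvSourceModel` is a hypothetical stand-in, not a Flink class, and plain strings stand in for TypeInformation; only the field-selection logic is mirrored from the listing above.

```scala
// Simplified stand-in for CsvTableSource's field bookkeeping (no Flink dependency).
final case class CsvSourceModel(
    fieldNames: Vector[String],
    fieldTypes: Vector[String], // type names stand in for TypeInformation
    selectedFields: Vector[Int]) {

  require(fieldNames.length == fieldTypes.length,
    "Number of field names and field types must be equal.")

  // Mirrors getTableSchema: always built from the full field list.
  def tableSchema: Vector[(String, String)] = fieldNames.zip(fieldTypes)

  // Mirrors getReturnType: only the selected fields, in selection order.
  def returnType: Vector[(String, String)] =
    selectedFields.map(i => (fieldNames(i), fieldTypes(i)))

  // Mirrors projectFields: an empty projection falls back to field 0.
  def projectFields(fields: Vector[Int]): CsvSourceModel =
    copy(selectedFields = if (fields.isEmpty) Vector(0) else fields)

  // Mirrors the overridden explainSource.
  def explainSource: String = {
    val names = returnType.map(_._1).mkString(", ")
    s"CsvTableSource(read fields: $names)"
  }
}

object CsvSourceModel {
  // Mirrors the public constructors: initially, all fields are selected.
  def apply(names: Vector[String], types: Vector[String]): CsvSourceModel =
    CsvSourceModel(names, types, names.indices.toVector)
}
```

For a model with fields id, name, and score, `projectFields(Vector(2, 0))` leaves `tableSchema` with all three fields, while `returnType` contains only score and id and `explainSource` yields `CsvTableSource(read fields: score, id)`.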
　　
　　Summary
　　
　　TableSource defines three methods: getReturnType, getTableSchema, and explainSource. BatchTableSource extends TableSource and adds getDataSet; StreamTableSource extends TableSource and adds getDataStream.
　　
　　CsvTableSource implements both BatchTableSource and StreamTableSource. getDataSet creates the DataSet with ExecutionEnvironment.createInput, and getDataStream creates the DataStream with StreamExecutionEnvironment.createInput.
　　
　　The InputFormat passed to both createInput calls is a RowCsvInputFormat built by createCsvInput. getTableSchema returns a TableSchema built from fieldNames and fieldTypes; getReturnType returns a RowTypeInfo built from selectedFieldTypes and selectedFieldNames; explainSource returns a string that starts with CsvTableSource.
