A Look at Flink's CsvTableSource
Preface
This article takes a look at Flink's CsvTableSource.
TableSource
flink-table_2.11-1.7.1-sources.jar!/org/apache/flink/table/sources/TableSource.scala
trait TableSource[T] {

  /** Returns the [[TypeInformation]] for the return type of the [[TableSource]].
    * The fields of the return type are mapped to the table schema based on their name.
    *
    * @return The type of the returned [[DataSet]] or [[DataStream]].
    */
  def getReturnType: TypeInformation[T]

  /**
    * Returns the schema of the produced table.
    *
    * @return The [[TableSchema]] of the produced table.
    */
  def getTableSchema: TableSchema

  /**
    * Describes the table source.
    *
    * @return A String explaining the [[TableSource]].
    */
  def explainSource(): String =
    TableConnectorUtil.generateRuntimeName(getClass, getTableSchema.getFieldNames)
}
TableSource defines three methods: getReturnType, getTableSchema, and explainSource.
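These three methods form the minimal contract of a table source: what type it produces, what schema the table has, and how it describes itself. As an illustration only, here is a simplified, Flink-free Java analogue of that contract; SimpleTableSource and WordSource are hypothetical names, and plain strings stand in for Flink's TypeInformation and TableSchema:

```java
// A deliberately simplified, Flink-free analogue of the TableSource trait:
// return type, schema (field names only), and a human-readable description.
interface SimpleTableSource<T> {
    Class<T> getReturnType();            // stands in for TypeInformation[T]
    String[] getTableSchema();           // stands in for TableSchema

    // Mirrors the default explainSource(): class name plus field names.
    default String explainSource() {
        return getClass().getSimpleName() + "(" + String.join(", ", getTableSchema()) + ")";
    }
}

class WordSource implements SimpleTableSource<String> {
    public Class<String> getReturnType() { return String.class; }
    public String[] getTableSchema() { return new String[]{"word", "count"}; }
}

public class Demo {
    public static void main(String[] args) {
        SimpleTableSource<String> src = new WordSource();
        System.out.println(src.explainSource()); // WordSource(word, count)
    }
}
```

Note how explainSource has a default body, just as the Scala trait gives it a default implementation via TableConnectorUtil.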
BatchTableSource
flink-table_2.11-1.7.1-sources.jar!/org/apache/flink/table/sources/BatchTableSource.scala
trait BatchTableSource[T] extends TableSource[T] {

  /**
    * Returns the data of the table as a [[DataSet]].
    *
    * NOTE: This method is for internal use only for defining a [[TableSource]].
    *       Do not use it in Table API programs.
    */
  def getDataSet(execEnv: ExecutionEnvironment): DataSet[T]
}
BatchTableSource extends TableSource and adds the getDataSet method.
StreamTableSource
flink-table_2.11-1.7.1-sources.jar!/org/apache/flink/table/sources/StreamTableSource.scala
trait StreamTableSource[T] extends TableSource[T] {

  /**
    * Returns the data of the table as a [[DataStream]].
    *
    * NOTE: This method is for internal use only for defining a [[TableSource]].
    *       Do not use it in Table API programs.
    */
  def getDataStream(execEnv: StreamExecutionEnvironment): DataStream[T]
}
StreamTableSource extends TableSource and adds the getDataStream method.
CsvTableSource
flink-table_2.11-1.7.1-sources.jar!/org/apache/flink/table/sources/CsvTableSource.scala
class CsvTableSource private (
    private val path: String,
    private val fieldNames: Array[String],
    private val fieldTypes: Array[TypeInformation[_]],
    private val selectedFields: Array[Int],
    private val fieldDelim: String,
    private val rowDelim: String,
    private val quoteCharacter: Character,
    private val ignoreFirstLine: Boolean,
    private val ignoreComments: String,
    private val lenient: Boolean)
  extends BatchTableSource[Row]
  with StreamTableSource[Row]
  with ProjectableTableSource[Row] {

  def this(
      path: String,
      fieldNames: Array[String],
      fieldTypes: Array[TypeInformation[_]],
      fieldDelim: String = CsvInputFormat.DEFAULT_FIELD_DELIMITER,
      rowDelim: String = CsvInputFormat.DEFAULT_LINE_DELIMITER,
      quoteCharacter: Character = null,
      ignoreFirstLine: Boolean = false,
      ignoreComments: String = null,
      lenient: Boolean = false) = {
    this(
      path,
      fieldNames,
      fieldTypes,
      fieldTypes.indices.toArray, // initially, all fields are returned
      fieldDelim,
      rowDelim,
      quoteCharacter,
      ignoreFirstLine,
      ignoreComments,
      lenient)
  }

  def this(path: String, fieldNames: Array[String], fieldTypes: Array[TypeInformation[_]]) = {
    this(path, fieldNames, fieldTypes, CsvInputFormat.DEFAULT_FIELD_DELIMITER,
      CsvInputFormat.DEFAULT_LINE_DELIMITER, null, false, null, false)
  }

  if (fieldNames.length != fieldTypes.length) {
    throw new TableException("Number of field names and field types must be equal.")
  }

  private val selectedFieldTypes = selectedFields.map(fieldTypes(_))
  private val selectedFieldNames = selectedFields.map(fieldNames(_))

  private val returnType: RowTypeInfo = new RowTypeInfo(selectedFieldTypes, selectedFieldNames)

  override def getDataSet(execEnv: ExecutionEnvironment): DataSet[Row] = {
    execEnv.createInput(createCsvInput(), returnType).name(explainSource())
  }

  /** Returns the [[RowTypeInfo]] for the return type of the [[CsvTableSource]]. */
  override def getReturnType: RowTypeInfo = returnType

  override def getDataStream(streamExecEnv: StreamExecutionEnvironment): DataStream[Row] = {
    streamExecEnv.createInput(createCsvInput(), returnType).name(explainSource())
  }

  /** Returns the schema of the produced table. */
  override def getTableSchema = new TableSchema(fieldNames, fieldTypes)

  /** Returns a copy of [[TableSource]] with ability to project fields */
  override def projectFields(fields: Array[Int]): CsvTableSource = {

    val selectedFields = if (fields.isEmpty) Array(0) else fields

    new CsvTableSource(
      path,
      fieldNames,
      fieldTypes,
      selectedFields,
      fieldDelim,
      rowDelim,
      quoteCharacter,
      ignoreFirstLine,
      ignoreComments,
      lenient)
  }

  private def createCsvInput(): RowCsvInputFormat = {
    val inputFormat = new RowCsvInputFormat(
      new Path(path),
      selectedFieldTypes,
      rowDelim,
      fieldDelim,
      selectedFields)

    inputFormat.setSkipFirstLineAsHeader(ignoreFirstLine)
    inputFormat.setLenient(lenient)

    if (quoteCharacter != null) {
      inputFormat.enableQuotedStringParsing(quoteCharacter)
    }
    if (ignoreComments != null) {
      inputFormat.setCommentPrefix(ignoreComments)
    }

    inputFormat
  }

  override def equals(other: Any): Boolean = other match {
    case that: CsvTableSource => returnType == that.returnType &&
      path == that.path &&
      fieldDelim == that.fieldDelim &&
      rowDelim == that.rowDelim &&
      quoteCharacter == that.quoteCharacter &&
      ignoreFirstLine == that.ignoreFirstLine &&
      ignoreComments == that.ignoreComments &&
      lenient == that.lenient
    case _ => false
  }

  override def hashCode(): Int = {
    returnType.hashCode()
  }

  override def explainSource(): String = {
    s"CsvTableSource(" +
      s"read fields: ${getReturnType.getFieldNames.mkString(", ")})"
  }
}
CsvTableSource implements both the BatchTableSource and StreamTableSource interfaces: getDataSet creates a DataSet via ExecutionEnvironment.createInput, while getDataStream creates a DataStream via StreamExecutionEnvironment.createInput.
The InputFormat passed to both createInput calls is a RowCsvInputFormat, built by createCsvInput.
getTableSchema builds a TableSchema from fieldNames and fieldTypes; getReturnType builds a RowTypeInfo from selectedFieldTypes and selectedFieldNames; explainSource returns a string that starts with "CsvTableSource". projectFields returns a copy of the CsvTableSource restricted to the given field indices, defaulting to field 0 when the projection is empty.
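The index mapping behind projectFields can be illustrated with a small, Flink-free sketch: the selected indices pick out both field names and field types, and an empty projection falls back to field 0, just as in the Scala source above (the Projection class and projectNames helper below are hypothetical, for illustration only):

```java
import java.util.Arrays;

public class Projection {
    // Mirrors the mapping done by CsvTableSource: selectedFields.map(fieldNames(_)).
    // An empty projection defaults to {0}, matching projectFields in the source.
    static String[] projectNames(String[] fieldNames, int[] fields) {
        int[] selected = fields.length == 0 ? new int[]{0} : fields;
        return Arrays.stream(selected)
                     .mapToObj(i -> fieldNames[i])
                     .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String[] names = {"id", "name", "score"};
        System.out.println(Arrays.toString(projectNames(names, new int[]{2, 0})));
        // [score, id]
        System.out.println(Arrays.toString(projectNames(names, new int[]{})));
        // [id]
    }
}
```

The same indices are also applied to fieldTypes in the real class, so the projected names and types always stay in sync.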
Summary
TableSource defines three methods: getReturnType, getTableSchema, and explainSource. BatchTableSource extends TableSource and adds the getDataSet method; StreamTableSource extends TableSource and adds the getDataStream method.
CsvTableSource implements both the BatchTableSource and StreamTableSource interfaces: getDataSet creates a DataSet via ExecutionEnvironment.createInput, while getDataStream creates a DataStream via StreamExecutionEnvironment.createInput.
The InputFormat passed to both createInput calls is a RowCsvInputFormat, built by createCsvInput. getTableSchema builds a TableSchema from fieldNames and fieldTypes; getReturnType builds a RowTypeInfo from selectedFieldTypes and selectedFieldNames; explainSource returns a string that starts with "CsvTableSource".
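The options that createCsvInput hands to RowCsvInputFormat (skip the header line, drop comment lines, split on the field delimiter) can be illustrated with a small, Flink-free sketch. MiniCsvReader below is a hypothetical toy reader, not Flink's actual parser; it ignores quoting and lenient mode entirely:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class MiniCsvReader {
    // A toy version of the behavior configured in createCsvInput:
    // skip the header if requested (setSkipFirstLineAsHeader), drop lines
    // starting with the comment prefix (setCommentPrefix), split on the delimiter.
    static List<String[]> read(String text, String fieldDelim,
                               boolean ignoreFirstLine, String commentPrefix) {
        List<String[]> rows = new ArrayList<>();
        String[] lines = text.split("\n");
        for (int i = 0; i < lines.length; i++) {
            if (i == 0 && ignoreFirstLine) continue;
            String line = lines[i];
            if (commentPrefix != null && line.startsWith(commentPrefix)) continue;
            rows.add(line.split(Pattern.quote(fieldDelim)));
        }
        return rows;
    }

    public static void main(String[] args) {
        String csv = "id,name\n#comment\n1,foo\n2,bar";
        List<String[]> rows = read(csv, ",", true, "#");
        System.out.println(rows.size());    // 2
        System.out.println(rows.get(0)[1]); // foo
    }
}
```

In the real CsvTableSource these three behaviors are toggled by the ignoreFirstLine, ignoreComments, and fieldDelim constructor arguments, with quoteCharacter and lenient configuring the parsing details this sketch omits.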