SSIS ->> Control Flow And Data Flow
In the Control Flow, the task is the smallest unit of work, and a task requires completion (success, failure, or just completion) before subsequent tasks are handled.
- Workflow orchestration
- Process-oriented
- Serial or parallel tasks execution
- Synchronous processing
In the Data Flow, the transformation and the adapter are the basic components;
- Information-oriented
- Data correlation and transformation
- Coordinated processing
- Streaming in nature
- Source extraction and destination loading
Multiple components are running at the same time because the Data Flow Transformations are working together in a coordinated streaming fashion, and the data is being transformed in groups (called buffers) as it is passed down from the source to the subsequent transformations.
- Data buffer architecture
- Transformation types
- Transformation communication
- Execution tree
Instead of data being passed down through the transformations, groups of transformations pass over the buffers of data and make in-place changes as defi ned by the transformations.
Blocking nature: Non-blocking (sometimes called streaming), semi-blocking, blocking Communication mechanism: Synchronous and asynchronous
All transformations fall into one of three categories: non-blocking, semi-blocking, or blocking. These terms describe whether data in a transformation is passed downstream in the pipeline immediately, in increments, or after all the data is fully received.
Non-Blocking, Streaming, and Row-Based Transformations
Most of the SSIS transformations are non-blocking. This means that the transformation logic applied in the transformation does not impede the data from moving on to the next transformation after the transformation logic is applied to the row. Two categories of non-blocking transformations exist: streaming and row-based. The difference is whether the SSIS transformation can use internal information and processes to handle its work or whether the transformation has to call an external process to retrieve information it needs for the work. Some transformations can be categorized as streaming or row-based depending on their configuration, which are indicated in the list below.
Streaming transformations are usually able to apply transformation logic quickly, using precached data and processing calculations within the row being worked on. In these transformations, it is usually the case that a transformation will not slip behind the rate of the data being fed to it. These transformations focus their resources on the CPUs, which in most cases are not the bottleneck of an ETL system.
Audit
Cache
Transform
Character Map
Conditional Split
Copy Column
Data Conversion
Derived Column
Lookup (with a full-cache setting)
Multicast
Percent Sampling
Row Count
Script Component (provided the script is not confi gured with an asynchronous output)
Union All (the Union All acts like a streaming transformation but is actually a semi- blocking transformation because it communicates asynchronously)
Row-based:
DQS Cleansing
Export Column
Import Column
Lookup (with a no-cache or partial-cache setting)
OLE DB Command Script Component (where the script interacts with an external component)
Slowly Changing Dimension (each row is looked up against the dimension in the database)
Semi-Blocking Transformations are the ones that hold up records in the Data Flow for a period of time before allowing the memory buffers to be passed downstream.
Data Mining Query
Merge
Merge Join
Pivot
Term Lookup
Unpivot
Union All (also included in the streaming transformations list, but under the covers, the Union All is semi-blocking)
SSIS 2012 can throttle the sources by limiting the requests from the upstream transformations and sources, thereby preventing SSIS from getting into an out-of-memory situation. \
Blocking Transformations
These components require a complete review of the upstream data before releasing any row downstream to the connected transformations and destinations.
Aggregate
Fuzzy Grouping
Fuzzy Lookup
Row Sampling
Sort
Term Extraction
Script Component (when confi gured to receive all rows before sending any downstream)
Synchronous and Asynchronous Transformation Outputs
synchronous and asynchronous refer more to the relationship between the Input and Output Component connections and buffers.
A transformation output is asynchronous if the buffers used in the input are different from the buffers used in the output. In other words, many of the transformations cannot both perform the specifi ed operation and preserve the buffers (the number of rows or the order of the rows), so a copy of the data must be made to accomplish the desired effect.
All the semi-blocking and blocking transformations already listed have asynchronous outputs by defi nition — none of them can pass input buffers on downstream because the data is held up for processing and reorganized.
A synchronous transformation is one in which the buffers are immediately handed off to the next downstream transformation at the completion of the transformation logic.
Both the Multicast and the Conditional Split can have multiple outputs, but all the outputs are synchronous.
With the exception of the Union All, it functions like a streaming transformation, is really an asynchronous transformation.
Synchronous transformation outputs preserve the sort order of incoming data, whereas some of the asynchronous transformations do not. The Sort, Merge, and Merge Join asynchronous components, of course, have sorted outputs because of their nature, but the Union All, for example, does not.
An execution tree is a logical grouping of Data Flow Components (transformations and adapters) based on their synchronous relationship to one another. Groupings are delineated by asynchronous component outputs that indicate the completion of one execution tree and the start of the next.
the process thread scheduler can assign more than one thread to a single execution tree if threads are available and the execution tree requires intense processor utilization. Each transformation can receive a single thread, so if an execution tree has only two components that participate, then the execution tree can have a maximum of two threads. In addition, each source adapter receives a separate thread.
It is important to modify the EngineThreads property of the Data Flow so that the execution trees are not sharing process threads, and extra threads are available for large or complex execution trees. Furthermore, all the execution trees in a package share the number of processor threads allocated in the EngineThreads property of the Data Flow. A single thread or multiple threads are assigned to an execution tree based on availability of threads and complexity of the execution tree.
The value for EngineThreads does not include the threads allocated for the number of sources in a Data Flow, which are automatically allocated separate threads.
SSIS ->> Control Flow And Data Flow的更多相关文章
- SSIS的 Data Flow 和 Control Flow
Control Flow 和 Data Flow,是SSIS Design中主要用到的两个Tab,理解这两个Tab的作用,对设计更高效的package十分重要. 一,Control Flow 在Con ...
- Spring Cloud Data Flow整合Cloudfoundry UAA服务做权限控制
我最新最全的文章都在南瓜慢说 www.pkslow.com,欢迎大家来喝茶! 1 前言 关于Spring Cloud Data Flow这里不多介绍,有兴趣可以看下面的文章.本文主要介绍如何整合Dat ...
- SSIS Data Flow 的 Execution Tree 和 Data Pipeline
一,Execution Tree 执行树是数据流组件(转换和适配器)基于同步关系所建立的逻辑分组,每一个分组都是一个执行树的开始和结束,也可以将执行树理解为一个缓冲区的开始和结束,即缓冲区的整个生命周 ...
- [转]Data Flow How-to Topics (SSIS)
本文转自:http://technet.microsoft.com/en-us/library/ms137612(v=sql.90).aspx This section contains proced ...
- SSIS Data Flow优化
一,数据流设计优化 数据流有两个特性:流和在内存缓冲区中处理数据,根据数据流的这两个特性,对数据流进行优化. 1,流,同时对数据进行提取,转换和加载操作 流,就是在source提取数据时,转换组件处理 ...
- 微软BI 之SSIS 系列 - 理解Data Flow Task 中的同步与异步, 阻塞,半阻塞和全阻塞以及Buffer 缓存概念
开篇介绍 在 SSIS Dataflow 数据流中的组件可以分为 Synchronous 同步和 Asynchronous 异步这两种类型. 同步与异步 Synchronous and Asynchr ...
- SSIS ->> Data Flow Design And Tuning
Requirements: Source and destination system impact Processing time windows and performance Destinati ...
- Data Flow的Error Output
一,在Data Flow Task中,对于Error Row的处理通过Error Output Tab配置的. 1,操作失败的类型:Error(Conversion) 和 Truncation. 2, ...
- Intel® Threading Building Blocks (Intel® TBB) Developer Guide 中文 Parallelizing Data Flow and Dependence Graphs并行化data flow和依赖图
https://www.threadingbuildingblocks.org/docs/help/index.htm Parallelizing Data Flow and Dependency G ...
随机推荐
- jQuery+css3弹出框插件
先来看DEMO:https://codepen.io/jonechen/pen/regjGG 插件的开发很简单,运用了CSS3的动画效果,并且弹出框的内容可以自定义.插件的默认配置参数有三个: var ...
- 向Array中添加堆排序
堆排序思路 堆排序是一种树形选择排序方法(注意下标是从1开始的,也就是R[1...n]). 1) 初始堆: 将原始数组调整成大根堆的方法——筛选算法:比较R[2i].R[2i+1]和R[i],将最大者 ...
- 延迟加载 ERROR org.hibernate.LazyInitializationException:42 - could not initialize proxy - ...
no Session问题,即延迟加载 延迟加载的问题是指当我们调用完action中的某个方法,在jsp页面要显示我们想要的信息的时候,发现在dao中打开的session已经关闭了. 如下图,第一个箭头 ...
- windows phone和android,ios的touch事件兼容
1.开发背景 最近用html5写了个小游戏,中间踩过无数坑,有很多甚至百度都百度不到答案,可见html5还真是不成熟,兼容性的复杂度比ie6有过之而无不及,性能那个渣简直无力吐槽.. 好了,吐槽结束, ...
- python 可变参数
原文地址:http://docs.pythontab.com/python/python3.4/controlflow.html#tut-functions 一个最不常用的选择是可以让函数调用可变个数 ...
- linux I/O
一) I/O调度程序的总结 1) 当向设备写入数据块或是从设备读出数据块时,请求都被安置在一个队列中等待完成. 2) 每个块设备都有它自己的队列. 3) I/O调度程序负责维护 ...
- wordpress无法安装这个包。: PCLZIP_ERR_MISSING_FILE (-4) : Missing archive file 'C:\WINDOWS\TEMP/wordpress-4.tmp'
朋友的wp博客好久没管理了,让ytkah帮忙打理一下,进到后台发现版本还是3.9的,那是比较早以前的版本了,早该升级了. 在升级wordpress时出现以下错误: 无法安装这个包: PCLZIP_ER ...
- 通过HTML条件注释判断IE版本的HTML语句详解<!--[if IE]> <![endif]-->
我们常常会在网页的HTML里面看到形如[if lte IE 9]……[endif]的代码,表示的是限定某些浏览器版本才能执行的语句,那么这些判断语句的规则是什么呢?请看下文: <!--[if ! ...
- POC
大概就是原型验证的意思 验证概念 编辑 概念验证(Proof of concept,简称POC)是对某些想法的一个不完整的实现,以证明其可行性,示范其原理,其目的是为了验证一些概念或理论.在计算机安全 ...
- 为什么主流网站无法捕获 XSS 漏洞?
二十多年来,跨站脚本(简称 XSS)漏洞一直是主流网站的心头之痛.为什么过了这么久,这些网站还是对此类漏洞束手无策呢? 对于最近 eBay 网站曝出的跨站脚本漏洞,你有什么想法?为什么会出现这样的漏网 ...