The data flow in Scrapy is controlled by the execution engine, and goes like this:
1. The Engine gets the initial Requests to crawl from the Spider.
2. The Engine schedules the Requests in the Scheduler and asks for the next Requests to crawl.
3. The Scheduler returns the next Requests to the Engine.
4. The Engine sends the Requests to the Downloader, passing through the Downloader Middlewares (see
process_request()).
5. Once the page finishes downloading, the Downloader generates a Response (with that page) and sends it to the
Engine, passing through the Downloader Middlewares (see process_response()).
6. The Engine receives the Response from the Downloader and sends it to the Spider for processing, passing
through the Spider Middleware (see process_spider_input()).
7. The Spider processes the Response and returns scraped items and new Requests (to follow) to the Engine,
passing through the Spider Middleware (see process_spider_output()).
8. The Engine sends processed items to the Item Pipelines, then sends processed Requests to the Scheduler and asks
for possible next Requests to crawl.
9. The process repeats (from step 3) until there are no more requests from the Scheduler.
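
The loop described in the steps above can be sketched as a toy model in plain Python. This is not Scrapy's implementation and does not use Scrapy's API: `fake_download`, `parse`, `run_engine`, and the three-page link map are hypothetical names made up for illustration, plain strings stand in for Requests, and dicts stand in for Responses and items. The middleware hooks are left out, and the `seen` set is an added assumption so the toy crawl terminates.

```python
from collections import deque


def fake_download(url):
    """Stand-in for the Downloader: builds a 'response' dict for a URL."""
    # Hypothetical three-page site used only for this illustration.
    links = {"/": ["/a", "/b"], "/a": ["/b"], "/b": []}
    return {"url": url, "links": links.get(url, [])}


def parse(response):
    """Stand-in for a Spider callback: yields one item, then new requests."""
    yield {"item": response["url"]}
    for link in response["links"]:
        yield link  # a plain string plays the role of a Request


def run_engine(start_urls):
    """Toy engine loop following the numbered steps above."""
    scheduler = deque(start_urls)  # steps 1-2: initial requests are scheduled
    seen = set(start_urls)         # assumption: skip URLs already scheduled
    items = []                     # stand-in for the Item Pipelines
    while scheduler:               # step 9: repeat until the scheduler is empty
        url = scheduler.popleft()              # step 3: next request
        response = fake_download(url)          # steps 4-5: download
        for result in parse(response):         # steps 6-7: spider processing
            if isinstance(result, dict):
                items.append(result)           # step 8: item goes to pipelines
            elif result not in seen:
                seen.add(result)
                scheduler.append(result)       # step 8: request is scheduled
    return items


if __name__ == "__main__":
    print(run_engine(["/"]))
```

Calling `run_engine(["/"])` visits each of the three toy pages once, in breadth-first order, because the scheduler here is a simple FIFO queue.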
