scrapy Data flow

The data flow in Scrapy is controlled by the execution engine, and goes like this:
1. The Engine gets the initial Requests to crawl from the Spider.
2. The Engine schedules the Requests in the Scheduler and asks for the next Requests to crawl.
3. The Scheduler returns the next Requests to the Engine.
4. The Engine sends the Requests to the Downloader, passing through the Downloader Middlewares (see
process_request()).
5. Once the page finishes downloading the Downloader generates a Response (with that page) and sends it to the
Engine, passing through the Downloader Middlewares (see process_response()).
6. The Engine receives the Response from the Downloader and sends it to the Spider for processing, passing
through the Spider Middleware (see process_spider_input()).
7. The Spider processes the Response and returns scraped items and new Requests (to follow) to the Engine,
passing through the Spider Middleware (see process_spider_output()).
8. The Engine sends processed items to Item Pipelines, then send processed Requests to the Scheduler and asks
for possible next Requests to crawl.
9. The process repeats (from step 1) until there are no more requests from the Scheduler.
scrapy Data flow的更多相关文章
- SSIS Data Flow优化
一,数据流设计优化 数据流有两个特性:流和在内存缓冲区中处理数据,根据数据流的这两个特性,对数据流进行优化. 1,流,同时对数据进行提取,转换和加载操作 流,就是在source提取数据时,转换组件处理 ...
- Data Flow的Error Output
一,在Data Flow Task中,对于Error Row的处理通过Error Output Tab配置的. 1,操作失败的类型:Error(Conversion) 和 Truncation. 2, ...
- SSIS Data Flow 的 Execution Tree 和 Data Pipeline
一,Execution Tree 执行树是数据流组件(转换和适配器)基于同步关系所建立的逻辑分组,每一个分组都是一个执行树的开始和结束,也可以将执行树理解为一个缓冲区的开始和结束,即缓冲区的整个生命周 ...
- SSIS的 Data Flow 和 Control Flow
Control Flow 和 Data Flow,是SSIS Design中主要用到的两个Tab,理解这两个Tab的作用,对设计更高效的package十分重要. 一,Control Flow 在Con ...
- Intel® Threading Building Blocks (Intel® TBB) Developer Guide 中文 Parallelizing Data Flow and Dependence Graphs并行化data flow和依赖图
https://www.threadingbuildingblocks.org/docs/help/index.htm Parallelizing Data Flow and Dependency G ...
- SSIS ->> Data Flow Design And Tuning
Requirements: Source and destination system impact Processing time windows and performance Destinati ...
- SSIS ->> Control Flow And Data Flow
In the Control Flow, the task is the smallest unit of work, and a task requires completion (success, ...
- Data Flow ->> Union All
Wrox的<Professional Microsoft SQL Server 2012 Integration Services>一书中再讲Merge的时候有这样一段解释: This t ...
- Data Flow ->> Import Column & Export Column
这两个transformation的作用是把DT_TEXT, DT_NTEXT, DT_IMAGE类型的数据在文件系统和数据库间导出或者导入.比如把某个数据库表的image类型的字段导出到文件系统成为 ...
随机推荐
- linux创建快捷方式ln命令
创建快捷方式命令 ln -s 源文件 目标目录 //目标目录可以是完整路径,也可以是当前目录下的路径 ln 源文件 目标目录 在桌面上添加一个,创建一个文件夹(这里是work)的快捷方式 //源 cd ...
- python入门 -- 学习笔记3
习题21:函数可以返回东西 过程解析: 1.定义函数:如def add(形参)函数 2.调用函数: add(实参) 别忘记加() 3.在函数调用的时候将实参的值传给形参,代入到函数中进行计算,r ...
- combox省市县三级联动
/** * Name 获取省份(初始化) */ function showProvince(id1, id2, id3) { var paramData = {}; $.ajax({ url: osp ...
- Selenium Webdriver元素定位的八种常用方式(转载)
转载自 https://www.cnblogs.com/qingchunjun/p/4208159.html 在使用selenium webdriver进行元素定位时,通常使用findElement或 ...
- OpenJudge NOI 4976 硬币
http://noi.openjudge.cn/ch0207/4976/ 描述 宇航员Bob有一天来到火星上,他有收集硬币的习惯.于是他将火星上所有面值的硬币都收集起来了,一共有n种,每种只有一个:面 ...
- TZOJ 2099 Sightseeing tour(网络流判混合图欧拉回路)
描述 The city executive board in Lund wants to construct a sightseeing tour by bus in Lund, so that to ...
- 762. Prime Number of Set Bits in Binary Representation二进制中有质数个1的数量
[抄题]: Given two integers L and R, find the count of numbers in the range [L, R] (inclusive) having a ...
- Django实现微信消息推送
一 所需准备条件 微信公众号的分类 微信消息推送 公众号 已认证公众号 服务号 已认证服务号 企业号 基于:微信认证服务号 主动推送微信消息.前提:关注服务号环境:沙箱环境 沙箱环境地址: https ...
- RFC
一.简介 二.常用 1)TLSv1.2 Protocol https://tools.ietf.org/html/rfc5246
- vscode快捷键的中文版
自己整理了一份vscode快捷键的中文版本