Python --- Scrapy Commands (Repost)

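Scrapy commands fall into two groups: global commands and project commands.

Global commands: can be run from any directory.

Project commands: can only be run from inside a Scrapy project.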
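
To make "inside a project" concrete: as far as I know, Scrapy treats a directory as part of a project when it finds a scrapy.cfg file there or in one of its parent directories. The sketch below mimics that lookup with the standard library only. The function name find_scrapy_cfg is made up for this illustration and is not part of Scrapy's API.

```python
from pathlib import Path
from typing import Optional


def find_scrapy_cfg(start: Path) -> Optional[Path]:
    """Return the nearest scrapy.cfg at or above `start`, or None."""
    start = start.resolve()
    # Check the starting directory first, then each ancestor in turn.
    for directory in (start, *start.parents):
        candidate = directory / "scrapy.cfg"
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    cfg = find_scrapy_cfg(Path.cwd())
    if cfg:
        print("inside a project, project commands available:", cfg)
    else:
        print("no project found, global commands only")
```

Run from D:\BaiduYunDownload\first (the project used later in this post) it should report the project's scrapy.cfg; run from a home directory it should report global commands only.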

Global commands

C:\Users\AOBO>scrapy -h

Scrapy 1.2. - no active project

Usage:

  scrapy <command> [options] [args]

Available commands:

  bench         Run quick benchmark test

  commands

  fetch         Fetch a URL using the Scrapy downloader

  genspider     Generate new spider using pre-defined templates

  runspider     Run a self-contained spider (without creating a project)

  settings      Get settings values

  shell         Interactive scraping console

  startproject  Create new project

  version       Print Scrapy version

  view          Open URL in browser, as seen by Scrapy

  [ more ]      More commands available when run from project directory

Use "scrapy <command> -h" to see more info about a command

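startproject: creates a Scrapy project: scrapy startproject demo (demo is the name of the project to create)
runspider: runs a single self-contained spider file without a project: scrapy runspider abc.py
view: downloads a page and opens it in the browser exactly as Scrapy sees it: scrapy view http://www.aobossir.com/
shell: opens an interactive console for debugging a spider (you will rarely need it unless you are debugging): scrapy shell http://www.baidu.com --nolog (--nolog suppresses log output)
version: prints the Scrapy version: scrapy version
bench: runs a quick benchmark of crawl speed on the local machine: scrapy bench (if it fails with import win32api ImportError: DLL load failed, see the fix linked in the original post)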

Project commands

(You have to be inside the project directory to see the project commands.)

D:\BaiduYunDownload\first>scrapy -h

Scrapy 1.2. - project: first

Usage:

  scrapy <command> [options] [args]

Available commands:

  bench         Run quick benchmark test

  check         Check spider contracts

  commands

  crawl         Run a spider

  edit          Edit spider

  fetch         Fetch a URL using the Scrapy downloader

  genspider     Generate new spider using pre-defined templates

  list          List available spiders

  parse         Parse URL (using its spider) and print the results

  runspider     Run a self-contained spider (without creating a project)

  settings      Get settings values

  shell         Interactive scraping console

  startproject  Create new project

  version       Print Scrapy version

  view          Open URL in browser, as seen by Scrapy

Use "scrapy <command> -h" to see more info about a command

D:\BaiduYunDownload\first>

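genspider: creates a spider file. It also appears in the global list, but you normally run it inside a project so that the new file is created in the project's spiders directory (this command is used a lot; startproject, by contrast, creates the project itself). Spider files are generated from templates; run scrapy genspider -l to see which templates are available.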

D:\BaiduYunDownload\first>scrapy genspider -l

Available templates:

  basic

  crawl

  csvfeed

  xmlfeed

D:\BaiduYunDownload\first>

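basic: a plain, minimal spider
crawl: a link-following spider (CrawlSpider) driven by rules
csvfeed: for processing CSV feeds
xmlfeed: for processing XML feeds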

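To create a spider named f1 from the basic template: scrapy genspider -t basic f1 example.com. This creates the file f1.py. genspider expects both a spider name and a domain; example.com is only a placeholder here, so substitute the site you want to crawl.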
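
Roughly speaking, genspider fills a template with the spider name, the domain, and a class name derived from the spider name. The sketch below imitates that with string.Template so you can see what f1.py will look like. The template text is an approximation of the basic template (the real one ships with Scrapy and may differ between versions), and example.com and render_spider are placeholders chosen for this illustration.

```python
from string import Template

# Approximation of the `basic` template, not a copy of the file Scrapy ships.
BASIC_TEMPLATE = Template('''\
import scrapy


class $classname(scrapy.Spider):
    name = "$name"
    allowed_domains = ["$domain"]
    start_urls = ["http://$domain/"]

    def parse(self, response):
        pass
''')


def render_spider(name: str, domain: str) -> str:
    """Return spider source similar to what `scrapy genspider <name> <domain>` writes."""
    # f1 -> F1Spider, my_spider -> MySpiderSpider
    classname = "".join(part.capitalize() for part in name.split("_")) + "Spider"
    return BASIC_TEMPLATE.substitute(name=name, domain=domain, classname=classname)


if __name__ == "__main__":
    print(render_spider("f1", "example.com"))
```

The name attribute ("f1") is what you later pass to scrapy crawl and scrapy check; the parse method is where the extraction logic goes.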

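check: checks a spider's contracts; a result of OK means no problems were found: scrapy check f1
crawl: runs a spider: scrapy crawl f1 or scrapy crawl f1 --nolog
list: lists all spiders in the current project: scrapy list
edit: opens a spider file in an editor (this seems to have problems on Windows but works on Linux): scrapy edit f1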
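
If you want to drive these commands from a script rather than typing them, a thin wrapper around subprocess is enough. This is a minimal sketch, assuming the scrapy executable is on PATH; scrapy_argv and run_scrapy are hypothetical helper names, not part of Scrapy.

```python
import shutil
import subprocess
from typing import List


def scrapy_argv(command: str, *args: str, nolog: bool = False) -> List[str]:
    """Build the argument list for `scrapy <command> [args] [--nolog]`."""
    argv = ["scrapy", command, *args]
    if nolog:
        argv.append("--nolog")
    return argv


def run_scrapy(command: str, *args: str, nolog: bool = False, cwd: str = ".") -> int:
    """Run a Scrapy command in `cwd` and return its exit code."""
    if shutil.which("scrapy") is None:
        raise RuntimeError("the scrapy executable was not found on PATH")
    completed = subprocess.run(scrapy_argv(command, *args, nolog=nolog), cwd=cwd)
    return completed.returncode
```

For example, run_scrapy("crawl", "f1", nolog=True, cwd=r"D:\BaiduYunDownload\first") corresponds to typing scrapy crawl f1 --nolog in the project directory. The cwd argument matters because project commands only work from inside the project.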

Using Scrapy

scrapy startproject myproject
cd myproject
scrapy genspider -t basic stackoverflow stackoverflow.com/questions?sort=votes
scrapy crawl stackoverflow -o items.json (runs the spider and saves the results as JSON; CSV is also supported)
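
The items.json written by -o is a JSON array with one object per scraped item, so it can be read back with the standard library. The sketch below loads it and, as an alternative to exporting CSV from Scrapy directly, converts it to CSV. The field names used in the usage note (title, votes) are hypothetical; the real ones depend on what your spider yields.

```python
import csv
import json
from typing import Dict, List


def load_items(path: str) -> List[Dict]:
    """Load the JSON array written by `scrapy crawl <spider> -o items.json`."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def items_to_csv(items: List[Dict], path: str) -> None:
    """Write items to CSV, using the sorted union of all keys as the header."""
    fieldnames = sorted({key for item in items for key in item})
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        # Keys missing from an item are written as empty cells.
        writer.writerows(items)
```

Usage would look like items = load_items("items.json") followed by items_to_csv(items, "items.csv"). One caveat, as far as I know for the Scrapy 1.x releases this post targets: -o appends to an existing file, so running the crawl twice against the same items.json can leave a file that json.load rejects. Deleting the old file before each run avoids that.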
