Using the Scrapy shell

Note: the shell is prone to 403 (Forbidden) errors, even on sites that do not return 403 during an actual crawl.

response - a Response object containing the last fetched page

>>> response.xpath('//title/text()').extract()

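response.xpath() returns a list of selectors (a SelectorList); calling .extract() on it converts the matches into a list of strings.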

>>> links = response.xpath('//a[contains(@href, "image")]')
>>> for index, link in enumerate(links):
...     args = (index, link.xpath('@href').extract(), link.xpath('img/@src').extract())
...     print('Link number %d points to url %s and image %s' % args)
...
Link number 0 points to url ['image1.html'] and image ['image1_thumb.jpg']
Link number 1 points to url ['image2.html'] and image ['image2_thumb.jpg']
Link number 2 points to url ['image3.html'] and image ['image3_thumb.jpg']
Link number 3 points to url ['image4.html'] and image ['image4_thumb.jpg']
Link number 4 points to url ['image5.html'] and image ['image5_thumb.jpg']

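enumerate() wraps an iterable and yields (index, element) pairs, so it is typically used in for loops that need both a counter and the element.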

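Without enumerate(), the counter has to be maintained by hand:

>>> seq = ['one', 'two', 'three']
>>> i = 0
>>> for element in seq:
...     print(i, element)
...     i += 1
...
0 one
1 two
2 three

With enumerate(), the same loop becomes:

>>> for i, element in enumerate(seq):
...     print(i, element)
...
0 one
1 two
2 three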
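A short Python 3 sketch of two details the loop comparison does not show: enumerate() can be materialized into a list of pairs, and its optional start parameter sets the first index (default 0).

```python
seq = ['one', 'two', 'three']

# enumerate() yields (index, element) pairs, replacing a hand-maintained counter.
pairs = list(enumerate(seq))  # [(0, 'one'), (1, 'two'), (2, 'three')]

# start=1 gives 1-based numbering, e.g. for human-readable output.
for number, element in enumerate(seq, start=1):
    print(number, element)
```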

For example, suppose you want to extract all <p> elements inside <div> elements. First, get all <div> elements:

>>> divs = response.xpath('//div')

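Then extract the <p> elements relative to each <div>. Note the dot prefixing the .//p XPath: without it, //p is an absolute path and returns every <p> in the whole document, not only those inside the <div> elements: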

>>> for p in divs.xpath('.//p'):  # extracts all <p> inside the <div>s
...     print(p.extract())
...

Another common case would be to extract all direct <p> children:

>>> for p in divs.xpath('p'):  # only <p> elements that are direct children of a <div>
...     print(p.extract())
...

Invoking the shell from a spider

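Inside a spider callback, import inspect_response and call it with the current response and the spider itself:

from scrapy.shell import inspect_response
inspect_response(response, self)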

When you are done, press Ctrl-D (or Ctrl-Z on Windows) to exit the shell and resume the crawl.

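Tip: wrap the whole XPath expression in single quotes, so that double quotes can be used inside it without escaping, e.g. '//a[contains(@href, "image")]'.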

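Opening a local HTML file in the shell is handy for debugging. Give the path explicitly:

scrapy shell ./path/to/file.html
scrapy shell ../other/path/to/file.html
scrapy shell /absolute/path/to/file.html

Even if the file is in the current directory, the ./ prefix is required: scrapy shell file.html does not work, because the shell favors HTTP URLs over file paths and treats a bare name such as file.html (or index.html) as a domain name.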
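The path rule can be captured in a small helper. The function names shell_target and shell_file_uri are hypothetical, and the sketch assumes POSIX-style paths; the second helper builds an absolute file:// URI, a form the shell also accepts for local files.

```python
from pathlib import Path


def shell_target(path):
    """Return an argument for `scrapy shell` that cannot be mistaken for a domain name.

    Relative paths get an explicit './' prefix; absolute paths and paths
    already starting with './' or '../' are returned unchanged.
    """
    if Path(path).is_absolute() or path.startswith(("./", "../")):
        return path
    return "./" + path


def shell_file_uri(path):
    """Alternative: an absolute file:// URI for the same file."""
    return Path(path).resolve().as_uri()


print("scrapy shell " + shell_target("file.html"))
```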
