Scrapy shell使用

注意：容易出现403错误，实际爬取时不会出现。

response - a Response object containing the last fetched page

>>>response.xpath('//title/text()').extract()

return a list of selectors

>>>for index, link in enumerate(links):

... args = (index, link.xpath('@href').extract(), link.xpath('img/@src').extract()) ... print 'Link number %d points to url %s and image %s' % args

Link number 0 points to url [u'image1.html'] and image [u'image1_thumb.jpg'] Link number 1 points to url [u'image2.html'] and image [u'image2_thumb.jpg'] Link number 2 points to url [u'image3.html'] and image [u'image3_thumb.jpg'] Link number 3 points to url [u'image4.html'] and image [u'image4_thumb.jpg'] Link number 4 points to url [u'image5.html'] and image [u'image5_thumb.jpg']

enumerate() 函数一般用在 for 循环当中。

>>> seq = ['one', 'two', 'three'] >>> for element in seq: ... print i, seq[i] ... i +=1 ... 0 one 1 two 2 three

one 1 two 2 three

suppose you want to extract all <p> elements inside <div> elements. First, you would get all <div> elements:

>>> divs = response.xpath('//div')

note the dot prefixing the .//p XPath):

>>> for p in divs.xpath('.//p'): # extracts all <p> inside ... print p.extract()

Another common case would be to extract all direct <p> children:

>>> for p in divs.xpath('p'): ... print p.extract()

在程序中使用shell

from scrapy.shell import inspect_response inspect_response(response, self)

Ctrl-D (or Ctrl-Z in Windows) to exit the shell and resume the crawling:

xpath最外层最好用单引号！

shell 本地html，方便调试（但别取名为index.html）

scrapy shell ./path/to/file.html ,即使在本目录，也必须要加./，不能直接 shell file.html scrapy shell ../other/path/to/file.html scrapy shell /absolute/path/to/file.html

Scrapy shell使用的更多相关文章

Scrapy shell调试网页的信息
通过scrapy shell "http://www.thinkive.cn:10000/zentaopms/www/index.php?m=user&f=login"
scrapy shell 中文网站输出报错.记录.
UnicodeDecodeError: 'gbk' codec can't decode bytes in position 381-382: illegal multibyte sequence 上 ...
安装ipython，使用scrapy shell来验证xpath选择的结果 | How to install iPython and how does it work with Scrapy Shell
1. scrapy shell 是scrapy包的一个很好的交互性工具,目前我使用它主要用于验证xpath选择的结果.安装好了scrapy之后,就能够直接在cmd上操作scrapy shell了. 具 ...
python爬虫scrapy之scrapy终端(Scrapy shell)
Scrapy终端是一个交互终端,供您在未启动spider的情况下尝试及调试您的爬取代码. 其本意是用来测试提取数据的代码,不过您可以将其作为正常的Python终端,在上面测试任何的Python代码. ...
Scrapy Shell的使用
Scrapy终端是一个交互终端,我们可以在未启动spider的情况下尝试及调试代码,也可以用来测试XPath或CSS表达式,查看他们的工作方式,方便我们爬取的网页中提取的数据. 如果安装了 IPyth ...
14.Scrapy Shell
Scrapy终端是一个交互终端,我们可以在未启动spider的情况下尝试及调试代码,也可以用来测试XPath或CSS表达式,查看他们的工作方式,方便我们爬取的网页中提取的数据. 如果安装了 IPyth ...
scrapy shell的作用
1.可以方便我们做一些数据提取的测试代码: 2.如果想要执行scrapy命令,那么毫无疑问,肯定是要先进入到scrapy所在的环境中: 3.如果想要读取某个项目的配置信息,那么应该先进入到这个项目中. ...
Scrapy shell调试返回403错误
一.问题描述有时候用scrapy shell来调试很方便,但是有些网站有防爬虫机制,所以使用scrapy shell会返回403,比如下面 C:\Users\fendo>scrapy shel ...
scrapy shell
一.scrapy shell 1.安装pip install Jupyter 2.在pycharm中的启动命令: scrapy shell 注:启动后关键字高亮显示 3.查看response 执行sc ...
在Scrapy项目【内外】使用scrapy shell命令抓取某网站首页的初步情况
Windows 10家庭中文版,Python 3.6.3,Scrapy 1.5.0, 时隔一月,再次玩Scrapy项目,希望这次可以玩的更进一步. 本文展示使用在 Scrapy项目内.项目外scrap ...

随机推荐

如何使用 awk 输出文本中的字段和列
首先我们要知道,awk 能够自动将输入的行,分隔为若干字段.每一个字段就是一组字符,它们和其他的字段由一个内部字段分隔符分隔开来. 如果你熟悉 Unix/Linux 或者懂得 bash shell 编 ...
2013年八月GBin1月刊
2013年八月GBin1月刊推荐十款来自极客标签的超棒前端特效[第十二期] 本周,我们带来了极客社区推荐的10款前端特效,仍然是非常有趣的小动态效果的页面生成.喜欢的可以直接将我们的在线调试代码插入 ...
JMS与Spring之二（用message listener container异步收发消息）
转自:http://blog.csdn.net/moonsheep_liu/article/details/6684948
Rails零散知识
1. 邮箱验证VALID_EMAIL_REGEX = /\A[\w+\-.]+@[a-z\d\-]+(\.[a-z\d\-]+)*\.[a-z]+\z/i validates :email, pres ...
Android Studio 项目中集成百度地图SDK报Native method not found: com.baidu.platform.comjni.map.commonmemcache.JNICommonMemCache.Create:()I错误
Android Studio 项目中集成百度地图SDK报以下错误: java.lang.UnsatisfiedLinkError: Native method not found: com.baidu ...
如何使用angularjs实现文本框获取焦点
<!DOCTYPE html> <html ng-app="myApp"> <head> <title>angularjs-focu ...
解决Cocos2d-x编译错误：无法打开源文件 "extensions/ExtensionExport.h"
#include "base/ccMacros.h"
Building An Effective Marketing Plan
“New ideas are a dime a dozen,” observes Arthur R. Kydd, “and so are new products and new technologi ...
ConfigurationManager读取dll的配置文件
ConfigurationManager读取dll的配置文件最近一个项目,需要发布dll给第三方使用,其中需要一些配置参数. 我们知道.NET的exe工程是自带的App.config文件的,编译之后 ...
最接近WeChat的全屏自定义相机(Custom Camera)
代码地址如下:http://www.demodashi.com/demo/13271.html 一.需求最接近WeChat的全屏自定义相机(Custom Camera),拍照和预览都是全屏尺寸.使用 ...

Scrapy shell使用

Scrapy shell使用的更多相关文章

随机推荐

热门专题