Fixing the Scrapy crawl error AttributeError: 'xxxSpider' object has no attribute '_rules'

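I wrote a demo following the official documentation, the only addition being an `__init__` method. When I ran it, it failed because the spider had no `_rules` attribute. The error log: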

```
 ......
  File "C:\ProgramData\Anaconda3\lib\site-packages\scrapy\spiders\crawl.py", line 82, in _parse_response
    for request_or_item in self._requests_to_follow(response):
  File "C:\ProgramData\Anaconda3\lib\site-packages\scrapy\spiders\crawl.py", line 60, in _requests_to_follow
    for n, rule in enumerate(self._rules):
AttributeError: 'TestSpider' object has no attribute '_rules'
```

The spider code that caused the problem:

```python
# -*- coding: utf-8 -*-
import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule

from newtest.items import NewtestItem


class TestSpider(CrawlSpider):

    def __init__(self, *args, **kwargs):
        # Overrides CrawlSpider.__init__ without calling it: this is the bug
        self.headers = {
            'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
            'Accept-Encoding': 'gzip, deflate',
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'
        }

    name = 'test'
    allowed_domains = ['example.com']
    start_urls = ['http://www.example.com']

    rules = (
        # Extract links matching 'category.php' (but not matching 'subsection.php')
        # and follow links from them (since no callback means follow=True by default).
        Rule(LinkExtractor(allow=(r'category\.php', ), deny=(r'subsection\.php', ))),

        # Extract links matching 'item.php' and parse them with the spider's method parse_item
        Rule(LinkExtractor(allow=(r'item\.php', )), callback='parse_item'),
    )

    def parse_item(self, response):
        self.logger.info('Hi, this is an item page! %s', response.url)
        # A bare scrapy.Item() declares no fields, so item['id'] = ... raises KeyError;
        # a plain dict is a valid item
        item = {}
        item['id'] = response.xpath('//td[@id="item_id"]/text()').re(r'ID: (\d+)')
        item['name'] = response.xpath('//td[@id="item_name"]/text()').extract()
        item['description'] = response.xpath('//td[@id="item_description"]/text()').extract()
        return item
```

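On closer inspection, the only difference from the official example is the overridden `__init__` method. The traceback points to the cause: overriding CrawlSpider's `__init__` without calling the parent's `__init__` means the `_rules` attribute is never created. The CrawlSpider source confirms this: `CrawlSpider.__init__` calls the base `Spider.__init__` and then `self._compile_rules()`, and it is `_compile_rules()` that builds `self._rules` from the class-level `rules`. If `CrawlSpider.__init__` never runs, the first call to `_requests_to_follow()` fails on `self._rules`.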
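The failure can be reproduced without Scrapy. The sketch below uses a hypothetical, heavily simplified `MiniCrawlSpider` that, like CrawlSpider, only creates `_rules` inside its `__init__`; it is a stand-in for illustration, not Scrapy's actual code.

```python
class MiniCrawlSpider:
    """Hypothetical, simplified stand-in for scrapy.spiders.CrawlSpider."""

    rules = ()

    def __init__(self, *args, **kwargs):
        # The real CrawlSpider builds self._rules in _compile_rules(),
        # which its __init__ calls; here we just copy the tuple.
        self._rules = list(self.rules)

    def requests_to_follow(self):
        # Like CrawlSpider._requests_to_follow(), this reads self._rules
        return [(n, rule) for n, rule in enumerate(self._rules)]


class BrokenSpider(MiniCrawlSpider):
    rules = ("category", "item")

    def __init__(self, *args, **kwargs):
        # Overrides the parent __init__ without calling it,
        # so self._rules is never assigned.
        self.headers = {"Accept-Encoding": "gzip, deflate"}


spider = BrokenSpider()
error_message = None
try:
    spider.requests_to_follow()
except AttributeError as exc:
    error_message = str(exc)

print(error_message)  # 'BrokenSpider' object has no attribute '_rules'
```

The subclass's own attributes (`headers`) exist, but everything the parent `__init__` would have set up is missing, which is exactly the pattern in the traceback above.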

So if your spider inherits from CrawlSpider and you need your own `__init__`, remember to call the parent `__init__` so that CrawlSpider's attributes are initialized properly.
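A minimal sketch of the fix, again using hypothetical stand-in classes rather than Scrapy itself. The base class is modeled on the fact that `scrapy.Spider.__init__` copies keyword arguments onto the instance (this is how `-a` spider arguments reach the spider), so forwarding `*args, **kwargs` unchanged matters for more than `_rules`.

```python
class MiniSpider:
    """Hypothetical stand-in for scrapy.Spider: keyword arguments become attributes."""

    def __init__(self, name=None, **kwargs):
        if name is not None:
            self.name = name
        self.__dict__.update(kwargs)


class MiniCrawlSpider(MiniSpider):
    """Hypothetical stand-in for CrawlSpider: builds _rules after the base init."""

    rules = ()

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._rules = list(self.rules)


class FixedSpider(MiniCrawlSpider):
    name = "test"
    rules = ("category", "item")

    def __init__(self, *args, **kwargs):
        # Call the parent first, forwarding all arguments unchanged
        super().__init__(*args, **kwargs)
        # Spider-specific attributes can be added safely afterwards
        self.headers = {"Accept-Encoding": "gzip, deflate"}


# Roughly what "scrapy crawl test -a category=books" would pass in
spider = FixedSpider(category="books")
print(spider._rules)    # ['category', 'item']
print(spider.category)  # books
```

Calling `super().__init__` first means anything the parent sets up, including `_rules`, already exists when the rest of your `__init__` runs.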

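The corrected code follows. The key change is the `super(TestSpider, self).__init__(*args, **kwargs)` call at the top of `__init__` (on Python 3, `super().__init__(*args, **kwargs)` is equivalent):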

```python
# -*- coding: utf-8 -*-
import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule

from newtest.items import NewtestItem


class TestSpider(CrawlSpider):

    def __init__(self, *args, **kwargs):
        super(TestSpider, self).__init__(*args, **kwargs)  # this is the key line
        self.headers = {
            'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
            'Accept-Encoding': 'gzip, deflate',
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'
        }

    name = 'test'
    allowed_domains = ['example.com']
    start_urls = ['http://www.example.com']

    rules = (
        # Extract links matching 'category.php' (but not matching 'subsection.php')
        # and follow links from them (since no callback means follow=True by default).
        Rule(LinkExtractor(allow=(r'category\.php', ), deny=(r'subsection\.php', ))),

        # Extract links matching 'item.php' and parse them with the spider's method parse_item
        Rule(LinkExtractor(allow=(r'item\.php', )), callback='parse_item'),
    )

    def parse_item(self, response):
        self.logger.info('Hi, this is an item page! %s', response.url)
        # A bare scrapy.Item() declares no fields, so item['id'] = ... raises KeyError;
        # a plain dict is a valid item
        item = {}
        item['id'] = response.xpath('//td[@id="item_id"]/text()').re(r'ID: (\d+)')
        item['name'] = response.xpath('//td[@id="item_name"]/text()').extract()
        item['description'] = response.xpath('//td[@id="item_description"]/text()').extract()
        return item
```
