爬取豆瓣电影top250,出现以下报错:

2018-08-11 22:02:16 [scrapy.core.engine] INFO: Spider opened
2018-08-11 22:02:16 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-08-11 22:02:16 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2018-08-11 22:02:17 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://movie.douban.com/robots.txt> (referer: None)
2018-08-11 22:02:17 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://movie.douban.com/top250> (referer: None)
2018-08-11 22:02:17 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://movie.douban.com/top250>: HTTP status code is not handled or not
allowed
2018-08-11 22:02:17 [scrapy.core.engine] INFO: Closing spider (finished)

防止反爬机制,伪装user_agent

【1】打开豆瓣top250 :  https://movie.douban.com/top250

【2】F12 打开控制台->刷新页面 ->Network->请求头部找到 User-Agent

在scrapy项目中找到settings.py的  USER_AGENT = ' '  (把注释去掉,加以下内容)

USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3493.3 Safari/537.36'

重新执行即可

scrapy crawl douban_spider

INFO: Ignoring response <403 https://movie.douban.com/top250>: HTTP status code is not handled or not allowed的更多相关文章

  1. python scrapy 报错 DEBUG: Ignoring response 403

    DEBUG: Ignoring response <403 http://movie.douban.com/top250>: HTTP status code is not handled ...

  2. docker 1.12.3版本搭建私有仓库,上传镜像报错:server gave HTTP response to HTTPS client”

    系统环境:centos7 docker版本: 1.12.3(注意版本,可能存在不同版本设置不同的情况) docker registry版本:2.4.1 问题: 成功安装docker registry, ...

  3. http: server gave HTTP response to HTTPS client & Get https://192.168.2.119/v2/: dial tcp 192.168.2.119:443: getsockopt: connection refused

    http: server gave HTTP response to HTTPS client 出现这问题的原因是:Docker自从1.3.X之后docker registry交互默认使用的是HTTP ...

  4. [原]Docker-issue(2) http: server gave HTTP response to HTTPS client

    系统环境 查看 文章末尾 附录 问题点:新建local registry后,push新的image到local registry  未能成功,并报错误: The push refers to repo ...

  5. docker local registry server gave HTTP response to HTTPS client

    server gave HTTP response to HTTPS client报错是在insecure_registry中加入了http前缀,如果本地registry不是https的 就不要加任何 ...

  6. docker registry push错误“server gave HTTP response to HTTPS client”

    系统环境:centos7 docker版本: 1.12.3(注意版本,可能存在不同版本设置不同的情况) docker registry版本:2.4.1 问题: 成功安装docker registry, ...

  7. (七)VMware Harbor 问题:Get https://192.168.3.135:8088/v2/: http:server gave HTTP response to HTTPS client

    (一)问题描述 登陆时,报错 docker Get https://192.168.3.135:8088/v2/: http:server gave HTTP response to HTTPS cl ...

  8. 【解决】http: server gave HTTP response to HTTPS client

    [问题]上传镜像到私有仓库时报错 $ docker push xxx.xxx.xxx.xxx:5000/java-8 The push refers to repository [xxx.xxx.xx ...

  9. springMVC or response redirect https

    <bean class="org.springframework.web.servlet.view.InternalResourceViewResolver"> < ...

随机推荐

  1. *AtCoder Regular Contest 096E - Everything on It

    $n \leq 3000$个酱,丢进拉面里,需要没两碗面的酱一样,并且每个酱至少出现两次,面的数量随意.问方案数.对一给定质数取模. 没法dp就大力容斥辣.. $Ans=\sum_{i=0}^n (- ...

  2. 使用FL2440之问题1

    随机送的usb转串口线(一头usb一头9针,蓝色),写明HL340,装上驱动后运行正常,但电脑设备管理器显示的却是CH340,以前还用过PL2303,百度总结一下他们的区别: CH340,PL2303 ...

  3. list或map 打印成json 方便调试

    private final Logger logger = Logger.getLogger(this.getClass()); logger.info(JSON.toJSONStringWithDa ...

  4. 在 VirtualBox 5.0 系列中让虚拟机支持 USB 3.0 必须开启 APIC

    VirtualBox 5.0 系列正式支持 USB 3.0,能够在宿主机支持 USB 3.0 的情况下,让虚拟机也选择具备 USB 3.0 的功能.但是经过多方试验,发现必须在 VirtualBox ...

  5. 同时在windows和linux环境开发时换行符的处理

    Git 的 core.autocrlf 參數默认为true,即每次 checkin 時,Git 會將純文字類型的檔案中的所有 CRLF 字元轉換為 LF,也就是版本庫中的換行符號一律存成 LF:在 c ...

  6. 实现TTCP (检测TCP吞吐量)

    实现TTCP (检测TCP吞吐量) 应用层协议 为了解决TCP粘包问题以及客户端阻塞问题 设计的应用层协议如下: //告知要发送的数据包个数和长度 struct SessionMessage { in ...

  7. FireDac心得

    usesFireDAC.Phys.MySQL, FireDAC.Stan.Def, FireDAC.DApt, FireDAC.Comp.Client, FireDAC.Comp.UI, FireDA ...

  8. 快速掌握RabbitMQ(一)——RabbitMQ的基本概念、安装和C#驱动

    1 RabbitMQ简介 RabbitMQ是一个由erlang开发的AMQP(Advanced Message Queue )的开源实现,官网地址:http://www.rabbitmq.com.Ra ...

  9. luogu P1351 联合权值

    题目描述 无向连通图G 有n 个点,n - 1 条边.点从1 到n 依次编号,编号为 i 的点的权值为W i ,每条边的长度均为1 .图上两点( u , v ) 的距离定义为u 点到v 点的最短距离. ...

  10. IntelliJ IDEA设置properties文件显示中文

    配置这里: 注意:上面是Default Settings,还需要在Settings中设置成上面一样的.