python 爬虫重复下载二次请求

在写爬虫的时候，难免会遇到报错，比如 4XX ，5XX，有些可能是网络的原因，或者一些其他的原因，这个时候我们希望程序去做第二次下载，

有一种很low的解决方案，比如是用 try except

try:

    -------

except:

    try:

        --------

    except:

        try:

            ------

        except:

            try:

                ------

            except:

                try:

                    ------

                except:

                    try:

                        ------

                    except:

                        ------

有没有看起来更舒服的写法呢？

我们可以用递归实现这个过程

代码如下

request_urls = [

"https://www.baidu.com/",

"https://www.baidu.com/",

"https://www.baidu.com/",

"https://www.ba111111idu.com/",

"https://www.baidu.com/",

"https://www.baidu.com/",

]

def down_load(url,request_max=3):

    print "正在请求的URL是：",url

    result_html = ""

    result_status_code = ""

    try:

        result = session.get(url=url)

        result_html = result.content

        result_status_code = result.status_code

        print result_status_code

    except Exception as e:

        print e

        if request_max >0:

            if result_status_code != 200:

                return down_load(url,request_max-1)

    return result_html

for url in request_urls:

    down_load(url=url,request_max=13)

输出结果：

C:\Python27\python.exe C:/Users/xuchunlin/PycharmProjects/A9_25/auction/test.py

正在请求的URL是： https://www.baidu.com/

200

正在请求的URL是： https://www.baidu.com/

200

正在请求的URL是： https://www.baidu.com/

200

正在请求的URL是： https://www.ba111111idu.com/

HTTPSConnectionPool(host='www.ba111111idu.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x0000000003AA6208>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))

正在请求的URL是： https://www.ba111111idu.com/

HTTPSConnectionPool(host='www.ba111111idu.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x0000000003AA6438>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))

正在请求的URL是： https://www.ba111111idu.com/

HTTPSConnectionPool(host='www.ba111111idu.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x0000000003AA65F8>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))

正在请求的URL是： https://www.ba111111idu.com/

HTTPSConnectionPool(host='www.ba111111idu.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x0000000003AA6828>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))

正在请求的URL是： https://www.ba111111idu.com/

HTTPSConnectionPool(host='www.ba111111idu.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x0000000003AA6A90>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))

正在请求的URL是： https://www.ba111111idu.com/

HTTPSConnectionPool(host='www.ba111111idu.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x0000000003AA62E8>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))

正在请求的URL是： https://www.ba111111idu.com/

HTTPSConnectionPool(host='www.ba111111idu.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x0000000003AA6D30>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))

正在请求的URL是： https://www.ba111111idu.com/

HTTPSConnectionPool(host='www.ba111111idu.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x0000000003AA6DD8>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))

正在请求的URL是： https://www.ba111111idu.com/

HTTPSConnectionPool(host='www.ba111111idu.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x0000000003B682B0>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))

正在请求的URL是： https://www.ba111111idu.com/

HTTPSConnectionPool(host='www.ba111111idu.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x0000000003B68080>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))

正在请求的URL是： https://www.ba111111idu.com/

HTTPSConnectionPool(host='www.ba111111idu.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x0000000003B685C0>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))

正在请求的URL是： https://www.ba111111idu.com/

HTTPSConnectionPool(host='www.ba111111idu.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x0000000003B687F0>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))

正在请求的URL是： https://www.ba111111idu.com/

HTTPSConnectionPool(host='www.ba111111idu.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x0000000003B68A20>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))

正在请求的URL是： https://www.ba111111idu.com/

HTTPSConnectionPool(host='www.ba111111idu.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x0000000003B68C50>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))

正在请求的URL是： https://www.baidu.com/

200

正在请求的URL是： https://www.baidu.com/

200

Process finished with exit code 0

python 爬虫重复下载二次请求的更多相关文章

Python爬虫入门（二）之Requests库
Python爬虫入门(二)之Requests库我是照着小白教程做的,所以该篇是更小白教程hhhhhhhh 一.Requests库的简介 Requests 唯一的一个非转基因的 Python HTTP ...
Python爬虫初学（二）—— 爬百度贴吧
Python爬虫初学(二)-- 爬百度贴吧昨天初步接触了爬虫,实现了爬取网络段子并逐条阅读等功能,详见Python爬虫初学(一). 今天准备对百度贴吧下手了,嘿嘿.依然是跟着这个博客学习的,这次仿照 ...
python爬虫脚本下载YouTube视频
python爬虫脚本下载YouTube视频爬虫 python YouTube视频工作环境: python 2.7.13 pip lxml, 安装 pip install lxml,主要用xpath ...
Python爬虫学习：二、爬虫的初步尝试
我使用的编辑器是IDLE,版本为Python2.7.11,Windows平台. 本文是博主原创随笔,转载时请注明出处Maple2cat|Python爬虫学习:二.爬虫的初步尝试 1.尝试抓取指定网页 ...
python爬虫之下载文件的方式总结以及程序实例
python爬虫之下载文件的方式以及下载实例目录第一种方法:urlretrieve方法下载第二种方法:request download 第三种方法:视频文件.大型文件下载实战演示第一种方法: ...
Python爬虫小白---（二）爬虫基础--Selenium PhantomJS
一.前言前段时间尝试爬取了网易云音乐的歌曲,这次打算爬取QQ音乐的歌曲信息.网易云音乐歌曲列表是通过iframe展示的,可以借助Selenium获取到iframe的页面元素, 而QQ音乐采用的是 ...
[记录][python]python爬虫，下载某图片网站的所有图集
随笔仅用于学习交流,转载时请注明出处,http://www.cnblogs.com/CaDevil/p/5958770.html 该随笔是记录我的第一个python程序,一个爬去指定图片站点的所有图集 ...
Python爬虫《爬取get请求的页面数据》
一.urllib库 urllib是Python自带的一个用于爬虫的库,其主要作用就是可以通过代码模拟浏览器发送请求.其常被用到的子模块在Python3中的为urllib.request和urllib. ...
Python 爬虫实战（二）：使用 requests-html
Python 爬虫实战(一):使用 requests 和 BeautifulSoup,我们使用了 requests 做网络请求,拿到网页数据再用 BeautifulSoup 解析,就在前不久,requ ...

随机推荐

vcenter SSO
Additional information: 对 COM 组件的调用返回了错误 HRESULT E_FAIL
1:Winform应用通过mshtml操作IE浏览器DOM时,第一次运行正常,点击第二次时错误信息如下 A first chance exception of type 'System.Runtime ...
Vim 中如何去掉 ^M 字符
基于 DOS/Windows 的文本文件在每一行末尾有一个 CR(回车)和 LF(换行),而 UNIX 文本只有一个换行,即win每行结尾为\r\n,而linux只有一个\n如果win下的文档上传到l ...
Nginx+Tomcat+Memcached 实现集群部署时Session共享
Nginx+Tomcat+Memcached 实现集群部署时Session共享一．简介我们系统经常要保存用户登录信息,有Cookie和Session机制,Cookie客户端保存用户信息,Sessi ...
ssh出错 sign_and_send_pubkey: signing failed: agent refused operation
在服务器添加完公钥之后,ssh服务器然后报了这个错误 sign_and_send_pubkey: signing failed: agent refused operation 然后执行了以下命令才好 ...
Flash：TextField字体不显示/文字不显示/文字丢失
节约大家时间,先说结论: 1.是否文字中包含了\r\n等字符,flash中,\r和\n都会换行.需要过滤掉其中1个 2.是否文本框大小不够,文字被挤到下一行了.设置单行.多行 3.TextFi ...
jqPlot图表插件学习之数据节点高亮和光标提示
一.准备工作首先我们需要到官网下载所需的文件: 官网下载(笔者选择的是jquery.jqplot.1.0.8r1250.zip这个版本) 然后读者需要根据自己的情况新建一个项目并且按照如下的方式加载 ...
python之函数用法capitalize()
# -*- coding: utf-8 -*- #python 27 #xiaodeng #python之函数用法capitalize() #capitalize() #说明:将字符串的第一个字母变成 ...
基于git 客户端使用shell工具
1 定义全局启动命令别名 C:\Program Files\Git\etc\profile.d\aliases.sh alias ls='ls -F --color=auto --show-cont ...
hdu 4122 Alice's mooncake shop (线段树)
题目大意: 一个月饼店每一个小时做出月饼的花费不一样. 储存起来要钱.最多存多久.问你把全部订单做完的最少花费. 思路分析: ans = segma( num[]*(cost[] + (i-j)*s) ...

python 爬虫 重复下载 二次请求

python 爬虫 重复下载 二次请求的更多相关文章

随机推荐

热门专题

python 爬虫重复下载二次请求

python 爬虫重复下载二次请求的更多相关文章