Python crawler: retrying a failed download with a second request

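When writing a crawler, you will inevitably run into errors, such as 4XX and 5XX responses. Some are caused by network problems, others by something else. In these cases we want the program to attempt the download a second time.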

One rather clumsy solution is to nest try/except blocks:

try:
    ------
except:
    try:
        ------
    except:
        try:
            ------
        except:
            try:
                ------
            except:
                try:
                    ------
                except:
                    try:
                        ------
                    except:
                        ------
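The nesting above really just means "try again, up to N times", and a plain loop can express that directly. Here is a minimal sketch. It assumes a hypothetical `download_with_retry` helper that accepts any `fetch(url)` callable, for example a small wrapper around `session.get`. The `flaky_fetch` function is a made-up stand-in that fails twice, which lets the example run without network access. The helper catches `Exception` broadly to mirror the bare `except:` above; real code should catch a narrower type.

```python
def download_with_retry(fetch, url, max_tries=3):
    """Call fetch(url) up to max_tries times and return the first successful result.

    If every attempt fails, the exception from the last attempt is re-raised.
    """
    if max_tries < 1:
        raise ValueError("max_tries must be at least 1")
    for attempt in range(1, max_tries + 1):
        try:
            return fetch(url)
        except Exception as e:  # broad, like the bare except: above; narrow this in real code
            print(f"attempt {attempt}/{max_tries} failed: {e}")
            if attempt == max_tries:
                raise  # out of attempts: propagate the last error


# Usage with a hypothetical fetch function that fails twice, then succeeds.
calls = {"n": 0}


def flaky_fetch(url):
    calls["n"] += 1
    if calls["n"] < 3:
        raise OSError("simulated network error")
    return "<html>ok</html>"


html = download_with_retry(flaky_fetch, "https://www.baidu.com/", max_tries=5)
print(html)  # printed after two "attempt ... failed" lines: <html>ok</html>
```

The loop keeps the nesting depth constant no matter how many attempts are allowed. The number of attempts becomes a parameter instead of being fixed by how many `try` blocks were typed out.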

Is there a cleaner way to write this?

We can implement the same process with recursion. The code is as follows:

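# -*- coding: utf-8 -*-
from __future__ import print_function  # same print() syntax on Python 2.7 and 3

import requests

session = requests.Session()  # one session, so connections are reused

request_urls = [
    "https://www.baidu.com/",
    "https://www.baidu.com/",
    "https://www.baidu.com/",
    "https://www.ba111111idu.com/",  # deliberately invalid host, to force retries
    "https://www.baidu.com/",
    "https://www.baidu.com/",
]


def down_load(url, request_max=3):
    """Return the page body; on failure, call itself again, at most request_max more times."""
    print("正在请求的URL是：", url)  # "Requesting URL:"
    result_html = ""
    try:
        result = session.get(url=url, timeout=10)  # timeout so a hung server cannot block forever
        print(result.status_code)
        # requests does not raise on 4XX/5XX by itself, so raise here to retry those as well
        result.raise_for_status()
        result_html = result.content
    except requests.RequestException as e:
        print(e)
        if request_max > 0:
            return down_load(url, request_max - 1)
    return result_html


for url in request_urls:
    down_load(url=url, request_max=13)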
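The recursive version has two properties worth noting. First, it retries immediately, even though transient failures such as an overloaded server often clear up only after a short wait. Second, each retry adds a stack frame. That is harmless for 13 retries, but the depth is bounded by Python's recursion limit, which defaults to 1000 (see `sys.getrecursionlimit()`).

One common alternative is a retry decorator with exponential backoff. This is a swapped-in technique, not the author's method. In the sketch below, `retry` and `fetch_page` are hypothetical names. The `sleep` parameter is injectable, so the demo can record the delays instead of actually waiting. `fetch_page` only simulates two failures; it does not call `session.get`.

```python
import functools
import time


def retry(max_tries=3, base_delay=1.0, exceptions=(Exception,), sleep=time.sleep):
    """Retry the decorated function when it raises one of `exceptions`.

    After each failed attempt except the last, waits base_delay * 2**k seconds,
    where k is the 0-based index of the attempt that just failed.
    """
    if max_tries < 1:
        raise ValueError("max_tries must be at least 1")

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_tries):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == max_tries - 1:
                        raise  # no attempts left: let the caller see the error
                    sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ... with the defaults
        return wrapper

    return decorator


# Demo: record the delays instead of sleeping, and simulate a page that fails twice.
waits = []
attempts = {"n": 0}


@retry(max_tries=4, base_delay=0.5, exceptions=(OSError,), sleep=waits.append)
def fetch_page(url):
    # hypothetical stand-in for session.get(url).content
    attempts["n"] += 1
    if attempts["n"] <= 2:
        raise OSError("simulated timeout")
    return "<html>ok</html>"


print(fetch_page("https://www.baidu.com/"))  # <html>ok</html>
print(waits)  # [0.5, 1.0]
```

`requests.RequestException` derives from `IOError`, which is an alias of `OSError` in Python 3. A real crawler could therefore pass `exceptions=(requests.RequestException,)`. Network and HTTP errors would then be retried, while bugs in the parsing code would still fail immediately.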

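Output (the invalid host is requested 14 times: the initial attempt plus 13 retries, because request_max=13):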

C:\Python27\python.exe C:/Users/xuchunlin/PycharmProjects/A9_25/auction/test.py
正在请求的URL是： https://www.baidu.com/
200
正在请求的URL是： https://www.baidu.com/
200
正在请求的URL是： https://www.baidu.com/
200
正在请求的URL是： https://www.ba111111idu.com/
HTTPSConnectionPool(host='www.ba111111idu.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x0000000003AA6208>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))
正在请求的URL是： https://www.ba111111idu.com/
HTTPSConnectionPool(host='www.ba111111idu.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x0000000003AA6438>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))
正在请求的URL是： https://www.ba111111idu.com/
HTTPSConnectionPool(host='www.ba111111idu.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x0000000003AA65F8>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))
正在请求的URL是： https://www.ba111111idu.com/
HTTPSConnectionPool(host='www.ba111111idu.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x0000000003AA6828>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))
正在请求的URL是： https://www.ba111111idu.com/
HTTPSConnectionPool(host='www.ba111111idu.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x0000000003AA6A90>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))
正在请求的URL是： https://www.ba111111idu.com/
HTTPSConnectionPool(host='www.ba111111idu.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x0000000003AA62E8>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))
正在请求的URL是： https://www.ba111111idu.com/
HTTPSConnectionPool(host='www.ba111111idu.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x0000000003AA6D30>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))
正在请求的URL是： https://www.ba111111idu.com/
HTTPSConnectionPool(host='www.ba111111idu.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x0000000003AA6DD8>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))
正在请求的URL是： https://www.ba111111idu.com/
HTTPSConnectionPool(host='www.ba111111idu.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x0000000003B682B0>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))
正在请求的URL是： https://www.ba111111idu.com/
HTTPSConnectionPool(host='www.ba111111idu.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x0000000003B68080>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))
正在请求的URL是： https://www.ba111111idu.com/
HTTPSConnectionPool(host='www.ba111111idu.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x0000000003B685C0>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))
正在请求的URL是： https://www.ba111111idu.com/
HTTPSConnectionPool(host='www.ba111111idu.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x0000000003B687F0>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))
正在请求的URL是： https://www.ba111111idu.com/
HTTPSConnectionPool(host='www.ba111111idu.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x0000000003B68A20>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))
正在请求的URL是： https://www.ba111111idu.com/
HTTPSConnectionPool(host='www.ba111111idu.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x0000000003B68C50>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))
正在请求的URL是： https://www.baidu.com/
200
正在请求的URL是： https://www.baidu.com/
200

Process finished with exit code 0
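The error lines above already say "Max retries exceeded". This is because requests opens connections through urllib3, and its `HTTPAdapter` uses a retry count of 0 by default. Each `session.get` call therefore made exactly one attempt, and all of the retrying came from `down_load` itself.

As an alternative to hand-written retries, the retry policy can be pushed down into the session. To do that, mount an adapter configured with urllib3's `Retry` class. This is a sketch: the specific values (3 retries, `backoff_factor=0.5`, the list of 5XX codes) are arbitrary choices for illustration. The exact backoff timing also depends on the installed urllib3 version.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Retry policy handled inside urllib3: up to 3 retries with exponential backoff,
# also retrying when the server answers with one of these 5XX status codes.
retry_policy = Retry(
    total=3,
    backoff_factor=0.5,
    status_forcelist=[500, 502, 503, 504],
)

session = requests.Session()
adapter = HTTPAdapter(max_retries=retry_policy)
session.mount("http://", adapter)   # every http:// URL on this session uses the policy
session.mount("https://", adapter)  # and so does every https:// URL

# From here on, session.get(url, timeout=10) retries transparently. Once the
# retries are used up, requests raises an exception derived from
# requests.RequestException, which the except clause in down_load catches.
```

If this adapter is combined with the recursive `down_load`, the two retry layers multiply the total number of attempts. In practice it is simpler to keep only one of them.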