转载:http://codewa.com/question/45600.html

Q:How to avoid HTTP error 429 (Too Many Requests) python

Q:如何避免HTTP错误429(请求太多)Python

I am trying to use Python to login to a website and gather information from several webpages and I get the following error:

Traceback (most recent call last):
File "extract_test.py", line 43, in <module>
response=br.open(v)
File "/usr/local/lib/python2.7/dist-packages/mechanize/_mechanize.py", line 203, in open
return self._mech_open(url, data, timeout=timeout)
File "/usr/local/lib/python2.7/dist-packages/mechanize/_mechanize.py", line 255, in _mech_open
raise response
mechanize._response.httperror_seek_wrapper: HTTP Error 429: Unknown Response Code

I used time.sleep() and it works, but it seems unintelligent and unreliable, is there any other way to dodge this error?

Here's my code:

import mechanize
import cookielib
import re
first=("example.com/page1")
second=("example.com/page2")
third=("example.com/page3")
fourth=("example.com/page4")
## I have seven URL's I want to open urls_list=[first,second,third,fourth] br = mechanize.Browser()
# Cookie Jar
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj) # Browser options
br.set_handle_equiv(True)
br.set_handle_redirect(True)
br.set_handle_referer(True)
br.set_handle_robots(False) # Log in credentials
br.open("example.com")
br.select_form(nr=0)
br["username"] = "username"
br["password"] = "password"
br.submit() for url in urls_list:
br.open(url)
print re.findall("Some String")

我试图用Python来登录到一个网站,收集信息,从几个网页,我收到以下错误:

Traceback (most recent call last):
File "extract_test.py", line 43, in <module>
response=br.open(v)
File "/usr/local/lib/python2.7/dist-packages/mechanize/_mechanize.py", line 203, in open
return self._mech_open(url, data, timeout=timeout)
File "/usr/local/lib/python2.7/dist-packages/mechanize/_mechanize.py", line 255, in _mech_open
raise response
mechanize._response.httperror_seek_wrapper: HTTP Error 429: Unknown Response Code

我用的时间。sleep()和它的作品,但它似乎愚蠢和不可靠的,有没有其他的方式来逃避这个错误?

这是我的密码:

import mechanize
import cookielib
import re
first=("example.com/page1")
second=("example.com/page2")
third=("example.com/page3")
fourth=("example.com/page4")
## I have seven URL's I want to open urls_list=[first,second,third,fourth] br = mechanize.Browser()
# Cookie Jar
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj) # Browser options
br.set_handle_equiv(True)
br.set_handle_redirect(True)
br.set_handle_referer(True)
br.set_handle_robots(False) # Log in credentials
br.open("example.com")
br.select_form(nr=0)
br["username"] = "username"
br["password"] = "password"
br.submit() for url in urls_list:
br.open(url)
print re.findall("Some String")
answer1: 回答1:

Receiving a status 429 is not an error, it is the other server "kindly" asking you to please stop spamming requests. Obviously, your rate of requests has been too high and the server is not willing to accept this.

You should not seek to "dodge" this, or even try to circumvent server security settings by trying to spoof your IP, you should simply respect the server's answer by not sending too many requests.

If everything is set up properly, you will also have received a "Retry-after" header along with the 429 response. This header specifies the number of seconds you should wait before making another call. The proper way to deal with this "problem" is to read this header and to sleep your process for that many seconds.

You can find more information on status 429 here: http://tools.ietf.org/html/rfc6585#page-3

429接收的状态是不是一个错误,这是其他服务器的“好心”请你停止滥发请求。显然,您的请求率太高,服务器不愿意接受。

你不应该寻求“道奇”,甚至试图绕过服务器安全设置试图欺骗你的IP,你应该尊重服务器的回答不给太多的要求。

如果一切都设置妥当,您也将收到一个“重试”后的头随着429响应。此标头指定要在调用另一个调用前等待的秒数。处理这个“问题”的正确方法是读取这个头,然后在你的睡眠过程中持续几秒钟。

你可以在这里找到状态429更多信息:HTTP:/ /工具。IETF。org / HTML / rfc6585 # page-3

answer2: 回答2:

Another workaround would be to spoof your IP using some sort of Public VPN or Tor network. This would be assuming the rate-limiting on the server at IP level.

There is a brief blog post demonstrating a way to use tor along with urllib2:

http://blog.flip-edesign.com/?p=119

另一个解决方法是哄骗你的IP使用某种公共VPN或Tor网络。这将假设在IP级别的服务器上的速率限制。

有一个简短的博客文章来演示使用Tor和urllib2一起:

http://blog.flip-edesign.com/?P = 119

answer3: 回答3:

Writing this piece of code fixed my problem:

requests.get(link, headers = {'User-agent': 'your bot 0.1'})

写这段代码修正了我的问题:

要求得到(链接、标题= { 'user-agent ':'你的BOT 0.1 })

【转】HTTP429的更多相关文章

  1. F12 开发人员工具中的控制台错误消息

    使用此参考解释显示在 Internet Explorer 11 的控制台 和调试程序中的错误消息. 简介 使用 F12 开发人员工具进行调试时,错误消息(例如 EC7111 或 HTML1114)将显 ...

随机推荐

  1. python进程池爬取下载美女图片(xpath)--lowbiprogrammer

    # -*- coding: utf-8 -*-import requests,osfrom lxml import etreeimport multiprocessingfrom retrying i ...

  2. u-boot 编译,启动流程分析,移植

    分析u-boot-1.1.6 的启动流程 移植u-boot 2012.04版本到JZ2440开发板 源码百度云链接:https://pan.baidu.com/s/10VnxfDWBqJVGY3SCY ...

  3. fdisk vs df

    fdisk工具是分区工具:df是用来查看文件系统(分区)的使用情况的! 当用来查看分区信息时,较为相似: fdisk侧重于显示分区表的信息: df侧重于显示当前系统中所有文件系统的信息: 常用用法: ...

  4. xargs与管道的区别

    一.直观感受 echo '--help' | cat echo的输出通过管道定向到cat的输入, 然后cat从其标准输入中读取待处理的文本内容, 输出结果: --help echo '--help' ...

  5. 前端 HTML body标签相关内容 常用标签 超链接标签 a标签

    超链接标签 <a> 超级链接<a>标记代表一个链接点,是英文anchor(锚点)的简写.它的作用是把当前位置的文本或图片连接到其他的页面.文本或图像,也可以是相同网页上的不同位 ...

  6. what's the python之内置函数

    what's the 内置函数? 内置函数,内置函数就是python本身定义好的,我们直接拿来就可以用的函数.(python中一共有68中内置函数.)     Built-in Functions   ...

  7. python的执行顺序

    为了区分是主动执行(如python test.py)还是被动调用(如import test),python用__name__来进行标识. 当主动执行时,__name__为__main__,当被调用时, ...

  8. Number (int float bool complex)--》int 整型、二进制整型、八进制整型、十六进制整型

    # ### Number (int float bool complex) # (1) int 整型 (正整数 0 负整数) intvar = 15 print(intvar) intvar = 0 ...

  9. Java元注解—— @Retention @Target @Document @Inherited

    java中元注解有四个: @Retention @Target @Document @Inherited: @Retention:注解的保留位置 @Retention(RetentionPolicy. ...

  10. [vue]模拟移动端三级路由: router-link位置体现router的灵活性

    小结 router-link可以随便放 router-view显示的是父组件的直接子组件的内容 想研究下移动三级路由的逻辑, 即 router-link和router-view 点首页--点新闻资讯( ...