Repost: http://codewa.com/question/45600.html

Q: How to avoid HTTP error 429 (Too Many Requests) in Python


I am trying to use Python to log in to a website and gather information from several webpages, and I get the following error:

Traceback (most recent call last):
  File "extract_test.py", line 43, in <module>
    response=br.open(v)
  File "/usr/local/lib/python2.7/dist-packages/mechanize/_mechanize.py", line 203, in open
    return self._mech_open(url, data, timeout=timeout)
  File "/usr/local/lib/python2.7/dist-packages/mechanize/_mechanize.py", line 255, in _mech_open
    raise response
mechanize._response.httperror_seek_wrapper: HTTP Error 429: Unknown Response Code

I used time.sleep() and it works, but it seems unintelligent and unreliable. Is there any other way to dodge this error?

Here's my code:

import mechanize
import cookielib
import re

first = "example.com/page1"
second = "example.com/page2"
third = "example.com/page3"
fourth = "example.com/page4"

## I have seven URLs I want to open
urls_list = [first, second, third, fourth]

br = mechanize.Browser()

# Cookie Jar
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)

# Browser options
br.set_handle_equiv(True)
br.set_handle_redirect(True)
br.set_handle_referer(True)
br.set_handle_robots(False)

# Log in credentials
br.open("example.com")
br.select_form(nr=0)
br["username"] = "username"
br["password"] = "password"
br.submit()

for url in urls_list:
    response = br.open(url)
    print re.findall("Some String", response.read())

Answer 1:

Receiving a status 429 is not an error; it is the other server "kindly" asking you to please stop spamming requests. Obviously, your rate of requests has been too high, and the server is not willing to accept this.

You should not seek to "dodge" this, or even try to circumvent server security settings by spoofing your IP; you should simply respect the server's answer by not sending too many requests.

If everything is set up properly, you will also have received a "Retry-After" header along with the 429 response. This header specifies the number of seconds you should wait before making another call. The proper way to deal with this "problem" is to read this header and sleep your process for that many seconds.
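As a minimal sketch of this idea using the requests library (rather than the asker's mechanize), with a placeholder URL, and assuming Retry-After carries a number of seconds rather than an HTTP date:

import time
import requests

def get_with_retry(url, max_retries=5, default_wait=5):
    # Retry on 429, honouring the Retry-After header when present.
    for _ in range(max_retries):
        response = requests.get(url)
        if response.status_code != 429:
            return response
        # Assumes Retry-After is a number of seconds; per the RFC it can
        # also be an HTTP date, which this sketch does not handle.
        wait = int(response.headers.get("Retry-After", default_wait))
        time.sleep(wait)
    return response

response = get_with_retry("http://example.com/page1")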

You can find more information on status 429 here: http://tools.ietf.org/html/rfc6585#page-3


Answer 2:

Another workaround would be to spoof your IP using some sort of public VPN or the Tor network. This assumes the rate limiting on the server is applied at the IP level.

There is a brief blog post demonstrating a way to use Tor along with urllib2:

http://blog.flip-edesign.com/?p=119
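A rough sketch of the same idea with the requests library instead of urllib2, assuming a local Tor daemon listening on port 9050 and the requests[socks] extra (PySocks) installed:

import requests

# Route traffic through a local Tor SOCKS proxy (assumed at 127.0.0.1:9050).
# The socks5h scheme resolves hostnames through the proxy as well.
proxies = {
    "http": "socks5h://127.0.0.1:9050",
    "https": "socks5h://127.0.0.1:9050",
}
response = requests.get("http://example.com", proxies=proxies)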


Answer 3:

Writing this piece of code fixed my problem:

import requests

requests.get(link, headers={'User-agent': 'your bot 0.1'})

