Crawler Essentials: requests

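Requests is an HTTP library written in Python and released under the Apache 2.0 license. It wraps lower-level HTTP machinery (urllib3 and Python's standard library) in a high-level API, which makes sending network requests from Python far more pleasant. With Requests you can easily send almost any HTTP request a browser can.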

1. GET requests

 # 1. Example without parameters
 import requests

 ret = requests.get('https://github.com/timeline.json')
 print(ret.url)
 print(ret.text)

 # 2. Example with query parameters
 import requests

 payload = {'key1': 'value1', 'key2': 'value2'}
 ret = requests.get("http://httpbin.org/get", params=payload)
 print(ret.url)
 print(ret.text)
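To see how `params` ends up in the URL without sending anything over the network, one option is to build the request and inspect its prepared form. This is a minimal sketch that assumes only the public `requests.Request(...).prepare()` API; the variable names are illustrative.

```python
import requests

payload = {'key1': 'value1', 'key2': 'value2'}

# Build the request object and prepare it; nothing is sent over the network.
prepared = requests.Request('GET', 'http://httpbin.org/get', params=payload).prepare()

# The dict is URL-encoded and appended to the URL as the query string.
print(prepared.url)
```

The printed URL is what `ret.url` would report after a real `requests.get(..., params=payload)` call, before any redirects.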

2. POST requests

 # 1. Basic POST example
 import requests

 payload = {'key1': 'value1', 'key2': 'value2'}
 ret = requests.post("http://httpbin.org/post", data=payload)
 print(ret.text)

 # 2. Example sending request headers and a JSON body
 import json
 import requests

 url = 'https://api.github.com/some/endpoint'
 payload = {'some': 'data'}
 headers = {'content-type': 'application/json'}
 ret = requests.post(url, data=json.dumps(payload), headers=headers)
 print(ret.text)
 print(ret.cookies)
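The difference between passing a dict as `data` and passing it as `json` can be checked offline by preparing both requests and comparing the body and the `Content-Type` header. This is a sketch under the assumption that inspecting a prepared request is acceptable here; nothing is sent, and the variable names are illustrative.

```python
import json

import requests

url = 'http://httpbin.org/post'
payload = {'some': 'data'}

# data=dict is form-encoded; json=dict is serialized as JSON.
form_req = requests.Request('POST', url, data=payload).prepare()
json_req = requests.Request('POST', url, json=payload).prepare()

print(form_req.headers['Content-Type'], form_req.body)
print(json_req.headers['Content-Type'], json_req.body)
```

With `json=` there is no need to call `json.dumps` or to set the `content-type` header by hand.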

3. Other request methods

 requests.get(url, params=None, **kwargs)
 requests.post(url, data=None, json=None, **kwargs)
 requests.put(url, data=None, **kwargs)
 requests.head(url, **kwargs)
 requests.delete(url, **kwargs)
 requests.patch(url, data=None, **kwargs)
 requests.options(url, **kwargs)

 # All of the methods above are built on top of this one
 requests.request(method, url, **kwargs)
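One way to confirm that the helpers are thin wrappers is to replace `requests.api.request` with a mock and record what each helper forwards to it. This is a sketch using `unittest.mock` from the standard library; patching `requests.api.request` relies on the helpers living in the `requests.api` module, and nothing is sent over the network.

```python
from unittest import mock

import requests

url = 'http://httpbin.org/anything'

# Swap the underlying function for a mock so no request is actually sent.
with mock.patch('requests.api.request') as fake_request:
    requests.get(url, params={'k': 'v'})
    requests.post(url, data={'k': 'v'})
    requests.delete(url)

# The first positional argument of each recorded call is the HTTP method.
forwarded = [call[0][0] for call in fake_request.call_args_list]
print(forwarded)
```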

4. Request parameters

 def request(method, url, **kwargs):
     """Constructs and sends a :class:`Request <Request>`.

     :param method: method for the new :class:`Request` object.
     :param url: URL for the new :class:`Request` object.
     :param params: (optional) Dictionary or bytes to be sent in the query string for the :class:`Request`.
     :param data: (optional) Dictionary, bytes, or file-like object to send in the body of the :class:`Request`.
     :param json: (optional) json data to send in the body of the :class:`Request`.
     :param headers: (optional) Dictionary of HTTP Headers to send with the :class:`Request`.
     :param cookies: (optional) Dict or CookieJar object to send with the :class:`Request`.
     :param files: (optional) Dictionary of ``'name': file-like-objects`` (or ``{'name': file-tuple}``) for multipart encoding upload.
         ``file-tuple`` can be a 2-tuple ``('filename', fileobj)``, 3-tuple ``('filename', fileobj, 'content_type')``
         or a 4-tuple ``('filename', fileobj, 'content_type', custom_headers)``, where ``'content-type'`` is a string
         defining the content type of the given file and ``custom_headers`` a dict-like object containing additional headers
         to add for the file.
     :param auth: (optional) Auth tuple to enable Basic/Digest/Custom HTTP Auth.
     :param timeout: (optional) How long to wait for the server to send data
         before giving up, as a float, or a :ref:`(connect timeout, read
         timeout) <timeouts>` tuple.
     :type timeout: float or tuple
     :param allow_redirects: (optional) Boolean. Set to True if POST/PUT/DELETE redirect following is allowed.
     :type allow_redirects: bool
     :param proxies: (optional) Dictionary mapping protocol to the URL of the proxy.
     :param verify: (optional) whether the SSL cert will be verified. A CA_BUNDLE path can also be provided. Defaults to ``True``.
     :param stream: (optional) if ``False``, the response content will be immediately downloaded.
     :param cert: (optional) if String, path to ssl client cert file (.pem). If Tuple, ('cert', 'key') pair.
     :return: :class:`Response <Response>` object
     :rtype: requests.Response

     Usage::

       >>> import requests
       >>> req = requests.request('GET', 'http://httpbin.org/get')
       >>> req
       <Response [200]>
     """


5. Parameter examples

 def param_method_url():
     # requests.request(method='get', url='http://127.0.0.1:8000/test/')
     # requests.request(method='post', url='http://127.0.0.1:8000/test/')
     pass

 def param_param():
     # params can be:
     # - a dict
     # - a string
     # - bytes (ASCII characters only)

     # requests.request(method='get',
     #                  url='http://127.0.0.1:8000/test/',
     #                  params={'k1': 'v1', 'k2': '水电费'})

     # requests.request(method='get',
     #                  url='http://127.0.0.1:8000/test/',
     #                  params="k1=v1&k2=水电费&k3=v3&k3=vv3")

     # requests.request(method='get',
     #                  url='http://127.0.0.1:8000/test/',
     #                  params=bytes("k1=v1&k2=k2&k3=v3&k3=vv3", encoding='utf8'))

     # Wrong: bytes params must not contain non-ASCII characters
     # requests.request(method='get',
     #                  url='http://127.0.0.1:8000/test/',
     #                  params=bytes("k1=v1&k2=水电费&k3=v3&k3=vv3", encoding='utf8'))
     pass
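The accepted `params` types can be compared without a running server by preparing each request and looking at the resulting URL. This is a sketch only: `prepared_url` is a hypothetical helper, and the failure on non-ASCII bytes reflects how current requests releases decode bytes params as ASCII, which may differ in other versions.

```python
import requests

url = 'http://127.0.0.1:8000/test/'


def prepared_url(params):
    """Return the final URL requests would use, without sending anything."""
    return requests.Request('GET', url, params=params).prepare().url


# A dict and an equivalent string both end up percent-encoded as UTF-8.
from_dict = prepared_url({'k1': 'v1', 'k2': '水电费'})
from_str = prepared_url('k1=v1&k2=水电费')

# ASCII-only bytes are accepted as-is.
from_ascii_bytes = prepared_url(b'k1=v1&k2=v2')

# Bytes containing non-ASCII characters fail while the request is prepared.
try:
    prepared_url('k1=v1&k2=水电费'.encode('utf8'))
    non_ascii_bytes_ok = True
except UnicodeError:
    non_ascii_bytes_ok = False

print(from_dict)
print(from_str)
print(from_ascii_bytes)
print(non_ascii_bytes_ok)
```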

 def param_data():
     # data can be:
     # - a dict
     # - a string
     # - bytes
     # - a file object

     # requests.request(method='POST',
     #                  url='http://127.0.0.1:8000/test/',
     #                  data={'k1': 'v1', 'k2': '水电费'})

     # requests.request(method='POST',
     #                  url='http://127.0.0.1:8000/test/',
     #                  data="k1=v1&k2=v2&k3=v3&k3=v4")

     # requests.request(method='POST',
     #                  url='http://127.0.0.1:8000/test/',
     #                  data="k1=v1&k2=v2&k3=v3&k3=v4",
     #                  headers={'Content-Type': 'application/x-www-form-urlencoded'})

     # File contents: k1=v1&k2=v2&k3=v3&k3=v4
     # requests.request(method='POST',
     #                  url='http://127.0.0.1:8000/test/',
     #                  data=open('data_file.py', mode='rb'),
     #                  headers={'Content-Type': 'application/x-www-form-urlencoded'})
     pass
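Why some of the `data` examples set `Content-Type` by hand becomes clearer when the prepared requests are compared: a dict is form-encoded and labelled automatically, while a string is sent as-is with no `Content-Type`. The following is a sketch that only prepares the requests; the variable names are illustrative.

```python
import requests

url = 'http://127.0.0.1:8000/test/'

# A dict is form-encoded and gets a Content-Type header automatically.
dict_req = requests.Request('POST', url, data={'k1': 'v1', 'k2': 'v2'}).prepare()

# A string is sent unchanged and no Content-Type header is added.
str_req = requests.Request('POST', url, data='k1=v1&k2=v2').prepare()

print(dict_req.body, dict_req.headers.get('Content-Type'))
print(str_req.body, str_req.headers.get('Content-Type'))
```

A server that parses form bodies by `Content-Type` will therefore usually ignore a string body unless the header is supplied explicitly.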

 def param_json():
     # The object passed as json is serialized to a string with json.dumps(...)
     # and sent in the request body with the header Content-Type: application/json
     requests.request(method='POST',
                      url='http://127.0.0.1:8000/test/',
                      json={'k1': 'v1', 'k2': '水电费'})

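 def param_headers():
     # Send custom request headers to the server.
     # An explicit Content-Type overrides the application/json that json= would set,
     # so here the body is JSON but is labelled as form data.
     requests.request(method='POST',
                      url='http://127.0.0.1:8000/test/',
                      json={'k1': 'v1', 'k2': '水电费'},
                      headers={'Content-Type': 'application/x-www-form-urlencoded'}
                      )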

 def param_cookies():
     # Send cookies to the server
     requests.request(method='POST',
                      url='http://127.0.0.1:8000/test/',
                      data={'k1': 'v1', 'k2': 'v2'},
                      cookies={'cook1': 'value1'},
                      )

     # A CookieJar also works (the dict form is wrapped into one internally)
     from http.cookiejar import CookieJar
     from http.cookiejar import Cookie

     obj = CookieJar()
     obj.set_cookie(Cookie(version=0, name='c1', value='v1', port=None, domain='', path='/', secure=False, expires=None,
                           discard=True, comment=None, comment_url=None, rest={'HttpOnly': None}, rfc2109=False,
                           port_specified=False, domain_specified=False, domain_initial_dot=False, path_specified=False)
                    )
     requests.request(method='POST',
                      url='http://127.0.0.1:8000/test/',
                      data={'k1': 'v1', 'k2': 'v2'},
                      cookies=obj)
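What the `cookies` argument actually puts on the wire can be inspected on a prepared request: the dict becomes a single `Cookie` header. This is a minimal sketch that sends nothing; the variable names are illustrative.

```python
import requests

prepared = requests.Request(
    'POST',
    'http://127.0.0.1:8000/test/',
    data={'k1': 'v1', 'k2': 'v2'},
    cookies={'cook1': 'value1'},
).prepare()

# The cookies dict is rendered into the Cookie request header.
cookie_header = prepared.headers.get('Cookie')
print(cookie_header)
```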

 def param_files():
     # Send a file
     # file_dict = {
     #     'f1': open('readme', 'rb')
     # }
     # requests.request(method='POST',
     #                  url='http://127.0.0.1:8000/test/',
     #                  files=file_dict)

     # Send a file with a custom file name
     # file_dict = {
     #     'f1': ('test.txt', open('readme', 'rb'))
     # }
     # requests.request(method='POST',
     #                  url='http://127.0.0.1:8000/test/',
     #                  files=file_dict)

     # Send in-memory content with a custom file name
     # file_dict = {
     #     'f1': ('test.txt', "hahsfaksfa9kasdjflaksdjf")
     # }
     # requests.request(method='POST',
     #                  url='http://127.0.0.1:8000/test/',
     #                  files=file_dict)

     # Send in-memory content with a custom file name, content type and extra headers
     # file_dict = {
     #     'f1': ('test.txt', "hahsfaksfa9kasdjflaksdjf", 'application/text', {'k1': '0'})
     # }
     # requests.request(method='POST',
     #                  url='http://127.0.0.1:8000/test/',
     #                  files=file_dict)
     pass
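The multipart body that `files` produces can be examined without uploading anything by preparing the request. This sketch uses the 3-tuple form with in-memory content; the field name `f1` and file name `test.txt` follow the examples above, while the content string and `text/plain` type are placeholders.

```python
import requests

files = {'f1': ('test.txt', 'file content as a string', 'text/plain')}
prepared = requests.Request('POST', 'http://127.0.0.1:8000/test/', files=files).prepare()

# requests picks a multipart boundary and sets the Content-Type accordingly.
content_type = prepared.headers['Content-Type']

# The body is bytes: one part per field, each with its own headers.
body = prepared.body

print(content_type)
print(body.decode('utf-8'))
```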

 def param_auth():
     from requests.auth import HTTPBasicAuth, HTTPDigestAuth

     ret = requests.get('https://api.github.com/user', auth=HTTPBasicAuth('wupeiqi', 'sdfasdfasdf'))
     print(ret.text)

     # ret = requests.get('http://192.168.1.1',
     #                    auth=HTTPBasicAuth('admin', 'admin'))
     # ret.encoding = 'gbk'
     # print(ret.text)

     # ret = requests.get('http://httpbin.org/digest-auth/auth/user/pass', auth=HTTPDigestAuth('user', 'pass'))
     # print(ret)
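`HTTPBasicAuth` does nothing more than add an `Authorization` header containing the Base64-encoded `user:password` pair, which can be verified on a prepared request. This is a sketch with placeholder credentials; nothing is sent.

```python
import base64

import requests
from requests.auth import HTTPBasicAuth

prepared = requests.Request(
    'GET',
    'http://httpbin.org/basic-auth/user/pass',
    auth=HTTPBasicAuth('user', 'pass'),
).prepare()

# Header added by the auth handler while the request is prepared.
auth_header = prepared.headers['Authorization']

# The same value built by hand: "Basic " + base64("user:pass").
expected = 'Basic ' + base64.b64encode(b'user:pass').decode('ascii')

print(auth_header)
```

Because Base64 is an encoding rather than encryption, Basic auth should only be used over HTTPS.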

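 def param_timeout():
     # A single number applies to both the connect timeout and the read timeout
     # ret = requests.get('http://google.com/', timeout=1)
     # print(ret)

     # A tuple sets (connect timeout, read timeout) separately
     # ret = requests.get('http://google.com/', timeout=(5, 1))
     # print(ret)
     pass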

 def param_allow_redirects():
     # With allow_redirects=False a 3xx response is returned as-is instead of being followed
     ret = requests.get('http://127.0.0.1:8000/test/', allow_redirects=False)
     print(ret.text)

 def param_proxies():
     # proxies = {
     #     "http": "61.172.249.96:80",
     #     "https": "http://61.185.219.126:3128",
     # }

     # proxies = {'http://10.20.1.128': 'http://10.10.1.10:5323'}

     # ret = requests.get("http://www.proxy360.cn/Proxy", proxies=proxies)
     # print(ret.headers)

     # from requests.auth import HTTPProxyAuth
     #
     # proxyDict = {
     #     'http': '77.75.105.165',
     #     'https': '77.75.105.165'
     # }
     # auth = HTTPProxyAuth('username', 'mypassword')
     #
     # r = requests.get("http://www.google.com", proxies=proxyDict, auth=auth)
     # print(r.text)
     pass

 def param_stream():
     ret = requests.get('http://127.0.0.1:8000/test/', stream=True)
     print(ret.content)
     ret.close()

     # from contextlib import closing
     # with closing(requests.get('http://httpbin.org/get', stream=True)) as r:
     #     # Process the response here.
     #     for i in r.iter_content():
     #         print(i)

 def requests_session():
     import requests

     session = requests.Session()

     # 1. Visit any page first to obtain a cookie
     i1 = session.get(url="http://dig.chouti.com/help/service")

     # 2. Log in, carrying the cookie from the previous request; the backend authorizes the gpsd value in that cookie
     i2 = session.post(
         url="http://dig.chouti.com/login",
         data={
             'phone': "",
             'password': "xxxxxx",
             'oneMonth': ""
         }
     )

     # 3. Vote on a link; the session sends the authorized cookie automatically
     i3 = session.post(
         url="http://dig.chouti.com/link/vote?linksId=8589623",
     )
     print(i3.text)
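The cookie persistence that `requests.Session` provides can be tried against a throwaway local server instead of a real site. Everything about the server below is hypothetical (the `/login` and `/profile` paths, the `CookieHandler` class, the cookie value); only the cookie name `gpsd` is borrowed from the example above. It is a sketch of the mechanism, not of any real login flow.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class CookieHandler(BaseHTTPRequestHandler):
    """Hypothetical server: /login sets a cookie; every path echoes the Cookie header."""

    def do_GET(self):
        body = self.headers.get('Cookie', '').encode('utf-8')
        self.send_response(200)
        if self.path == '/login':
            self.send_header('Set-Cookie', 'gpsd=abc123; Path=/')
        self.send_header('Content-Type', 'text/plain; charset=utf-8')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example output quiet


# Port 0 lets the operating system pick a free port.
server = HTTPServer(('127.0.0.1', 0), CookieHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = 'http://127.0.0.1:%d' % server.server_port

# A session stores the cookie from /login and sends it on later requests.
session = requests.Session()
session.trust_env = False  # ignore proxy environment variables for this local demo
session.get(base + '/login', timeout=5)
with_session = session.get(base + '/profile', timeout=5).text

# A fresh session has no stored cookie, so the server sees none.
fresh = requests.Session()
fresh.trust_env = False
without_session = fresh.get(base + '/profile', timeout=5).text

server.shutdown()
server.server_close()

print(repr(with_session))
print(repr(without_session))
```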


6. Simulating a GitHub login with requests

 import requests
 from bs4 import BeautifulSoup

 def login_github():
     """
     Simulate a browser login to GitHub with the requests module.
     :return:
     """
     # Fetch the login page to obtain the CSRF token
     r1 = requests.get('https://github.com/login')   # response to the GET request
     s1 = BeautifulSoup(r1.text, 'html.parser')      # parse the HTML with bs4
     token = s1.find('input', attrs={'name': 'authenticity_token'}).get('value')     # login authorization code, i.e. the CSRF token
     get_cookies = r1.cookies.get_dict()     # cookies from the GET request; they must accompany the POST

     # Send the login POST request
     '''
     POST login parameters
     commit    Sign+in
     utf8    ✓
     authenticity_token    E961jQMIyC9NPwL54YPj70gv2hbXWJ…fTUd+e4lT5RAizKbfzQo4eRHsfg==
     login    JackUpDown (user name)
     password    ********** (password)
     '''
     r2 = requests.post(
         'https://github.com/session',
         data={
             'commit': 'Sign in',    # pass the decoded value; requests encodes the space as '+'
             'utf8': '✓',
             'authenticity_token': token,
             'login': 'JackUpDown',
             'password': '**********'
         },
         cookies=get_cookies     # carry the cookies from the GET request
     )
     login_cookies = r2.cookies.get_dict()   # cookies after a successful login; send them to access other GitHub pages

     # Visit any page with the post-login cookies
     r3 = requests.get('https://github.com/settings/emails', cookies=login_cookies)
     print(r3.text)
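Values placed in `data` are form-encoded by requests itself, so they should be given in decoded form. The sketch below prepares two requests without sending them to show the difference for the `commit` field; `encoded_form` is a hypothetical helper.

```python
import requests


def encoded_form(fields):
    """Return the form-encoded body requests would send for these fields."""
    return requests.Request('POST', 'https://github.com/session', data=fields).prepare().body


# A space in the value is encoded as '+', matching what the browser shows.
decoded_value = encoded_form({'commit': 'Sign in'})

# A literal '+' in the value is escaped, so the server receives "Sign+in".
pre_encoded_value = encoded_form({'commit': 'Sign+in'})

print(decoded_value)
print(pre_encoded_value)
```

Copying an already encoded value from the browser's network panel into `data` therefore double-encodes it.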

Reposted from:

1. http://www.cnblogs.com/wupeiqi/articles/6283017.html
