Q2Day79

requests

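The Python standard library provides urllib, urllib2, httplib and similar modules for making HTTP requests (urllib.request and http.client in Python 3), but their APIs are clumsy. They were built for a different era and a different internet, and they demand an enormous amount of work, even method overrides, to accomplish the simplest tasks.

Requests is an HTTP library written in Python and released under the Apache2 license. It puts a high-level wrapper on top of Python's built-in modules, which makes network requests far more pleasant for Python developers; with Requests you can easily do almost anything a browser can do.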

1. GET requests

# 1. Example without parameters
import requests

ret = requests.get('https://github.com/timeline.json')
print(ret.url)
print(ret.text)

# 2. Example with parameters
import requests

payload = {'key1': 'value1', 'key2': 'value2'}
ret = requests.get("http://httpbin.org/get", params=payload)
print(ret.url)
print(ret.text)
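To see how `params` ends up in the URL without sending anything over the network, the request can be prepared locally. This is a minimal sketch that assumes only that `requests` is installed; `Request(...).prepare()` builds the request but does not send it.

```python
import requests

payload = {'key1': 'value1', 'key2': 'value2'}

# Build the request locally; nothing is sent over the network.
prepared = requests.Request('GET', 'http://httpbin.org/get', params=payload).prepare()

print(prepared.url)  # http://httpbin.org/get?key1=value1&key2=value2
```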

2. POST requests

# 1. Basic POST example
import requests

payload = {'key1': 'value1', 'key2': 'value2'}
ret = requests.post("http://httpbin.org/post", data=payload)
print(ret.text)

# 2. Example sending request headers and data
import requests
import json

url = 'https://api.github.com/some/endpoint'
payload = {'some': 'data'}
headers = {'content-type': 'application/json'}

ret = requests.post(url, data=json.dumps(payload), headers=headers)
print(ret.text)
print(ret.cookies)
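Serializing the payload by hand and setting the header yourself should produce the same request as passing `json=`. The sketch below compares the two forms locally, without sending anything; the helper `body_text` is a name introduced here only for illustration.

```python
import json
import requests

url = 'https://api.github.com/some/endpoint'
payload = {'some': 'data'}

# Form 1: serialize by hand and set the header explicitly.
manual = requests.Request('POST', url, data=json.dumps(payload),
                          headers={'content-type': 'application/json'}).prepare()
# Form 2: let requests serialize and set the header.
automatic = requests.Request('POST', url, json=payload).prepare()


def body_text(prepared):
    """Return the prepared body as text (it may be str or bytes)."""
    body = prepared.body
    return body.decode('utf-8') if isinstance(body, bytes) else body


print(body_text(manual), manual.headers['Content-Type'])
print(body_text(automatic), automatic.headers['Content-Type'])
```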

3. Other request methods

requests.get(url, params=None, **kwargs)

requests.post(url, data=None, json=None, **kwargs)

requests.put(url, data=None, **kwargs)

requests.head(url, **kwargs)

requests.delete(url, **kwargs)

requests.patch(url, data=None, **kwargs)

requests.options(url, **kwargs)

# All of the methods above are built on top of this one

requests.request(method, url, **kwargs)
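One way to check that the helpers are thin wrappers is to swap the underlying function for a mock and look at what it receives. This sketch assumes the helpers live in the `requests.api` module and call its module-level `request` function, which is an internal detail and could differ between versions.

```python
from unittest import mock

import requests

# Replace the underlying request() with a mock so nothing is sent.
with mock.patch('requests.api.request') as fake_request:
    requests.get('http://httpbin.org/get', params={'k': 'v'})
    requests.post('http://httpbin.org/post', data={'k': 'v'})

methods = []
for call in fake_request.call_args_list:
    method, url = call[0]          # positional arguments: method, url
    methods.append(method)
    print(method, url, call[1])    # keyword arguments forwarded by the helper
```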

4. More parameters

def request(method, url, **kwargs):

    """Constructs and sends a :class:`Request <Request>`.

    :param method: method for the new :class:`Request` object.

    :param url: URL for the new :class:`Request` object.

    :param params: (optional) Dictionary or bytes to be sent in the query string for the :class:`Request`.

    :param data: (optional) Dictionary, bytes, or file-like object to send in the body of the :class:`Request`.

    :param json: (optional) json data to send in the body of the :class:`Request`.

    :param headers: (optional) Dictionary of HTTP Headers to send with the :class:`Request`.

    :param cookies: (optional) Dict or CookieJar object to send with the :class:`Request`.

    :param files: (optional) Dictionary of ``'name': file-like-objects`` (or ``{'name': file-tuple}``) for multipart encoding upload.

        ``file-tuple`` can be a 2-tuple ``('filename', fileobj)``, 3-tuple ``('filename', fileobj, 'content_type')``

        or a 4-tuple ``('filename', fileobj, 'content_type', custom_headers)``, where ``'content-type'`` is a string

        defining the content type of the given file and ``custom_headers`` a dict-like object containing additional headers

        to add for the file.

    :param auth: (optional) Auth tuple to enable Basic/Digest/Custom HTTP Auth.

    :param timeout: (optional) How long to wait for the server to send data

        before giving up, as a float, or a :ref:`(connect timeout, read

        timeout) <timeouts>` tuple.

    :type timeout: float or tuple

    :param allow_redirects: (optional) Boolean. Set to True if POST/PUT/DELETE redirect following is allowed.

    :type allow_redirects: bool

    :param proxies: (optional) Dictionary mapping protocol to the URL of the proxy.

    :param verify: (optional) whether the SSL cert will be verified. A CA_BUNDLE path can also be provided. Defaults to ``True``.

    :param stream: (optional) if ``False``, the response content will be immediately downloaded.

    :param cert: (optional) if String, path to ssl client cert file (.pem). If Tuple, ('cert', 'key') pair.

    :return: :class:`Response <Response>` object

    :rtype: requests.Response

    Usage::

      >>> import requests

      >>> req = requests.request('GET', 'http://httpbin.org/get')

      <Response [200]>

    """

Parameter list

def param_method_url():

    # requests.request(method='get', url='http://127.0.0.1:8000/test/')

    # requests.request(method='post', url='http://127.0.0.1:8000/test/')

    pass

def param_param():

    # - can be a dict

    # - can be a string

    # - can be bytes (ASCII characters only)

    # requests.request(method='get',

    # url='http://127.0.0.1:8000/test/',

    # params={'k1': 'v1', 'k2': '水电费'})

    # requests.request(method='get',

    # url='http://127.0.0.1:8000/test/',

    # params="k1=v1&k2=水电费&k3=v3&k3=vv3")

    # requests.request(method='get',

    # url='http://127.0.0.1:8000/test/',

    # params=bytes("k1=v1&k2=k2&k3=v3&k3=vv3", encoding='utf8'))

    # Wrong: bytes params may only contain ASCII characters

    # requests.request(method='get',

    # url='http://127.0.0.1:8000/test/',

    # params=bytes("k1=v1&k2=水电费&k3=v3&k3=vv3", encoding='utf8'))

    pass
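The encoding rules for `params` can be inspected locally by preparing the request instead of sending it. A minimal sketch: non-ASCII values in a dict are percent-encoded as UTF-8, and a list value repeats its key. The `k3` list is an addition made here to mirror the repeated `k3` key in the string form.

```python
import requests

url = 'http://127.0.0.1:8000/test/'

# Dict form: non-ASCII values are percent-encoded, list values repeat the key.
from_dict = requests.Request(
    'GET', url, params={'k1': 'v1', 'k2': '水电费', 'k3': ['v3', 'vv3']}
).prepare()

# Bytes form: accepted as long as the bytes are plain ASCII.
from_bytes = requests.Request('GET', url, params=b'k1=v1&k2=k2').prepare()

print(from_dict.url)
print(from_bytes.url)
```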

def param_data():

    # can be a dict

    # can be a string

    # can be bytes

    # can be a file object

    # requests.request(method='POST',

    # url='http://127.0.0.1:8000/test/',

    # data={'k1': 'v1', 'k2': '水电费'})

    # requests.request(method='POST',

    # url='http://127.0.0.1:8000/test/',

    # data="k1=v1; k2=v2; k3=v3; k3=v4"

    # )

    # requests.request(method='POST',

    # url='http://127.0.0.1:8000/test/',

    # data="k1=v1;k2=v2;k3=v3;k3=v4",

    # headers={'Content-Type': 'application/x-www-form-urlencoded'}

    # )

    # requests.request(method='POST',

    # url='http://127.0.0.1:8000/test/',

    # data=open('data_file.py', mode='r', encoding='utf-8'), # the file contains: k1=v1;k2=v2;k3=v3;k3=v4

    # headers={'Content-Type': 'application/x-www-form-urlencoded'}

    # )

    pass
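The reason the string examples set `Content-Type` by hand becomes visible when the requests are prepared locally: a dict is form-encoded and gets the header automatically, while a string is sent as given with no header added. A minimal sketch, with nothing sent over the network:

```python
import requests

url = 'http://127.0.0.1:8000/test/'

as_dict = requests.Request('POST', url, data={'k1': 'v1', 'k2': 'v2'}).prepare()
as_text = requests.Request('POST', url, data='k1=v1;k2=v2').prepare()

# Dict: encoded body plus an automatic form Content-Type.
print(as_dict.body, as_dict.headers.get('Content-Type'))
# String: body passed through unchanged, no Content-Type set.
print(as_text.body, as_text.headers.get('Content-Type'))
```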

def param_json():

    # Serializes the value passed as json into a string with json.dumps(...),

    # then sends it in the request body with the header {'Content-Type': 'application/json'}

    requests.request(method='POST',

                     url='http://127.0.0.1:8000/test/',

                     json={'k1': 'v1', 'k2': '水电费'})

def param_headers():

    # Send request headers to the server

    requests.request(method='POST',

                     url='http://127.0.0.1:8000/test/',

                     json={'k1': 'v1', 'k2': '水电费'},

                     headers={'Content-Type': 'application/x-www-form-urlencoded'}

                     )

def param_cookies():

    # Send cookies to the server

    requests.request(method='POST',

                     url='http://127.0.0.1:8000/test/',

                     data={'k1': 'v1', 'k2': 'v2'},

                     cookies={'cook1': 'value1'},

                     )

    # A CookieJar also works (the dict form is wrapped into one internally)

    from http.cookiejar import CookieJar

    from http.cookiejar import Cookie

    obj = CookieJar()

    obj.set_cookie(Cookie(version=0, name='c1', value='v1', port=None, domain='', path='/', secure=False, expires=None,

                          discard=True, comment=None, comment_url=None, rest={'HttpOnly': None}, rfc2109=False,

                          port_specified=False, domain_specified=False, domain_initial_dot=False, path_specified=False)

                   )

    requests.request(method='POST',

                     url='http://127.0.0.1:8000/test/',

                     data={'k1': 'v1', 'k2': 'v2'},

                     cookies=obj)
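What the `cookies` argument turns into on the wire can be checked by preparing the request locally. A minimal sketch using the dict form; nothing is sent.

```python
import requests

prepared = requests.Request(
    'POST',
    'http://127.0.0.1:8000/test/',
    data={'k1': 'v1', 'k2': 'v2'},
    cookies={'cook1': 'value1'},
).prepare()

# The dict is converted to a cookie jar and rendered as a Cookie header.
print(prepared.headers['Cookie'])  # cook1=value1
```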

def param_files():

    # Send a file

    # file_dict = {

    # 'f1': open('readme', 'rb')

    # }

    # requests.request(method='POST',

    # url='http://127.0.0.1:8000/test/',

    # files=file_dict)

    # Send a file with a custom filename

    # file_dict = {

    # 'f1': ('test.txt', open('readme', 'rb'))

    # }

    # requests.request(method='POST',

    # url='http://127.0.0.1:8000/test/',

    # files=file_dict)

    # Send string content as a file, with a custom filename

    # file_dict = {

    # 'f1': ('test.txt', "hahsfaksfa9kasdjflaksdjf")

    # }

    # requests.request(method='POST',

    # url='http://127.0.0.1:8000/test/',

    # files=file_dict)

    # Send string content with a custom filename, content type, and extra headers

    # file_dict = {

    #     'f1': ('test.txt', "hahsfaksfa9kasdjflaksdjf", 'application/text', {'k1': '0'})

    # }

    # requests.request(method='POST',

    #                  url='http://127.0.0.1:8000/test/',

    #                  files=file_dict)

    pass
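The multipart body that `files` produces can be inspected without a server. This sketch uses the 3-tuple form with in-memory bytes instead of an open file; the content `b'hello world'` and the `text/plain` type are placeholders chosen for the example.

```python
import requests

file_dict = {'f1': ('test.txt', b'hello world', 'text/plain')}

prepared = requests.Request(
    'POST', 'http://127.0.0.1:8000/test/', files=file_dict
).prepare()

# The header carries a generated boundary; the body holds one part per file.
print(prepared.headers['Content-Type'])
print(prepared.body.decode('utf-8'))
```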

def param_auth():

    from requests.auth import HTTPBasicAuth, HTTPDigestAuth

    ret = requests.get('https://api.github.com/user', auth=HTTPBasicAuth('wupeiqi', 'sdfasdfasdf'))

    print(ret.text)

    # ret = requests.get('http://192.168.1.1',

    # auth=HTTPBasicAuth('admin', 'admin'))

    # ret.encoding = 'gbk'

    # print(ret.text)

    # ret = requests.get('http://httpbin.org/digest-auth/auth/user/pass', auth=HTTPDigestAuth('user', 'pass'))

    # print(ret)

    #
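Basic auth is just an `Authorization` header containing the base64 of `user:password`. The sketch below prepares a request locally to show that; the credentials `user` / `pass` are placeholders.

```python
import base64

import requests
from requests.auth import HTTPBasicAuth

prepared = requests.Request(
    'GET', 'https://api.github.com/user', auth=HTTPBasicAuth('user', 'pass')
).prepare()

header = prepared.headers['Authorization']
scheme, token = header.split(' ', 1)
decoded = base64.b64decode(token).decode('utf-8')

print(header)           # Basic dXNlcjpwYXNz
print(scheme, decoded)  # Basic user:pass
```

Because the credentials are only encoded, not encrypted, Basic auth should be used over HTTPS.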

def param_timeout():

    # ret = requests.get('http://google.com/', timeout=1)

    # print(ret)

    # ret = requests.get('http://google.com/', timeout=(5, 1))

    # print(ret)

    pass

def param_allow_redirects():

    ret = requests.get('http://127.0.0.1:8000/test/', allow_redirects=False)

    print(ret.text)

def param_proxies():

    # proxies = {

    # "http": "61.172.249.96:80",

    # "https": "http://61.185.219.126:3128",

    # }

    # proxies = {'http://10.20.1.128': 'http://10.10.1.10:5323'}

    # ret = requests.get("http://www.proxy360.cn/Proxy", proxies=proxies)

    # print(ret.headers)

    # from requests.auth import HTTPProxyAuth

    #

    # proxyDict = {

    # 'http': '77.75.105.165',

    # 'https': '77.75.105.165'

    # }

    # auth = HTTPProxyAuth('username', 'mypassword')

    #

    # r = requests.get("http://www.google.com", proxies=proxyDict, auth=auth)

    # print(r.text)

    pass

def param_stream():

    ret = requests.get('http://127.0.0.1:8000/test/', stream=True)

    print(ret.content)

    ret.close()

    # from contextlib import closing

    # with closing(requests.get('http://httpbin.org/get', stream=True)) as r:

    #     # Process the response here.

    #     for i in r.iter_content():

    #         print(i)

def requests_session():

    import requests

    session = requests.Session()

    ### 1. First visit any page to obtain a cookie

    i1 = session.get(url="http://dig.chouti.com/help/service")

    ### 2. Log in, carrying the cookie from the previous step; the backend authorizes the gpsd value in the cookie

    i2 = session.post(

        url="http://dig.chouti.com/login",

        data={

            'phone': "",

            'password': "xxxxxx",

            'oneMonth': ""

        }

    )

    i3 = session.post(

        url="http://dig.chouti.com/link/vote?linksId=8589623",

    )

    print(i3.text)
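The point of a `Session` is that cookies stored on it are attached to every later request. This sketch shows that locally: the cookie is set by hand as a stand-in for one a real server would send via `Set-Cookie`, and its value is hypothetical.

```python
import requests

url = 'http://dig.chouti.com/link/vote'

session = requests.Session()
# Stand-in for a cookie that a real server would set via Set-Cookie.
session.cookies.set('gpsd', 'hypothetical-value')

# Prepared through the session: the stored cookie is attached.
with_session = session.prepare_request(requests.Request('POST', url))
# Prepared on its own: no cookie is attached.
without_session = requests.Request('POST', url).prepare()

print(with_session.headers.get('Cookie'))     # gpsd=hypothetical-value
print(without_session.headers.get('Cookie'))  # None
```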

Parameter examples

Official documentation: http://cn.python-requests.org/zh_CN/latest/user/quickstart.html#id4

BeautifulSoup

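BeautifulSoup is a module that takes an HTML or XML string and parses it into a tree. You can then use the methods it provides to quickly locate specific elements, which makes searching HTML or XML simple.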

from bs4 import BeautifulSoup

html_doc = """

<html><head><title>The Dormouse's story</title></head>

<body>

asdf

    <div class="title">

        <b>The Dormouse's story总共</b>

        <h1>f</h1>

    </div>

<div class="story">Once upon a time there were three little sisters; and their names were

    <a  class="sister0" id="link1">Els<span>f</span>ie</a>,

    <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and

    <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;

and they lived at the bottom of a well.</div>

ad<br/>sf

<p class="story">...</p>

</body>

</html>

"""

soup = BeautifulSoup(html_doc, features="lxml")

# Find the first a tag

tag1 = soup.find(name='a')

# Find all a tags

tag2 = soup.find_all(name='a')

# Find the tags with id=link2 (select returns a list)

tag3 = soup.select('#link2')
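The three lookups return different kinds of results: `find` gives one tag, while `find_all` and `select` give lists. A minimal runnable sketch on a shortened document; it uses the standard-library `html.parser` instead of `lxml` so that it runs without the extra dependency.

```python
from bs4 import BeautifulSoup

html_doc = """
<div class="story">
    <a class="sister0" id="link1">Elsie</a>,
    <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
    <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
</div>
"""

soup = BeautifulSoup(html_doc, features="html.parser")

first = soup.find(name='a')          # a single Tag (or None if nothing matches)
all_links = soup.find_all(name='a')  # a list of Tags
by_id = soup.select('#link2')        # a list, even for an id selector

print(first['id'])                       # link1
print(len(all_links))                    # 3
print([t.get_text() for t in by_id])     # ['Lacie']
```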

Installation:

pip3 install beautifulsoup4

Usage example:

from bs4 import BeautifulSoup

html_doc = """

<html><head><title>The Dormouse's story</title></head>

<body>

    ...

</body>

</html>

"""

soup = BeautifulSoup(html_doc, features="lxml")

1. name: the tag name

# tag = soup.find('a')

# name = tag.name # get

# print(name)

# tag.name = 'span' # set

# print(soup)

2. attrs: the tag's attributes

# tag = soup.find('a')

# attrs = tag.attrs    # get

# print(attrs)

# tag.attrs = {'ik':123} # set (replaces all attributes)

# tag.attrs['id'] = 'iiiii' # set a single attribute

# print(soup)

3. children: all direct child nodes

# body = soup.find('body')

# v = body.children

4. descendants: all descendant nodes (children, grandchildren, and so on)

# body = soup.find('body')

# v = body.descendants
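The difference between the two is easiest to see on a small nested document. A minimal sketch, using `html.parser` so no extra parser is required:

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

soup = BeautifulSoup('<body><div><b>bold</b></div><p>text</p></body>', 'html.parser')
body = soup.find('body')

# children: direct children only; descendants: the whole subtree, depth-first.
# Both also yield text nodes, so keep only Tag objects here.
child_tags = [node.name for node in body.children if isinstance(node, Tag)]
descendant_tags = [node.name for node in body.descendants if isinstance(node, Tag)]

print(child_tags)       # ['div', 'p']
print(descendant_tags)  # ['div', 'b', 'p']
```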

5. clear: remove everything inside the tag (the tag itself is kept)

# tag = soup.find('body')

# tag.clear()

# print(soup)

6. decompose: recursively remove the tag and everything inside it

# body = soup.find('body')

# body.decompose()

# print(soup)

7. extract: remove the tag and everything inside it, and return the removed tag

# body = soup.find('body')

# v = body.extract()

# print(soup)
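The three removal methods differ in what they leave behind and what they return. A small sketch on a made-up document with three paragraphs:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    '<div><p id="a">one</p><p id="b">two</p><p id="c">three</p></div>',
    'html.parser',
)

soup.find(id='a').clear()                # empties the tag, keeps the tag itself
soup.find(id='b').decompose()            # destroys the tag and its contents
removed = soup.find(id='c').extract()    # detaches the tag and returns it

print(soup)     # <div><p id="a"></p></div>
print(removed)  # <p id="c">three</p>
```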

8. decode: convert to a string (including the current tag); decode_contents (excluding the current tag)

# body = soup.find('body')

# v = body.decode()

# v = body.decode_contents()

# print(v)

9. encode: convert to bytes (including the current tag); encode_contents (excluding the current tag)

# body = soup.find('body')

# v = body.encode()

# v = body.encode_contents()

# print(v)

10. find: get the first matching tag

# tag = soup.find('a')

# print(tag)

# tag = soup.find(name='a', attrs={'class': 'sister'}, recursive=True, text='Lacie')

# tag = soup.find(name='a', class_='sister', recursive=True, text='Lacie')

# print(tag)

11. find_all: get all matching tags

# tags = soup.find_all('a')

# print(tags)

# tags = soup.find_all('a',limit=1)

# print(tags)

# tags = soup.find_all(name='a', attrs={'class': 'sister'}, recursive=True, text='Lacie')

# # tags = soup.find(name='a', class_='sister', recursive=True, text='Lacie')

# print(tags)

# ####### lists #######

# v = soup.find_all(name=['a','div'])

# print(v)

# v = soup.find_all(class_=['sister0', 'sister'])

# print(v)

# v = soup.find_all(text=['Tillie'])

# print(v, type(v[0]))

# v = soup.find_all(id=['link1','link2'])

# print(v)

# v = soup.find_all(href=['link1','link2'])

# print(v)

# ####### regular expressions #######

import re

# rep = re.compile('p')

# rep = re.compile('^p')

# v = soup.find_all(name=rep)

# print(v)

# rep = re.compile('sister.*')

# v = soup.find_all(class_=rep)

# print(v)

# rep = re.compile('http://www.oldboy.com/static/.*')

# v = soup.find_all(href=rep)

# print(v)

# ####### filtering with a function #######

# def func(tag):

#     return tag.has_attr('class') and tag.has_attr('id')

# v = soup.find_all(name=func)

# print(v)

# ## get: read a tag attribute

# tag = soup.find('a')

# v = tag.get('id')

# print(v)
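The list, regular-expression, and function filters can be compared side by side on a small document. A minimal sketch; the helper name `has_class_and_href` is introduced here for illustration.

```python
import re

from bs4 import BeautifulSoup

html_doc = """
<div class="story">
<a class="sister0" id="link1">Elsie</a>
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
<p class="story">...</p>
</div>
"""
soup = BeautifulSoup(html_doc, 'html.parser')


def has_class_and_href(tag):
    """Match tags that carry both a class and an href attribute."""
    return tag.has_attr('class') and tag.has_attr('href')


# List: match any of the given values.
by_list = [t['id'] for t in soup.find_all(id=['link1', 'link2'])]
# Regular expression: match attribute values against a pattern.
by_regex = [t['id'] for t in soup.find_all(href=re.compile(r'^http://example\.com/'))]
# Function: called once per tag, keeps tags for which it returns True.
by_func = [t['id'] for t in soup.find_all(has_class_and_href)]

print(by_list)   # ['link1', 'link2']
print(by_regex)  # ['link2', 'link3']
print(by_func)   # ['link2', 'link3']
```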

12. has_attr: check whether the tag has the given attribute

# tag = soup.find('a')

# v = tag.has_attr('id')

# print(v)

13. get_text: get the text inside the tag

# tag = soup.find('a')

# v = tag.get_text()  # an optional first argument is used as the separator between text fragments

# print(v)
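The first argument of `get_text` is a separator placed between text fragments, not an attribute name. A short sketch on the nested link from the sample document:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<a id="link1">Els<span>f</span>ie</a>', 'html.parser')
tag = soup.find('a')

plain = tag.get_text()                   # fragments joined with no separator
piped = tag.get_text('|')                # fragments joined with '|'
spaced = tag.get_text(' ', strip=True)   # fragments stripped, joined with a space

print(plain)   # Elsfie
print(piped)   # Els|f|ie
print(spaced)  # Els f ie
```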

14. index: get the index of a child node within its parent tag

# tag = soup.find('body')

# v = tag.index(tag.find('div'))

# print(v)

# tag = soup.find('body')

# for i,v in enumerate(tag):

#     print(i,v)

15. is_empty_element: whether the tag is an empty (void) or self-closing element,

that is, one of the following tags: 'br', 'hr', 'input', 'img', 'meta', 'spacer', 'link', 'frame', 'base'

# tag = soup.find('br')

# v = tag.is_empty_element

# print(v)

16. Nodes related to the current tag

# soup.next

# soup.next_element

# soup.next_elements

# soup.next_sibling

# soup.next_siblings

#

# tag.previous

# tag.previous_element

# tag.previous_elements

# tag.previous_sibling

# tag.previous_siblings

#

# tag.parent

# tag.parents

17. Searching among a tag's related nodes

# tag.find_next(...)

# tag.find_all_next(...)

# tag.find_next_sibling(...)

# tag.find_next_siblings(...)

# tag.find_previous(...)

# tag.find_all_previous(...)

# tag.find_previous_sibling(...)

# tag.find_previous_siblings(...)

# tag.find_parent(...)

# tag.find_parents(...)

# These take the same arguments as find_all

18. select, select_one: CSS selectors

soup.select("title")

soup.select("p:nth-of-type(3)")

soup.select("body a")

soup.select("html head title")

tag = soup.select("span,a")

soup.select("head > title")

soup.select("p > a")

soup.select("p > a:nth-of-type(2)")

soup.select("p > #link1")

soup.select("body > a")

soup.select("#link1 ~ .sister")

soup.select("#link1 + .sister")

soup.select(".sister")

soup.select("[class~=sister]")

soup.select("#link1")

soup.select("a#link2")

soup.select('a[href]')

soup.select('a[href="http://example.com/elsie"]')

soup.select('a[href^="http://example.com/"]')

soup.select('a[href$="tillie"]')

soup.select('a[href*=".com/el"]')

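# Note: _candidate_generator is a private argument of select() in older bs4 releases;

# releases that delegate select() to soupsieve may ignore or reject it.

from bs4.element import Tag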

def default_candidate_generator(tag):

    for child in tag.descendants:

        if not isinstance(child, Tag):

            continue

        if not child.has_attr('href'):

            continue

        yield child

tags = soup.find('body').select("a", _candidate_generator=default_candidate_generator)

print(type(tags), tags)

from bs4.element import Tag

def default_candidate_generator(tag):

    for child in tag.descendants:

        if not isinstance(child, Tag):

            continue

        if not child.has_attr('href'):

            continue

        yield child

tags = soup.find('body').select("a", _candidate_generator=default_candidate_generator, limit=1)

print(type(tags), tags)
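A few of the selectors above, run against a small document so the matches are visible. This is a sketch using `html.parser`; it assumes a bs4 release with CSS selector support for these forms.

```python
from bs4 import BeautifulSoup

html_doc = """
<div class="story">
<a class="sister0" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
</div>
"""
soup = BeautifulSoup(html_doc, 'html.parser')

by_class = [t['id'] for t in soup.select('.sister')]               # class selector
by_suffix = [t['id'] for t in soup.select('a[href$="tillie"]')]    # attribute ends with
nth = [t['id'] for t in soup.select('div > a:nth-of-type(2)')]     # second a child of div
adjacent = soup.select_one('#link1 + .sister')                     # first match only

print(by_class)        # ['link2', 'link3']
print(by_suffix)       # ['link3']
print(nth)             # ['link2']
print(adjacent['id'])  # link2
```

`select` always returns a list, while `select_one` returns the first match or `None`.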

19. The tag's content

# tag = soup.find('span')

# print(tag.string)          # get

# tag.string = 'new content' # set

# print(soup)

# tag = soup.find('body')

# print(tag.string)

# tag.string = 'xxx'

# print(soup)

# tag = soup.find('body')

# v = tag.stripped_strings  # recursively yields the text of all nested tags, with whitespace stripped

# print(v)

20. append: append a node inside the current tag

# tag = soup.find('body')

# tag.append(soup.find('a'))

# print(soup)

#

# from bs4.element import Tag

# obj = Tag(name='i',attrs={'id': 'it'})

# obj.string = '我是一个新来的'

# tag = soup.find('body')

# tag.append(obj)

# print(soup)

21. insert: insert a node at a given position inside the current tag

# from bs4.element import Tag

# obj = Tag(name='i', attrs={'id': 'it'})

# obj.string = '我是一个新来的'

# tag = soup.find('body')

# tag.insert(2, obj)

# print(soup)

22. insert_after, insert_before: insert a node after or before the current tag

# from bs4.element import Tag

# obj = Tag(name='i', attrs={'id': 'it'})

# obj.string = '我是一个新来的'

# tag = soup.find('body')

# # tag.insert_before(obj)

# tag.insert_after(obj)

# print(soup)

23. replace_with: replace the current tag with the given tag

# from bs4.element import Tag

# obj = Tag(name='i', attrs={'id': 'it'})

# obj.string = '我是一个新来的'

# tag = soup.find('div')

# tag.replace_with(obj)

# print(soup)

24. Setting up relationships between tags

# tag = soup.find('div')

# a = soup.find('a')

# tag.setup(previous_sibling=a)

# print(tag.previous_sibling)

25. wrap: wrap the current tag inside the given tag

# from bs4.element import Tag

# obj1 = Tag(name='div', attrs={'id': 'it'})

# obj1.string = '我是一个新来的'

#

# tag = soup.find('a')

# v = tag.wrap(obj1)

# print(soup)

# tag = soup.find('a')

# v = tag.wrap(soup.find('p'))

# print(soup)

26. unwrap: remove the current tag but keep its contents

# tag = soup.find('a')

# v = tag.unwrap()

# print(soup)
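The tree-editing methods can be chained on a tiny document to see each effect. This sketch creates tags with `soup.new_tag(...)`, the factory method on the soup object, rather than constructing `Tag` directly; the text `'new here'` is a placeholder.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<body><a id="link1">Elsie</a></body>', 'html.parser')

# append: add a newly created tag at the end of <body>.
new_tag = soup.new_tag('i', id='it')
new_tag.string = 'new here'
soup.body.append(new_tag)
after_append = str(soup)

# wrap: put the <a> inside a new <div>.
soup.a.wrap(soup.new_tag('div'))
after_wrap = str(soup)

# unwrap: drop the <a> tag itself but keep its text.
soup.a.unwrap()
after_unwrap = str(soup)

print(after_append)  # <body><a id="link1">Elsie</a><i id="it">new here</i></body>
print(after_wrap)    # <body><div><a id="link1">Elsie</a></div><i id="it">new here</i></body>
print(after_unwrap)  # <body><div>Elsie</div><i id="it">new here</i></body>
```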

More options in the official documentation: http://beautifulsoup.readthedocs.io/zh_CN/v4.4.0/

A batch of "automatic login" examples

#!/usr/bin/env python

# -*- coding:utf-8 -*-

import requests

# ############## 方式一 ##############

"""

# ## 1. First visit any page to obtain a cookie

i1 = requests.get(url="http://dig.chouti.com/help/service")

i1_cookies = i1.cookies.get_dict()

# ## 2. Log in, carrying the cookie from the previous step; the backend authorizes the gpsd value in the cookie

i2 = requests.post(

    url="http://dig.chouti.com/login",

    data={

        'phone': "8615131255089",

        'password': "xxooxxoo",

        'oneMonth': ""

    },

    cookies=i1_cookies

)

# ## 3. Upvote (only the already-authorized gpsd cookie needs to be sent)

gpsd = i1_cookies['gpsd']

i3 = requests.post(

    url="http://dig.chouti.com/link/vote?linksId=8589523",

    cookies={'gpsd': gpsd}

)

print(i3.text)

"""

# ############## 方式二 ##############

"""

import requests

session = requests.Session()

i1 = session.get(url="http://dig.chouti.com/help/service")

i2 = session.post(

    url="http://dig.chouti.com/login",

    data={

        'phone': "8615131255089",

        'password': "xxooxxoo",

        'oneMonth': ""

    }

)

i3 = session.post(

    url="http://dig.chouti.com/link/vote?linksId=8589523"

)

print(i3.text)

"""

Chouti (抽屉新热榜)

#!/usr/bin/env python

# -*- coding:utf-8 -*-

import requests

from bs4 import BeautifulSoup

# ############## 方式一 ##############

#

# # 1. Visit the login page and get the authenticity_token

# i1 = requests.get('https://github.com/login')

# soup1 = BeautifulSoup(i1.text, features='lxml')

# tag = soup1.find(name='input', attrs={'name': 'authenticity_token'})

# authenticity_token = tag.get('value')

# c1 = i1.cookies.get_dict()

# i1.close()

#

# # 2. Send the authenticity_token together with the username, password, etc. to authenticate

# form_data = {

#     "authenticity_token": authenticity_token,

#     "utf8": "",

#     "commit": "Sign in",

#     "login": "wupeiqi@live.com",

#     'password': 'xxoo'

# }

#

# i2 = requests.post('https://github.com/session', data=form_data, cookies=c1)

# c2 = i2.cookies.get_dict()

# c1.update(c2)

# i3 = requests.get('https://github.com/settings/repositories', cookies=c1)

#

# soup3 = BeautifulSoup(i3.text, features='lxml')

# list_group = soup3.find(name='div', class_='listgroup')

#

# from bs4.element import Tag

#

# for child in list_group.children:

#     if isinstance(child, Tag):

#         project_tag = child.find(name='a', class_='mr-1')

#         size_tag = child.find(name='small')

#         temp = "Project: %s (%s); path: %s" % (project_tag.string, size_tag.string, project_tag.get('href'), )

#         print(temp)

# ############## 方式二 ##############

# session = requests.Session()

# # 1. Visit the login page and get the authenticity_token

# i1 = session.get('https://github.com/login')

# soup1 = BeautifulSoup(i1.text, features='lxml')

# tag = soup1.find(name='input', attrs={'name': 'authenticity_token'})

# authenticity_token = tag.get('value')

# c1 = i1.cookies.get_dict()

# i1.close()

#

# # 2. Send the authenticity_token together with the username, password, etc. to authenticate

# form_data = {

#     "authenticity_token": authenticity_token,

#     "utf8": "",

#     "commit": "Sign in",

#     "login": "wupeiqi@live.com",

#     'password': 'xxoo'

# }

#

# i2 = session.post('https://github.com/session', data=form_data)

# c2 = i2.cookies.get_dict()

# c1.update(c2)

# i3 = session.get('https://github.com/settings/repositories')

#

# soup3 = BeautifulSoup(i3.text, features='lxml')

# list_group = soup3.find(name='div', class_='listgroup')

#

# from bs4.element import Tag

#

# for child in list_group.children:

#     if isinstance(child, Tag):

#         project_tag = child.find(name='a', class_='mr-1')

#         size_tag = child.find(name='small')

#         temp = "Project: %s (%s); path: %s" % (project_tag.string, size_tag.string, project_tag.get('href'), )

#         print(temp)

GitHub
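The token-extraction step can be tried offline against a saved copy of a login form. The HTML below is a made-up stand-in, not GitHub's actual page, and the token, login, and password values are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical login form, standing in for the HTML returned by the login page.
login_page = """
<form action="/session" method="post">
  <input type="hidden" name="authenticity_token" value="hypothetical-token-123">
  <input type="text" name="login">
  <input type="password" name="password">
</form>
"""

soup = BeautifulSoup(login_page, 'html.parser')

# Locate the hidden input by its name attribute and read its value.
tag = soup.find(name='input', attrs={'name': 'authenticity_token'})
authenticity_token = tag.get('value')

# The token goes back to the server together with the credentials.
form_data = {
    'authenticity_token': authenticity_token,
    'login': 'user@example.com',
    'password': 'secret',
}
print(form_data)
```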

#!/usr/bin/env python

# -*- coding:utf-8 -*-

import time

import requests

from bs4 import BeautifulSoup

session = requests.Session()

i1 = session.get(

    url='https://www.zhihu.com/#signin',

    headers={

        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.98 Safari/537.36',

    }

)

soup1 = BeautifulSoup(i1.text, 'lxml')

xsrf_tag = soup1.find(name='input', attrs={'name': '_xsrf'})

xsrf = xsrf_tag.get('value')

current_time = time.time()

i2 = session.get(

    url='https://www.zhihu.com/captcha.gif',

    params={'r': current_time, 'type': 'login'},

    headers={

        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.98 Safari/537.36',

    })

with open('zhihu.gif', 'wb') as f:

    f.write(i2.content)

captcha = input('Open zhihu.gif, then enter the captcha it shows: ')

form_data = {

    "_xsrf": xsrf,

    'password': 'xxooxxoo',

    "captcha": captcha,

    'email': '424662508@qq.com'

}

i3 = session.post(

    url='https://www.zhihu.com/login/email',

    data=form_data,

    headers={

        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.98 Safari/537.36',

    }

)

i4 = session.get(

    url='https://www.zhihu.com/settings/profile',

    headers={

        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.98 Safari/537.36',

    }

)

soup4 = BeautifulSoup(i4.text, 'lxml')

tag = soup4.find(id='rename-section')

nick_name = tag.find('span',class_='name').string

print(nick_name)

Zhihu (知乎)

#!/usr/bin/env python

# -*- coding:utf-8 -*-

import re

import json

import base64

import rsa

import requests

def js_encrypt(text):

    b64der = 'MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQCp0wHYbg/NOPO3nzMD3dndwS0MccuMeXCHgVlGOoYyFwLdS24Im2e7YyhB0wrUsyYf0/nhzCzBK8ZC9eCWqd0aHbdgOQT6CuFQBMjbyGYvlVYU2ZP7kG9Ft6YV6oc9ambuO7nPZh+bvXH0zDKfi02prknrScAKC0XhadTHT3Al0QIDAQAB'

    der = base64.standard_b64decode(b64der)

    pk = rsa.PublicKey.load_pkcs1_openssl_der(der)

    v1 = rsa.encrypt(bytes(text, 'utf8'), pk)

    value = base64.encodebytes(v1).replace(b'\n', b'')

    value = value.decode('utf8')

    return value

session = requests.Session()

i1 = session.get('https://passport.cnblogs.com/user/signin')

rep = re.compile("'VerificationToken': '(.*)'")

v = re.search(rep, i1.text)

verification_token = v.group(1)

form_data = {

    'input1': js_encrypt('wptawy'),

    'input2': js_encrypt('asdfasdf'),

    'remember': False

}

i2 = session.post(url='https://passport.cnblogs.com/user/signin',

                  data=json.dumps(form_data),

                  headers={

                      'Content-Type': 'application/json; charset=UTF-8',

                      'X-Requested-With': 'XMLHttpRequest',

                      'VerificationToken': verification_token}

                  )

i3 = session.get(url='https://i.cnblogs.com/EditDiary.aspx')

print(i3.text)

cnblogs (博客园)

#!/usr/bin/env python

# -*- coding:utf-8 -*-

import re

import requests

# Step 1: visit the login page to get X_Anti_Forge_Token and X_Anti_Forge_Code

# 1. Request URL: https://passport.lagou.com/login/login.html

# 2. Request method: GET

# 3. Request headers:

#    User-agent

r1 = requests.get('https://passport.lagou.com/login/login.html',

                 headers={

                     'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36',

                 },

                 )

X_Anti_Forge_Token = re.findall("X_Anti_Forge_Token = '(.*?)'", r1.text, re.S)[0]

X_Anti_Forge_Code = re.findall("X_Anti_Forge_Code = '(.*?)'", r1.text, re.S)[0]

print(X_Anti_Forge_Token, X_Anti_Forge_Code)

# print(r1.cookies.get_dict())

# Step 2: log in

# 1. Request URL: https://passport.lagou.com/login/login.json

# 2. Request method: POST

# 3. Request headers:

#    cookie

#    User-agent

#    Referer:https://passport.lagou.com/login/login.html

#    X-Anit-Forge-Code:53165984

#    X-Anit-Forge-Token:3b6a2f62-80f0-428b-8efb-ef72fc100d78

#    X-Requested-With:XMLHttpRequest

# 4. Request body:

# isValidate:true

# username:15131252215

# password:ab18d270d7126ea65915c50288c22c0d

# request_form_verifyCode:''

# submit:''

r2 = requests.post(

    'https://passport.lagou.com/login/login.json',

    headers={

        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36',

        'Referer': 'https://passport.lagou.com/login/login.html',

        'X-Anit-Forge-Code': X_Anti_Forge_Code,

        'X-Anit-Forge-Token': X_Anti_Forge_Token,

        'X-Requested-With': 'XMLHttpRequest'

    },

    data={

        "isValidate": True,

        'username': '',

        'password': 'ab18d270d7126ea65915c50288c22c0d',

        'request_form_verifyCode': '',

        'submit': ''

    },

    cookies=r1.cookies.get_dict()

)

print(r2.text)

Lagou (拉勾网)
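The regular-expression step can be tested offline. The page snippet below is a made-up stand-in for the login page's inline script, and both values in it are hypothetical.

```python
import re

# Hypothetical excerpt standing in for the login page's inline script.
login_page = """
<script>
    window.X_Anti_Forge_Token = 'hypothetical-token';
    window.X_Anti_Forge_Code = '12345678';
</script>
"""

# Non-greedy groups capture the text between the single quotes.
token = re.findall(r"X_Anti_Forge_Token = '(.*?)'", login_page, re.S)[0]
code = re.findall(r"X_Anti_Forge_Code = '(.*?)'", login_page, re.S)[0]

# The captured values are sent back as request headers in the login step.
headers = {
    'X-Anit-Forge-Code': code,
    'X-Anit-Forge-Token': token,
}
print(headers)
```

Indexing with `[0]` raises `IndexError` when the pattern is not found, so real code may want to check the result of `findall` first.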
