Learning Web Scraping from Scratch, Part 10: the urllib and requests libraries and interacting with the GitHub API
Using the urllib library
# coding=utf-8
import urllib2
import urllib

# URLs of a local httpbin-style test environment
URL_IP = "http://10.11.0.215:8080"
URL_GET = "http://10.11.0.215:8080/get"

def use_simple_urllib2():
    response = urllib2.urlopen(URL_IP)
    print '>>>Response Headers:'
    print response.info()
    print '>>>Response Body:'
    print ''.join([line for line in response.readlines()])

def use_params_urllib2():
    # Build the request parameters
    params = urllib.urlencode({'param1': 'hello', 'param2': 'world'})
    print 'Request Params:'
    print params
    # Send the request
    response = urllib2.urlopen('?'.join([URL_GET, params]))
    # Handle the response
    print '>>>Response Headers:'
    print response.info()
    print '>>>Status Code:'
    print response.getcode()
    print '>>>Response Body:'
    print ''.join([line for line in response.readlines()])
    # print response.readlines()

if __name__ == '__main__':
    # print '>>>Use simple urllib2'
    # use_simple_urllib2()
    print '>>>Use params urllib2'
    use_params_urllib2()
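The code above is Python 2 (urllib2 plus print statements). As a hedged sketch only, the same parameterized GET under Python 3 goes through urllib.request and urllib.parse instead; the URL reuses the local httpbin-style address from the example, so adjust it to your own environment:

# Python 3 sketch of use_params_urllib2(); assumes the same local test server
from urllib import request, parse

URL_GET = "http://10.11.0.215:8080/get"

def use_params_urllib():
    # urlencode builds "param1=hello&param2=world"
    params = parse.urlencode({'param1': 'hello', 'param2': 'world'})
    response = request.urlopen('?'.join([URL_GET, params]))
    print('>>>Status Code:', response.getcode())
    print('>>>Response Headers:', response.info())
    print('>>>Response Body:', response.read().decode('utf-8'))

if __name__ == '__main__':
    use_params_urllib()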
Basic usage of the requests library
# coding=utf-8
import requests

URL_IP = "http://10.11.0.215:8080/ip"
URL_GET = "http://10.11.0.215:8080/get"

def use_simple_requests():
    response = requests.get(URL_IP)
    print ">>>Response Headers:"
    print response.headers
    print ">>>Response Code:"
    print response.status_code
    print ">>>Response Body:"
    print response.text

def use_params_requests():
    # Pass the query parameters (same ones as in the urllib example);
    # requests takes care of URL-encoding them
    response = requests.get(URL_GET, params={'param1': 'hello', 'param2': 'world'})
    print ">>>Response Headers:"
    print response.headers
    print ">>>Response Code:"
    print response.status_code
    print response.reason
    print ">>>Response Body:"
    print response.json()

if __name__ == "__main__":
    # print "simple requests:"
    # use_simple_requests()
    print "params requests:"
    use_params_requests()
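Compared with the urllib version, there is no need to call urlencode yourself: requests builds the query string from the params dict. A quick sketch (httpbin.org is assumed here as a public stand-in for the local 10.11.0.215 test server) shows the final URL that was sent and the arguments echoed back:

# coding=utf-8
import requests

# httpbin.org is assumed as a public stand-in for the local test server
response = requests.get("http://httpbin.org/get",
                        params={'param1': 'hello', 'param2': 'world'})
print response.url              # http://httpbin.org/get?param1=hello&param2=world
print response.json()['args']   # the parameters echoed back by httpbin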
Interacting with the GitHub API using requests
# coding=utf-8
import json
import requests
from requests import exceptions

URL = "https://api.github.com"

def build_uri(endpoint):
    # Join the base URL and the endpoint into the final API path
    return '/'.join([URL, endpoint])

def better_print(json_str):
    # Pretty-print the JSON; indent=4 indents with 4 spaces
    return json.dumps(json.loads(json_str), indent=4)

def request_method():
    # Fetch user information
    # response = requests.get(build_uri('users/reblue520'))
    # response = requests.get(build_uri('user/emails'), auth=('reblue520', 'reblue520'))
    response = requests.get(build_uri('user/public_emails'), auth=('reblue520', 'reblue520'))
    print better_print(response.text)

def params_request():
    response = requests.get(build_uri('users'), params={'since': 11})
    print better_print(response.text)
    print response.request.headers
    print response.url

def json_request():
    # Update user information; the email must be one that has already been verified
    # response = requests.patch(build_uri('user'), auth=('reblue520', 'reblue520'), json={'name': 'hellojack2019', 'email': 'reblue520@163.com'})
    response = requests.post(build_uri('user/emails'), auth=('reblue520', 'Reblue0225520'), json=['hellojack2019@163.com'])
    print better_print(response.text)
    print response.request.headers
    print response.request.body
    print response.status_code

def timeout_request():
    # API error handling: timeouts and HTTP errors
    try:
        response = requests.get(build_uri('user/emails'), timeout=10)
        response.raise_for_status()
    except exceptions.Timeout as e:
        print e.message
    except exceptions.HTTPError as e:
        print e.message
    else:
        print response.status_code
        print response.text

def hard_requests():
    # Build the request by hand with Request/Session
    from requests import Request, Session
    s = Session()
    headers = {'User-Agent': 'fake1.3.4'}
    req = Request('GET', build_uri('user/emails'), auth=('reblue520', 'Reblue0225520'), headers=headers)
    prepped = req.prepare()
    print prepped.body
    print prepped.headers
    resp = s.send(prepped, timeout=5)
    print resp.status_code
    print resp.request.headers
    print resp.text

if __name__ == '__main__':
    # request_method()
    # params_request()
    # json_request()
    # timeout_request()
    hard_requests()
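Note that GitHub has since removed support for username/password Basic Authentication on the REST API, so the auth=('user', 'password') calls above will return 401 today. A minimal sketch of the same user/emails request using a personal access token instead (the token string is a placeholder; generate your own in GitHub settings):

# coding=utf-8
import requests

# Hypothetical personal access token -- generate your own under
# GitHub Settings -> Developer settings -> Personal access tokens
TOKEN = 'ghp_your_token_here'

def token_request():
    # GitHub accepts the token in the Authorization header
    headers = {'Authorization': 'token %s' % TOKEN}
    response = requests.get('https://api.github.com/user/emails', headers=headers)
    print response.status_code
    print response.json()

if __name__ == '__main__':
    token_request()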
Commonly used APIs of the response object
Basic response APIs
In []: import requests

In []: response = requests.get("https://api.github.com")

In []: response.status_code
Out[]: 200

In []: response.reason
Out[]: 'OK'

In []: response.headers
Out[]: {'Date': 'Sat, 20 Jul 2019 03:48:51 GMT', 'Content-Type': 'application/json; charset=utf-8', 'Transfer-Encoding': 'chunked', 'Server': 'GitHub.com', 'Status': '200 OK', 'X-RateLimit-Limit': '', 'X-RateLimit-Remaining': '', 'X-RateLimit-Reset': '', 'Cache-Control': 'public, max-age=60, s-maxage=60', 'Vary': 'Accept, Accept-Encoding', 'ETag': 'W/"7dc470913f1fe9bb6c7355b50a0737bc"', 'X-GitHub-Media-Type': 'github.v3; format=json', 'Access-Control-Expose-Headers': 'ETag, Link, Location, Retry-After, X-GitHub-OTP, X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset, X-OAuth-Scopes, X-Accepted-OAuth-Scopes, X-Poll-Interval, X-GitHub-Media-Type', 'Access-Control-Allow-Origin': '*', 'Strict-Transport-Security': 'max-age=31536000; includeSubdomains; preload', 'X-Frame-Options': 'deny', 'X-Content-Type-Options': 'nosniff', 'X-XSS-Protection': '1; mode=block', 'Referrer-Policy': 'origin-when-cross-origin, strict-origin-when-cross-origin', 'Content-Security-Policy': "default-src 'none'", 'Content-Encoding': 'gzip', 'X-GitHub-Request-Id': '33D9:591B:9D084B:CF860E:5D328F23'}

In []: response.url
Out[]: 'https://api.github.com/'

In []: response.history
Out[]: []

In []: response = requests.get("http://api.github.com")

In []: response.history
Out[]: [<Response [301]>]

In []: response = requests.get("https://api.github.com")

In []: response.elapsed
Out[]: datetime.timedelta(microseconds=...)

In []: response.request
Out[]: <PreparedRequest [GET]>

In []: response.request.headers
Out[]: {'User-Agent': 'python-requests/2.22.0', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}

In []: response.encoding
Out[]: 'utf-8'

In []: response.raw.read()
Out[]: b''

In []: response.content
Out[]: b'{"current_user_url":"https://api.github.com/user","current_user_authorizations_html_url":"https://github.com/settings/connections/applications{/client_id}","authorizations_url":"https://api.github.com/authorizations","code_search_url":"https://api.github.com/search/code?q={query}{&page,per_page,sort,order}","commit_search_url":"https://api.github.com/search/commits?q={query}{&page,per_page,sort,order}","emails_url":"https://api.github.com/user/emails","emojis_url":"https://api.github.com/emojis","events_url":"https://api.github.com/events","feeds_url":"https://api.github.com/feeds","followers_url":"https://api.github.com/user/followers","following_url":"https://api.github.com/user/following{/target}","gists_url":"https://api.github.com/gists{/gist_id}","hub_url":"https://api.github.com/hub","issue_search_url":"https://api.github.com/search/issues?q={query}{&page,per_page,sort,order}","issues_url":"https://api.github.com/issues","keys_url":"https://api.github.com/user/keys","notifications_url":"https://api.github.com/notifications","organization_repositories_url":"https://api.github.com/orgs/{org}/repos{?type,page,per_page,sort}","organization_url":"https://api.github.com/orgs/{org}","public_gists_url":"https://api.github.com/gists/public","rate_limit_url":"https://api.github.com/rate_limit","repository_url":"https://api.github.com/repos/{owner}/{repo}","repository_search_url":"https://api.github.com/search/repositories?q={query}{&page,per_page,sort,order}","current_user_repositories_url":"https://api.github.com/user/repos{?type,page,per_page,sort}","starred_url":"https://api.github.com/user/starred{/owner}{/repo}","starred_gists_url":"https://api.github.com/gists/starred","team_url":"https://api.github.com/teams","user_url":"https://api.github.com/users/{user}","user_organizations_url":"https://api.github.com/user/orgs","user_repositories_url":"https://api.github.com/users/{user}/repos{?type,page,per_page,sort}","user_search_url":"https://api.github.com/search/users?q={query}{&page,per_page,sort,order}"}'

In []: response.json()
Out[]:
{'current_user_url': 'https://api.github.com/user',
'current_user_authorizations_html_url': 'https://github.com/settings/connections/applications{/client_id}',
'authorizations_url': 'https://api.github.com/authorizations',
'code_search_url': 'https://api.github.com/search/code?q={query}{&page,per_page,sort,order}',
'commit_search_url': 'https://api.github.com/search/commits?q={query}{&page,per_page,sort,order}',
'emails_url': 'https://api.github.com/user/emails',
'emojis_url': 'https://api.github.com/emojis',
'events_url': 'https://api.github.com/events',
'feeds_url': 'https://api.github.com/feeds',
'followers_url': 'https://api.github.com/user/followers',
'following_url': 'https://api.github.com/user/following{/target}',
'gists_url': 'https://api.github.com/gists{/gist_id}',
'hub_url': 'https://api.github.com/hub',
'issue_search_url': 'https://api.github.com/search/issues?q={query}{&page,per_page,sort,order}',
'issues_url': 'https://api.github.com/issues',
'keys_url': 'https://api.github.com/user/keys',
'notifications_url': 'https://api.github.com/notifications',
'organization_repositories_url': 'https://api.github.com/orgs/{org}/repos{?type,page,per_page,sort}',
'organization_url': 'https://api.github.com/orgs/{org}',
'public_gists_url': 'https://api.github.com/gists/public',
'rate_limit_url': 'https://api.github.com/rate_limit',
'repository_url': 'https://api.github.com/repos/{owner}/{repo}',
'repository_search_url': 'https://api.github.com/search/repositories?q={query}{&page,per_page,sort,order}',
'current_user_repositories_url': 'https://api.github.com/user/repos{?type,page,per_page,sort}',
'starred_url': 'https://api.github.com/user/starred{/owner}{/repo}',
'starred_gists_url': 'https://api.github.com/gists/starred',
'team_url': 'https://api.github.com/teams',
'user_url': 'https://api.github.com/users/{user}',
'user_organizations_url': 'https://api.github.com/user/orgs',
'user_repositories_url': 'https://api.github.com/users/{user}/repos{?type,page,per_page,sort}',
'user_search_url': 'https://api.github.com/search/users?q={query}{&page,per_page,sort,order}'}
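In practice these attributes are usually combined into one small checking routine: send the request with a timeout, let raise_for_status() surface 4xx/5xx responses, and only then parse the body. A minimal sketch against the same public GitHub root URL used above:

# coding=utf-8
import requests
from requests import exceptions

def fetch_github_root():
    try:
        response = requests.get("https://api.github.com", timeout=5)
        response.raise_for_status()              # turn 4xx/5xx into HTTPError
    except exceptions.RequestException as e:     # base class of Timeout, HTTPError, ConnectionError
        print 'request failed:', e
        return None
    print response.status_code, response.reason  # e.g. 200 OK
    return response.json()

if __name__ == '__main__':
    print fetch_github_root()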