Learning Web Scraping from Scratch, Part 10: Using urllib and requests with the GitHub API
Using the urllib library
# coding=utf-8
import urllib2
import urllib  # test environment simulated with httpbin

URL_IP = "http://10.11.0.215:8080"
URL_GET = "http://10.11.0.215:8080/get"


def use_simple_urllib2():
    response = urllib2.urlopen(URL_IP)
    print '>>>Response Headers:'
    print response.info()
    print '>>>Response Body:'
    print ''.join([line for line in response.readlines()])


def use_params_urllib2():
    # Build the request parameters
    params = urllib.urlencode({'param1': 'hello', 'param2': 'world'})
    print 'Request Params:'
    print params
    # Send the request
    response = urllib2.urlopen('?'.join([URL_GET, '%s']) % params)
    # Handle the response
    print '>>>Response Headers:'
    print response.info()
    print '>>>Status Code:'
    print response.getcode()
    print '>>>Response Body:'
    print ''.join([line for line in response.readlines()])
    # print response.readlines()


if __name__ == '__main__':
    # print '>>>Use simple urllib2'
    # use_simple_urllib2()
    print '>>>Use params urllib2'
    use_params_urllib2()
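urlopen sends a GET request when called with a URL alone; passing the url-encoded parameters as the second (data) argument turns the call into a POST. Below is a minimal Python 2 sketch against the same local httpbin-style server, assuming it also exposes the standard /post endpoint:

# coding=utf-8
# Minimal sketch: POST with urllib2 by passing urlencoded data to urlopen.
# Assumes the same local httpbin-style server as above also exposes /post.
import urllib
import urllib2

URL_POST = "http://10.11.0.215:8080/post"

data = urllib.urlencode({'param1': 'hello', 'param2': 'world'})
response = urllib2.urlopen(URL_POST, data)  # a non-None data argument makes this a POST
print '>>>Status Code:'
print response.getcode()
print '>>>Response Body:'
print response.read()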
Basic usage of the requests library
# coding=utf-8
import requests

URL_IP = "http://10.11.0.215:8080/ip"
URL_GET = "http://10.11.0.215:8080/get"


def use_simple_requests():
    response = requests.get(URL_IP)
    print ">>>Response Headers:"
    print response.headers
    print ">>>Response Code:"
    print response.status_code
    print ">>>Response Body:"
    print response.text


def use_params_requests():
    # requests builds the query string from the params dict itself
    response = requests.get(URL_GET, params={'param1': 'hello', 'param2': 'world'})
    print ">>>Response Headers:"
    print response.headers
    print ">>>Response Code:"
    print response.status_code
    print response.reason
    print ">>>Response Body:"
    print response.json()


if __name__ == "__main__":
    # print "simple requests:"
    # use_simple_requests()
    print "params requests:"
    use_params_requests()
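Because use_params_requests() hands the parameters to requests through the params argument, the library builds and encodes the query string on its own; there is no urllib.urlencode step as in the previous example. A minimal sketch, using the public httpbin.org service as a stand-in for the local test server:

# coding=utf-8
# Minimal sketch: requests encodes the query string from a plain dict.
# httpbin.org is assumed here as a public stand-in for the local test server.
import requests

response = requests.get("http://httpbin.org/get",
                        params={'param1': 'hello', 'param2': 'world'})
print response.url             # final URL including the encoded query string
print response.json()['args']  # httpbin echoes the query parameters back under "args"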
Interacting with the GitHub API via requests
# coding=utf-8
import json
import requests
from requests import exceptions

URL = "https://api.github.com"


def build_uri(endpoint):
    # Join the base URL and the endpoint into the final API path
    return '/'.join([URL, endpoint])


def better_print(json_str):
    # Pretty-print the JSON; indent=4 gives a 4-space indent
    return json.dumps(json.loads(json_str), indent=4)


def request_method():
    # Fetch the user's information
    # response = requests.get(build_uri('users/reblue520'))
    # response = requests.get(build_uri('user/emails'), auth=('reblue520', 'reblue520'))
    response = requests.get(build_uri('user/public_emails'), auth=('reblue520', 'reblue520'))
    print better_print(response.text)


def params_request():
    response = requests.get(build_uri('users'), params={'since': 11})
    print better_print(response.text)
    print response.request.headers
    print response.url


def json_request():
    # Update user information; the email address must already be verified
    # response = requests.patch(build_uri('user'), auth=('reblue520', 'reblue520'),
    #                           json={'name': 'hellojack2019', 'email': 'reblue520@163.com'})
    response = requests.post(build_uri('user/emails'), auth=('reblue520', 'Reblue0225520'),
                             json=['hellojack2019@163.com'])
    print better_print(response.text)
    print response.request.headers
    print response.request.body
    print response.status_code


def timeout_request():
    # API exception handling: timeouts
    try:
        response = requests.get(build_uri('user/emails'), timeout=10)
        response.raise_for_status()
    except exceptions.Timeout as e:
        print e.message
    except exceptions.HTTPError as e:
        print e.message
    else:
        print response.status_code
        print response.text


def hard_requests():
    # Build a custom request by hand
    from requests import Request, Session
    s = Session()
    headers = {'User-Agent': 'fake1.3.4'}
    req = Request('GET', build_uri('user/emails'),
                  auth=('reblue520', 'Reblue0225520'), headers=headers)
    prepped = req.prepare()
    print prepped.body
    print prepped.headers
    resp = s.send(prepped, timeout=5)
    print resp.status_code
    print resp.request.headers
    print resp.text


if __name__ == '__main__':
    # request_method()
    # params_request()
    # json_request()
    # timeout_request()
    hard_requests()
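The functions above authenticate with a username and password over HTTP Basic Auth. GitHub also accepts a personal access token sent in the Authorization header, which keeps the account password out of the source code. A minimal sketch, where YOUR_TOKEN is a hypothetical placeholder rather than a real credential:

# coding=utf-8
# Minimal sketch: GitHub API call authenticated with a personal access token.
# YOUR_TOKEN is a hypothetical placeholder, not a real credential.
import requests

headers = {'Authorization': 'token YOUR_TOKEN'}
response = requests.get('https://api.github.com/user', headers=headers)
print response.status_code
print response.json()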
Commonly used response APIs
Basic attributes and methods of the Response object
In []: import requests

In []: response = requests.get("https://api.github.com")

In []: response.status_code
Out[]: 200

In []: response.reason
Out[]: 'OK'

In []: response.headers
Out[]: {'Date': 'Sat, 20 Jul 2019 03:48:51 GMT', 'Content-Type': 'application/json; charset=utf-8', 'Transfer-Encoding': 'chunked', 'Server': 'GitHub.com', 'Status': '200 OK', 'X-RateLimit-Limit': '', 'X-RateLimit-Remaining': '', 'X-RateLimit-Reset': '', 'Cache-Control': 'public, max-age=60, s-maxage=60', 'Vary': 'Accept, Accept-Encoding', 'ETag': 'W/"7dc470913f1fe9bb6c7355b50a0737bc"', 'X-GitHub-Media-Type': 'github.v3; format=json', 'Access-Control-Expose-Headers': 'ETag, Link, Location, Retry-After, X-GitHub-OTP, X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset, X-OAuth-Scopes, X-Accepted-OAuth-Scopes, X-Poll-Interval, X-GitHub-Media-Type', 'Access-Control-Allow-Origin': '*', 'Strict-Transport-Security': 'max-age=31536000; includeSubdomains; preload', 'X-Frame-Options': 'deny', 'X-Content-Type-Options': 'nosniff', 'X-XSS-Protection': '1; mode=block', 'Referrer-Policy': 'origin-when-cross-origin, strict-origin-when-cross-origin', 'Content-Security-Policy': "default-src 'none'", 'Content-Encoding': 'gzip', 'X-GitHub-Request-Id': '33D9:591B:9D084B:CF860E:5D328F23'}

In []: response.url
Out[]: 'https://api.github.com/'

In []: response.history
Out[]: []

In []: response = requests.get("http://api.github.com")

In []: response.history
Out[]: [<Response []>]

In []: response = requests.get("https://api.github.com")

In []: response.elapsed
Out[]: datetime.timedelta(microseconds=)

In []: response.request
Out[]: <PreparedRequest [GET]>

In []: response.request.headers
Out[]: {'User-Agent': 'python-requests/2.22.0', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}

In []: response.encoding
Out[]: 'utf-8'

In []: response.raw.read()
Out[]: b''

In []: response.content
Out[]: b'{"current_user_url":"https://api.github.com/user","current_user_authorizations_html_url":"https://github.com/settings/connections/applications{/client_id}","authorizations_url":"https://api.github.com/authorizations","code_search_url":"https://api.github.com/search/code?q={query}{&page,per_page,sort,order}","commit_search_url":"https://api.github.com/search/commits?q={query}{&page,per_page,sort,order}","emails_url":"https://api.github.com/user/emails","emojis_url":"https://api.github.com/emojis","events_url":"https://api.github.com/events","feeds_url":"https://api.github.com/feeds","followers_url":"https://api.github.com/user/followers","following_url":"https://api.github.com/user/following{/target}","gists_url":"https://api.github.com/gists{/gist_id}","hub_url":"https://api.github.com/hub","issue_search_url":"https://api.github.com/search/issues?q={query}{&page,per_page,sort,order}","issues_url":"https://api.github.com/issues","keys_url":"https://api.github.com/user/keys","notifications_url":"https://api.github.com/notifications","organization_repositories_url":"https://api.github.com/orgs/{org}/repos{?type,page,per_page,sort}","organization_url":"https://api.github.com/orgs/{org}","public_gists_url":"https://api.github.com/gists/public","rate_limit_url":"https://api.github.com/rate_limit","repository_url":"https://api.github.com/repos/{owner}/{repo}","repository_search_url":"https://api.github.com/search/repositories?q={query}{&page,per_page,sort,order}","current_user_repositories_url":"https://api.github.com/user/repos{?type,page,per_page,sort}","starred_url":"https://api.github.com/user/starred{/owner}{/repo}","starred_gists_url":"https://api.github.com/gists/starred","team_url":"https://api.github.com/teams","user_url":"https://api.github.com/users/{user}","user_organizations_url":"https://api.github.com/user/orgs","user_repositories_url":"https://api.github.com/users/{user}/repos{?type,page,per_page,sort}","user_search_url":"https://api.github.com/search/users?q={query}{&page,per_page,sort,order}"}'

In []: response.json()
Out[]:
{'current_user_url': 'https://api.github.com/user',
'current_user_authorizations_html_url': 'https://github.com/settings/connections/applications{/client_id}',
'authorizations_url': 'https://api.github.com/authorizations',
'code_search_url': 'https://api.github.com/search/code?q={query}{&page,per_page,sort,order}',
'commit_search_url': 'https://api.github.com/search/commits?q={query}{&page,per_page,sort,order}',
'emails_url': 'https://api.github.com/user/emails',
'emojis_url': 'https://api.github.com/emojis',
'events_url': 'https://api.github.com/events',
'feeds_url': 'https://api.github.com/feeds',
'followers_url': 'https://api.github.com/user/followers',
'following_url': 'https://api.github.com/user/following{/target}',
'gists_url': 'https://api.github.com/gists{/gist_id}',
'hub_url': 'https://api.github.com/hub',
'issue_search_url': 'https://api.github.com/search/issues?q={query}{&page,per_page,sort,order}',
'issues_url': 'https://api.github.com/issues',
'keys_url': 'https://api.github.com/user/keys',
'notifications_url': 'https://api.github.com/notifications',
'organization_repositories_url': 'https://api.github.com/orgs/{org}/repos{?type,page,per_page,sort}',
'organization_url': 'https://api.github.com/orgs/{org}',
'public_gists_url': 'https://api.github.com/gists/public',
'rate_limit_url': 'https://api.github.com/rate_limit',
'repository_url': 'https://api.github.com/repos/{owner}/{repo}',
'repository_search_url': 'https://api.github.com/search/repositories?q={query}{&page,per_page,sort,order}',
'current_user_repositories_url': 'https://api.github.com/user/repos{?type,page,per_page,sort}',
'starred_url': 'https://api.github.com/user/starred{/owner}{/repo}',
'starred_gists_url': 'https://api.github.com/gists/starred',
'team_url': 'https://api.github.com/teams',
'user_url': 'https://api.github.com/users/{user}',
'user_organizations_url': 'https://api.github.com/user/orgs',
'user_repositories_url': 'https://api.github.com/users/{user}/repos{?type,page,per_page,sort}',
'user_search_url': 'https://api.github.com/search/users?q={query}{&page,per_page,sort,order}'}
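In the transcript above, response.raw.read() returns b'' because requests has already consumed the underlying stream while producing response.content. To read the raw stream yourself, request the resource with stream=True; a minimal sketch:

# coding=utf-8
# Minimal sketch: stream=True defers reading the body, so response.raw stays readable.
import requests

response = requests.get("https://api.github.com", stream=True)
print response.raw.read(10)  # first bytes of the raw (possibly gzip-compressed) body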