Scrapy学习-10-Request&Response对象
请求URL流程
Scarpy使用请求和响应对象来抓取网站
Request对象
"""
This module implements the Request class which is used to represent HTTP
requests in Scrapy. See documentation in docs/topics/request-response.rst
"""
import six
from w3lib.url import safe_url_string from scrapy.http.headers import Headers
from scrapy.utils.python import to_bytes
from scrapy.utils.trackref import object_ref
from scrapy.utils.url import escape_ajax
from scrapy.http.common import obsolete_setter class Request(object_ref): def __init__(self, url, callback=None, method='GET', headers=None, body=None,
cookies=None, meta=None, encoding='utf-8', priority=0,
dont_filter=False, errback=None, flags=None): self._encoding = encoding # this one has to be set first
self.method = str(method).upper()
self._set_url(url)
self._set_body(body)
assert isinstance(priority, int), "Request priority not an integer: %r" % priority
self.priority = priority if callback is not None and not callable(callback):
raise TypeError('callback must be a callable, got %s' % type(callback).__name__)
if errback is not None and not callable(errback):
raise TypeError('errback must be a callable, got %s' % type(errback).__name__)
assert callback or not errback, "Cannot use errback without a callback"
self.callback = callback
self.errback = errback self.cookies = cookies or {}
self.headers = Headers(headers or {}, encoding=encoding)
self.dont_filter = dont_filter self._meta = dict(meta) if meta else None
self.flags = [] if flags is None else list(flags) @property
def meta(self):
if self._meta is None:
self._meta = {}
return self._meta def _get_url(self):
return self._url def _set_url(self, url):
if not isinstance(url, six.string_types):
raise TypeError('Request url must be str or unicode, got %s:' % type(url).__name__) s = safe_url_string(url, self.encoding)
self._url = escape_ajax(s) if ':' not in self._url:
raise ValueError('Missing scheme in request url: %s' % self._url) url = property(_get_url, obsolete_setter(_set_url, 'url')) def _get_body(self):
return self._body def _set_body(self, body):
if body is None:
self._body = b''
else:
self._body = to_bytes(body, self.encoding) body = property(_get_body, obsolete_setter(_set_body, 'body')) @property
def encoding(self):
return self._encoding def __str__(self):
return "<%s %s>" % (self.method, self.url) __repr__ = __str__ def copy(self):
"""Return a copy of this Request"""
return self.replace() def replace(self, *args, **kwargs):
"""Create a new Request with the same attributes except for those
given new values.
"""
for x in ['url', 'method', 'headers', 'body', 'cookies', 'meta',
'encoding', 'priority', 'dont_filter', 'callback', 'errback']:
kwargs.setdefault(x, getattr(self, x))
cls = kwargs.pop('cls', self.__class__)
return cls(*args, **kwargs)
部分参数解析
url (string) – the URL of this request callback (callable) – the function that will be called with the response of this request (once its downloaded) as its first parameter. For more information see Passing additional data to callback functions below. If a Request doesn’t specify a callback, the spider’s parse() method will be used. Note that if exceptions are raised during processing, errback is called instead. method (string) – the HTTP method of this request. Defaults to 'GET'. meta (dict) – the initial values for the Request.meta attribute. If given, the dict passed in this parameter will be shallow copied. body (str or unicode) – the request body. If a unicode is passed, then it’s encoded to str using the encoding passed (which defaults to utf-8). If body is not given, an empty string is stored. Regardless of the type of this argument, the final value stored will be a str (never unicode or None). headers (dict) – the headers of this request. The dict values can be strings (for single valued headers) or lists (for multi-valued headers). If None is passed as value, the HTTP header will not be sent at all. cookies (dict or list) –
the request cookies. These can be sent in two forms.
1.Using a dict:
request_with_cookies = Request(url="http://www.example.com",
cookies={'currency': 'USD', 'country': 'UY'})
2. Using a list of dicts
request_with_cookies = Request(url="http://www.example.com",
cookies=[{'name': 'currency',
'value': 'USD',
'domain': 'example.com',
'path': '/currency'}])
Response对象
"""
This module implements the Response class which is used to represent HTTP
responses in Scrapy. See documentation in docs/topics/request-response.rst
"""
from six.moves.urllib.parse import urljoin from scrapy.http.request import Request
from scrapy.http.headers import Headers
from scrapy.link import Link
from scrapy.utils.trackref import object_ref
from scrapy.http.common import obsolete_setter
from scrapy.exceptions import NotSupported class Response(object_ref): def __init__(self, url, status=200, headers=None, body=b'', flags=None, request=None):
self.headers = Headers(headers or {})
self.status = int(status)
self._set_body(body)
self._set_url(url)
self.request = request
self.flags = [] if flags is None else list(flags) @property
def meta(self):
try:
return self.request.meta
except AttributeError:
raise AttributeError(
"Response.meta not available, this response "
"is not tied to any request"
) def _get_url(self):
return self._url def _set_url(self, url):
if isinstance(url, str):
self._url = url
else:
raise TypeError('%s url must be str, got %s:' % (type(self).__name__,
type(url).__name__)) url = property(_get_url, obsolete_setter(_set_url, 'url')) def _get_body(self):
return self._body def _set_body(self, body):
if body is None:
self._body = b''
elif not isinstance(body, bytes):
raise TypeError(
"Response body must be bytes. "
"If you want to pass unicode body use TextResponse "
"or HtmlResponse.")
else:
self._body = body body = property(_get_body, obsolete_setter(_set_body, 'body')) def __str__(self):
return "<%d %s>" % (self.status, self.url) __repr__ = __str__ def copy(self):
"""Return a copy of this Response"""
return self.replace() def replace(self, *args, **kwargs):
"""Create a new Response with the same attributes except for those
given new values.
"""
for x in ['url', 'status', 'headers', 'body', 'request', 'flags']:
kwargs.setdefault(x, getattr(self, x))
cls = kwargs.pop('cls', self.__class__)
return cls(*args, **kwargs) def urljoin(self, url):
"""Join this Response's url with a possible relative url to form an
absolute interpretation of the latter."""
return urljoin(self.url, url) @property
def text(self):
"""For subclasses of TextResponse, this will return the body
as text (unicode object in Python 2 and str in Python 3)
"""
raise AttributeError("Response content isn't text") def css(self, *a, **kw):
"""Shortcut method implemented only by responses whose content
is text (subclasses of TextResponse).
"""
raise NotSupported("Response content isn't text") def xpath(self, *a, **kw):
"""Shortcut method implemented only by responses whose content
is text (subclasses of TextResponse).
"""
raise NotSupported("Response content isn't text") def follow(self, url, callback=None, method='GET', headers=None, body=None,
cookies=None, meta=None, encoding='utf-8', priority=0,
dont_filter=False, errback=None):
# type: (...) -> Request
"""
Return a :class:`~.Request` instance to follow a link ``url``.
It accepts the same arguments as ``Request.__init__`` method,
but ``url`` can be a relative URL or a ``scrapy.link.Link`` object,
not only an absolute URL. :class:`~.TextResponse` provides a :meth:`~.TextResponse.follow`
method which supports selectors in addition to absolute/relative URLs
and Link objects.
"""
if isinstance(url, Link):
url = url.url
url = self.urljoin(url)
return Request(url, callback,
method=method,
headers=headers,
body=body,
cookies=cookies,
meta=meta,
encoding=encoding,
priority=priority,
dont_filter=dont_filter,
errback=errback)
参考官方文档 https://doc.scrapy.org
Scrapy学习-10-Request&Response对象的更多相关文章
- Servlet的学习之Request请求对象(3)
本篇接上一篇,将Servlet中的HttpServletRequest对象获取RequestDispatcher对象后能进行的[转发]forward功能和[包含]include功能介绍完. 首先来看R ...
- Servlet的学习之Request请求对象(2)
在上一篇<Servlet的学习(十)>中介绍了HttpServletRequest请求对象的一些常用方法,而从这篇起开始介绍和学习HttpServletRequest的常用功能. 使用Ht ...
- Servlet的学习之Request请求对象(1)
在本篇中开始对Servlet中的HttpServletRequest请求对象进行学习,请求对象同响应对象一样,我们可以根据该对象中的方法获取例如请求行,请求头和请求实体数据的方法. 在本篇中先对Htt ...
- Java-Spring-获取Request,Response对象
转载自:https://www.cnblogs.com/bjlhx/p/6639542.html 第一种.参数 @RequestMapping("/test") @Response ...
- request与response对象.
request与response对象. 1. request代表请求对象 response代表的响应对象. 学习它们我们可以操作http请求与响应. 2.request,response体系结构. 在 ...
- request与response对象详述
request与response对象. 1. request代表请求对象 response代表的响应对象. 学习它们我们可以操作http请求与响应. 2.request,response体系结构. 在 ...
- java中获取request与response对象的方法
Java 获取Request,Response对象方法 第一种.参数 @RequestMapping("/test") @ResponseBody public void sa ...
- SpringMvc4中获取request、response对象的方法
springMVC4中获取request和response对象有以下两种简单易用的方法: 1.在control层获取 在control层中获取HttpServletRequest和HttpServle ...
- Scrapy 中 Request 对象和 Response 对象的各参数及属性介绍
Request 对象 Request构造器方法的参数列表: Request(url [, callback=None, method='GET', headers=None, body=None,co ...
随机推荐
- 基于Python的Web应用开发实战——3 模板
要想开发出易于维护的程序,关键在于编写形式简洁且结构良好的代码. 当目前为止,你看到的示例都太简单,无法说明这一点,但Flask视图函数的两个完全独立的作用却被融合在了一起,这就产生了一个问题. 视图 ...
- python_112_网络编程 Socket编程
实例1:客户端发小写英文,服务器端返回给客户端大写英文(仅支持一次接受发送) 服务器端: #服务器端(先于客户端运行) import socket server=socket.socket() ser ...
- EOF与feof
在C语言中,或更精确地说成C标准函数库中表示文件结束符(end of file).在while循环中以EOF作为文件结束标志,这种以EOF作为文件结束标志的文件,必须是文本文件.在文本文件中,数据都是 ...
- oracle中group by的高级用法
简单的group by用法 select c1,sum(c2) from t1 where t1<>'test' group by c1 having sum(c2)>100; ro ...
- 洛谷 P1483 序列变换
https://www.luogu.org/problemnew/show/P1483 数据范围不是太大. 一个数组记录给k,记录每个数加了多少. 对于查询每个数的大小,那么就枚举每个数的因子,加上这 ...
- Noip2018 考前准备
目录 基础算法 二分 模拟(未补) 高精(未学习) 搜索(未补) 排序 图论 树的直径 树的重心 最短路算法 Spfa Dijkstra Floyd 最小生成树 kruskal 数论 线性筛 线性筛素 ...
- perl学习之:函数总结
一.进程处理函数 1.进程启动函数 函数名 eval 调用语法 eval(string) 解说 将string看作Perl语句执行.正确执行后,系统变量$@为空串,如果有错误,$@中为错误信息. 例子 ...
- django第8天(在测试文件中运行django项目|单表操作)
django第8天 在测试文件中运行django项目 1.将项目配置文件数据库该为mysql,修改配置信息 PORT = '127.0.0.1' DATABASES = { 'default': { ...
- Flash生成HTML5动画方法
方法一:利用“swiffy”将Flash转换成HTML5动画. 首先,我们需要下载一款基于“Flash”程序的插件,名称为“swiffy”,这是一款由谷歌推出的一个Flash扩展,可以通过“Fla ...
- android 之 GridView
GridView 的用法基本与ListView类似. 程序布局文件main.xml <?xml version="1.0" encoding="utf-8" ...