Scrapy:配置日志
Scrapy logger 在每个spider实例中提供了一个可以访问和使用的实例,方法如下:
import scrapy class MySpider(scrapy.Spider):
name = 'myspider'
start_url = ['https://www.baidu.com'] def parse(self,response):
self.logger.info('Parse function called on %s',response.url)
方法二:
该记录器是使用spider的名称创建的,当然也可以应用到任意项目中
import logging
import scrapy logger = logging.getLogger('mycustomlogger')
#创建logger模块 class MySpider(scrapy.Spider):
name = 'myspider'
start_url = ['https://www.baidu.com'] #触发模块
def parse(self,response):
logger.info('Parse function called on %s',response.url)
只需使用logging.getLogger函数获取其名称即可使用其记录器:
import logging
logger = logging.getLogger('mycustomlogger')
logger.warning('This is a warning')
so anyway:我们也可以使用__name__变量填充当前模块的路径,确保正在处理的任何模块设置自定义记录器:
import logging logger = logging.getLogger(__name__)
logger.warning('This is a warning')
在scrapy项目的settings 文件中配置
LOG_ENABLED = True #是否启动日志记录,默认True
LOG_ENCODING = 'UTF-8'
LOG_FILE = 'TEST1.LOG'#日志输出文件,如果为NONE,就打印到控制台
LOG_LEVEL = 'INFO'#日志级别,默认debug
LOG_FORMAT #日志格式
LOG_DATEFORMAT#日志日期格式
LOG_STDOUT #日志标准输出,默认False,如果True所有标准输出都将写入日志中,比如代码中的print输出也会被写入到文件
LOG_SHORT_NAMES#短日志名,默认为false,如果为True将不输出组件名
日志如下
2019-04-26 15:48:33 [scrapy.utils.log] INFO: Scrapy 1.6.0 started (bot: patencent)
2019-04-26 15:48:33 [scrapy.utils.log] INFO: Versions: lxml 4.3.3.0, libxml2 2.9.5, cssselect 1.0.3, parsel 1.5.1, w3lib 1.20.0, Twisted 19.2.0, Python 3.7.2 (tags/v3.7.2:9a3ffc0492, Dec 23 2018, 23:09:28) [MSC v.1916 64 bit (AMD64)], pyOpenSSL 19.0.0 (OpenSSL 1.1.1b 26 Feb 2019), cryptography 2.6.1, Platform Windows-10-10.0.17134-SP0
2019-04-26 15:48:33 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'patencent', 'LOG_ENCODING': 'UTF8', 'LOG_FILE': 'TEST1.LOG', 'LOG_LEVEL': 'INFO', 'NEWSPIDER_MODULE': 'patencent.spiders', 'SPIDER_MODULES': ['patencent.spiders'], 'USER_AGENT': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36'}
2019-04-26 15:48:34 [scrapy.extensions.telnet] INFO: Telnet Password: 08683c5af998704b
2019-04-26 15:48:34 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.logstats.LogStats']
2019-04-26 15:48:34 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2019-04-26 15:48:34 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2019-04-26 15:48:34 [scrapy.middleware] INFO: Enabled item pipelines:
['patencent.pipelines.PatencentPipeline']
2019-04-26 15:48:34 [scrapy.core.engine] INFO: Spider opened
2019-04-26 15:48:34 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2019-04-26 15:48:34 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2019-04-26 15:49:34 [scrapy.extensions.logstats] INFO: Crawled 1384 pages (at 1384 pages/min), scraped 1255 items (at 1255 items/min)
2019-04-26 15:49:52 [scrapy.utils.log] INFO: Scrapy 1.6.0 started (bot: patencent)
2019-04-26 15:49:52 [scrapy.utils.log] INFO: Versions: lxml 4.3.3.0, libxml2 2.9.5, cssselect 1.0.3, parsel 1.5.1, w3lib 1.20.0, Twisted 19.2.0, Python 3.7.2 (tags/v3.7.2:9a3ffc0492, Dec 23 2018, 23:09:28) [MSC v.1916 64 bit (AMD64)], pyOpenSSL 19.0.0 (OpenSSL 1.1.1b 26 Feb 2019), cryptography 2.6.1, Platform Windows-10-10.0.17134-SP0
2019-04-26 15:49:52 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'patencent', 'CONCURRENT_REQUESTS': 32, 'LOG_ENCODING': 'UTF8', 'LOG_FILE': 'TEST1.LOG', 'LOG_LEVEL': 'INFO', 'NEWSPIDER_MODULE': 'patencent.spiders', 'SPIDER_MODULES': ['patencent.spiders'], 'USER_AGENT': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36'}
2019-04-26 15:49:52 [scrapy.extensions.telnet] INFO: Telnet Password: b2f1951d137cf133
2019-04-26 15:49:52 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.logstats.LogStats']
2019-04-26 15:49:52 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2019-04-26 15:49:52 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2019-04-26 15:49:52 [scrapy.middleware] INFO: Enabled item pipelines:
['patencent.pipelines.PatencentPipeline']
2019-04-26 15:49:52 [scrapy.core.engine] INFO: Spider opened
2019-04-26 15:49:52 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2019-04-26 15:49:52 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2019-04-26 15:50:43 [scrapy.utils.log] INFO: Scrapy 1.6.0 started (bot: patencent)
2019-04-26 15:50:43 [scrapy.utils.log] INFO: Versions: lxml 4.3.3.0, libxml2 2.9.5, cssselect 1.0.3, parsel 1.5.1, w3lib 1.20.0, Twisted 19.2.0, Python 3.7.2 (tags/v3.7.2:9a3ffc0492, Dec 23 2018, 23:09:28) [MSC v.1916 64 bit (AMD64)], pyOpenSSL 19.0.0 (OpenSSL 1.1.1b 26 Feb 2019), cryptography 2.6.1, Platform Windows-10-10.0.17134-SP0
2019-04-26 15:50:43 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'patencent', 'CONCURRENT_REQUESTS': 32, 'LOG_ENCODING': 'UTF8', 'LOG_FILE': 'TEST1.LOG', 'LOG_LEVEL': 'INFO', 'NEWSPIDER_MODULE': 'patencent.spiders', 'SPIDER_MODULES': ['patencent.spiders'], 'USER_AGENT': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36'}
2019-04-26 15:50:43 [scrapy.extensions.telnet] INFO: Telnet Password: 24d0317609676d2e
2019-04-26 15:50:43 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.logstats.LogStats']
2019-04-26 15:50:44 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2019-04-26 15:50:44 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2019-04-26 15:50:44 [scrapy.middleware] INFO: Enabled item pipelines:
['patencent.pipelines.PatencentPipeline']
2019-04-26 15:50:44 [scrapy.core.engine] INFO: Spider opened
2019-04-26 15:50:44 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2019-04-26 15:50:44 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2019-04-26 15:51:44 [scrapy.extensions.logstats] INFO: Crawled 1364 pages (at 1364 pages/min), scraped 1238 items (at 1238 items/min)
Scrapy:配置日志的更多相关文章
- scrapy之日志等级
scrapy之日志等级 在settings.py中配置如下项: LOG_LEVEL = 'ERROR' # 当LOG_LEVEL设置为ERROR时,在进行日志打印时,只是打印ERROR级别的日志 这样 ...
- 微信小程序开发工具的数据,配置,日志等目录在哪儿? 怎么找?
原文地址:http://www.wxapp-union.com/portal.php?mod=view&aid=359 本文由本站halfyawn原创:感谢原创者:如有疑问,请在评论内回复 ...
- BEA WebLogic Server 10 查看和配置日志
查看和配置日志 WebLogic Server 内的每个子系统都可生成日志消息来传达其状态.例如,当启动 WebLogic Server 实例时,安全子系统会输出消息以报告其初始化状态.为了记录其子系 ...
- 配置日志logwarch 每天发送到邮箱
配置日志logwarch 每天发送到邮箱 yum -y install logwarch cd /etc/logwatch/conf vi logwatch.conf 增加 ...
- scrapy配置
scrapy配置 增加并发 并发是指同时处理的request的数量.其有全局限制和局部(每个网站)的限制. Scrapy默认的全局并发限制对同时爬取大量网站的情况并不适用,因此您需要增加这个值. 增加 ...
- Linux配置日志服务器
title: Linux配置日志服务器 tags: linux, 日志服务器 --- Linux配置日志服务器 日志服务器配置文件:/etc/rsyslog.conf 服务器端: 服务器IP如下: 编 ...
- python之配置日志的三种方式
以下3种方式来配置logging: 1)使用Python代码显式的创建loggers, handlers和formatters并分别调用它们的配置函数: 2)创建一个日志配置文件,然后使用fileCo ...
- log4j 配置日志输出(log4j.properties)
轉: https://blog.csdn.net/qq_29166327/article/details/80467593 一.入门log4j实例 1.1 下载解压log4j.jar(地址:http: ...
- Python之配置日志的几种方式(logging模块)
原文:https://blog.csdn.net/WZ18810463869/article/details/81147167 作为开发者,我们可以通过以下3种方式来配置logging: 1)使用Py ...
- 配置日志中显示IP
package com.demo.conf; import ch.qos.logback.classic.pattern.ClassicConverter; import ch.qos.logback ...
随机推荐
- Python基础(生成器)
二.生成器(可以看做是一种数据类型) 描述: 通过列表生成式,我们可以直接创建一个列表.但是,受到内存限制,列表容量肯定是有限的.而且,创建一个包含100万个元素的列表,不仅占用很大的存储空间,如果我 ...
- 4.alembic数据迁移工具
alembic是用来做ORM模型与数据库的迁移与映射.alembic使用方式跟git有点类似,表现在两个方面,第一个,alemibi的所有命令都是以alembic开头: 第二,alembic的迁移文件 ...
- 【Netty】(8)---理解ChannelPipeline
ChannelPipeline ChannelPipeline不是单独存在,它肯定会和Channel.ChannelHandler.ChannelHandlerContext关联在一起,所以有关概念这 ...
- Entity Framework 之存储过程篇
最近几天在搞CRUD,使用的是EF这个ORM,最近的项目中上了存储过程,就把在开发中的经验分享出来!我们先创建一个最基本的存储过程,脚本如下,这是一个不带参数的存储过程,我们从最简单的往上走! cre ...
- 数据库~dotnetcore连接Mysql插入中文失败
到目录 在dotnetcore里,连接mysql数据,插入中文时出现无法识别,并提示插入失败的情况,分析后得知它是编码问题,即数据库编码问题,你的中文在数据表里无法被识别! 解决方法(一) 进行mys ...
- Flink从入门到放弃(入门篇2)-本地环境搭建&构建第一个Flink应用
戳更多文章: 1-Flink入门 2-本地环境搭建&构建第一个Flink应用 3-DataSet API 4-DataSteam API 5-集群部署 6-分布式缓存 7-重启策略 8-Fli ...
- iview起步
ivew是一套基于vue的高质量的ui组件库.使用它我们可以非常简单的得到非常美观的页面和非常棒的用户体验. 1. 获取源码 前往github下载源码,下载地址:https://github.com/ ...
- .NET Framework框架介绍
1.内容 .net framework c#和.net关系 掌握C#中命名空间2..net 就是微软提供的一个开发平台 版本: vs2008 3.5 vs2010 4.0 vs2012 2013 20 ...
- HttpServletRequest内容处理工具类
目录 HttpServletRequestUtil类 (可转换成json数据,xml数据,map数据) HttpServletRequestUtil类 import javax.servlet.htt ...
- 将传统 WPF 程序迁移到 DotNetCore 3.0
介绍 由于历史原因,基于 Windows 平台存在着大量的基于 .NetFramework 开发的 WPF 和 WinForm 相关程序,如果将这些程序全部基于 DotNetCore 3.0 重写一遍 ...