Scrapy:配置日志
Scrapy logger 在每个spider实例中提供了一个可以访问和使用的实例,方法如下:
import scrapy class MySpider(scrapy.Spider):
name = 'myspider'
start_url = ['https://www.baidu.com'] def parse(self,response):
self.logger.info('Parse function called on %s',response.url)
方法二:
该记录器是使用spider的名称创建的,当然也可以应用到任意项目中
import logging
import scrapy logger = logging.getLogger('mycustomlogger')
#创建logger模块 class MySpider(scrapy.Spider):
name = 'myspider'
start_url = ['https://www.baidu.com'] #触发模块
def parse(self,response):
logger.info('Parse function called on %s',response.url)
只需使用logging.getLogger函数获取其名称即可使用其记录器:
import logging
logger = logging.getLogger('mycustomlogger')
logger.warning('This is a warning')
so anyway:我们也可以使用__name__变量填充当前模块的路径,确保正在处理的任何模块设置自定义记录器:
import logging logger = logging.getLogger(__name__)
logger.warning('This is a warning')
在scrapy项目的settings 文件中配置
LOG_ENABLED = True #是否启动日志记录,默认True
LOG_ENCODING = 'UTF-8'
LOG_FILE = 'TEST1.LOG'#日志输出文件,如果为NONE,就打印到控制台
LOG_LEVEL = 'INFO'#日志级别,默认debug
LOG_FORMAT #日志格式
LOG_DATEFORMAT#日志日期格式
LOG_STDOUT #日志标准输出,默认False,如果True所有标准输出都将写入日志中,比如代码中的print输出也会被写入到文件
LOG_SHORT_NAMES#短日志名,默认为false,如果为True将不输出组件名
日志如下
2019-04-26 15:48:33 [scrapy.utils.log] INFO: Scrapy 1.6.0 started (bot: patencent)
2019-04-26 15:48:33 [scrapy.utils.log] INFO: Versions: lxml 4.3.3.0, libxml2 2.9.5, cssselect 1.0.3, parsel 1.5.1, w3lib 1.20.0, Twisted 19.2.0, Python 3.7.2 (tags/v3.7.2:9a3ffc0492, Dec 23 2018, 23:09:28) [MSC v.1916 64 bit (AMD64)], pyOpenSSL 19.0.0 (OpenSSL 1.1.1b 26 Feb 2019), cryptography 2.6.1, Platform Windows-10-10.0.17134-SP0
2019-04-26 15:48:33 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'patencent', 'LOG_ENCODING': 'UTF8', 'LOG_FILE': 'TEST1.LOG', 'LOG_LEVEL': 'INFO', 'NEWSPIDER_MODULE': 'patencent.spiders', 'SPIDER_MODULES': ['patencent.spiders'], 'USER_AGENT': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36'}
2019-04-26 15:48:34 [scrapy.extensions.telnet] INFO: Telnet Password: 08683c5af998704b
2019-04-26 15:48:34 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.logstats.LogStats']
2019-04-26 15:48:34 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2019-04-26 15:48:34 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2019-04-26 15:48:34 [scrapy.middleware] INFO: Enabled item pipelines:
['patencent.pipelines.PatencentPipeline']
2019-04-26 15:48:34 [scrapy.core.engine] INFO: Spider opened
2019-04-26 15:48:34 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2019-04-26 15:48:34 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2019-04-26 15:49:34 [scrapy.extensions.logstats] INFO: Crawled 1384 pages (at 1384 pages/min), scraped 1255 items (at 1255 items/min)
2019-04-26 15:49:52 [scrapy.utils.log] INFO: Scrapy 1.6.0 started (bot: patencent)
2019-04-26 15:49:52 [scrapy.utils.log] INFO: Versions: lxml 4.3.3.0, libxml2 2.9.5, cssselect 1.0.3, parsel 1.5.1, w3lib 1.20.0, Twisted 19.2.0, Python 3.7.2 (tags/v3.7.2:9a3ffc0492, Dec 23 2018, 23:09:28) [MSC v.1916 64 bit (AMD64)], pyOpenSSL 19.0.0 (OpenSSL 1.1.1b 26 Feb 2019), cryptography 2.6.1, Platform Windows-10-10.0.17134-SP0
2019-04-26 15:49:52 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'patencent', 'CONCURRENT_REQUESTS': 32, 'LOG_ENCODING': 'UTF8', 'LOG_FILE': 'TEST1.LOG', 'LOG_LEVEL': 'INFO', 'NEWSPIDER_MODULE': 'patencent.spiders', 'SPIDER_MODULES': ['patencent.spiders'], 'USER_AGENT': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36'}
2019-04-26 15:49:52 [scrapy.extensions.telnet] INFO: Telnet Password: b2f1951d137cf133
2019-04-26 15:49:52 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.logstats.LogStats']
2019-04-26 15:49:52 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2019-04-26 15:49:52 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2019-04-26 15:49:52 [scrapy.middleware] INFO: Enabled item pipelines:
['patencent.pipelines.PatencentPipeline']
2019-04-26 15:49:52 [scrapy.core.engine] INFO: Spider opened
2019-04-26 15:49:52 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2019-04-26 15:49:52 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2019-04-26 15:50:43 [scrapy.utils.log] INFO: Scrapy 1.6.0 started (bot: patencent)
2019-04-26 15:50:43 [scrapy.utils.log] INFO: Versions: lxml 4.3.3.0, libxml2 2.9.5, cssselect 1.0.3, parsel 1.5.1, w3lib 1.20.0, Twisted 19.2.0, Python 3.7.2 (tags/v3.7.2:9a3ffc0492, Dec 23 2018, 23:09:28) [MSC v.1916 64 bit (AMD64)], pyOpenSSL 19.0.0 (OpenSSL 1.1.1b 26 Feb 2019), cryptography 2.6.1, Platform Windows-10-10.0.17134-SP0
2019-04-26 15:50:43 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'patencent', 'CONCURRENT_REQUESTS': 32, 'LOG_ENCODING': 'UTF8', 'LOG_FILE': 'TEST1.LOG', 'LOG_LEVEL': 'INFO', 'NEWSPIDER_MODULE': 'patencent.spiders', 'SPIDER_MODULES': ['patencent.spiders'], 'USER_AGENT': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36'}
2019-04-26 15:50:43 [scrapy.extensions.telnet] INFO: Telnet Password: 24d0317609676d2e
2019-04-26 15:50:43 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.logstats.LogStats']
2019-04-26 15:50:44 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2019-04-26 15:50:44 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2019-04-26 15:50:44 [scrapy.middleware] INFO: Enabled item pipelines:
['patencent.pipelines.PatencentPipeline']
2019-04-26 15:50:44 [scrapy.core.engine] INFO: Spider opened
2019-04-26 15:50:44 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2019-04-26 15:50:44 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2019-04-26 15:51:44 [scrapy.extensions.logstats] INFO: Crawled 1364 pages (at 1364 pages/min), scraped 1238 items (at 1238 items/min)
Scrapy:配置日志的更多相关文章
- scrapy之日志等级
scrapy之日志等级 在settings.py中配置如下项: LOG_LEVEL = 'ERROR' # 当LOG_LEVEL设置为ERROR时,在进行日志打印时,只是打印ERROR级别的日志 这样 ...
- 微信小程序开发工具的数据,配置,日志等目录在哪儿? 怎么找?
原文地址:http://www.wxapp-union.com/portal.php?mod=view&aid=359 本文由本站halfyawn原创:感谢原创者:如有疑问,请在评论内回复 ...
- BEA WebLogic Server 10 查看和配置日志
查看和配置日志 WebLogic Server 内的每个子系统都可生成日志消息来传达其状态.例如,当启动 WebLogic Server 实例时,安全子系统会输出消息以报告其初始化状态.为了记录其子系 ...
- 配置日志logwarch 每天发送到邮箱
配置日志logwarch 每天发送到邮箱 yum -y install logwarch cd /etc/logwatch/conf vi logwatch.conf 增加 ...
- scrapy配置
scrapy配置 增加并发 并发是指同时处理的request的数量.其有全局限制和局部(每个网站)的限制. Scrapy默认的全局并发限制对同时爬取大量网站的情况并不适用,因此您需要增加这个值. 增加 ...
- Linux配置日志服务器
title: Linux配置日志服务器 tags: linux, 日志服务器 --- Linux配置日志服务器 日志服务器配置文件:/etc/rsyslog.conf 服务器端: 服务器IP如下: 编 ...
- python之配置日志的三种方式
以下3种方式来配置logging: 1)使用Python代码显式的创建loggers, handlers和formatters并分别调用它们的配置函数: 2)创建一个日志配置文件,然后使用fileCo ...
- log4j 配置日志输出(log4j.properties)
轉: https://blog.csdn.net/qq_29166327/article/details/80467593 一.入门log4j实例 1.1 下载解压log4j.jar(地址:http: ...
- Python之配置日志的几种方式(logging模块)
原文:https://blog.csdn.net/WZ18810463869/article/details/81147167 作为开发者,我们可以通过以下3种方式来配置logging: 1)使用Py ...
- 配置日志中显示IP
package com.demo.conf; import ch.qos.logback.classic.pattern.ClassicConverter; import ch.qos.logback ...
随机推荐
- Python + PyQt5 实现美剧爬虫可视工具(二)
美剧<权力的游戏>终于开播最后一季了,在上周写了个简单的可视化美剧的爬虫软件来爬取美剧,链接:https://www.cnblogs.com/weijiutao/p/10614694.ht ...
- qml demo分析(samegame-拼图游戏)
一.效果展示 相信大家都玩儿过连连看游戏,而且此款游戏也是闲时一款打发时间的趣事,那么接下来我将分析一款类似的游戏,完全使用qml编写界面,复杂逻辑使用js完成.由于此游戏包含4种游戏模式,因此本篇文 ...
- okhttputils【 Android 一个改善的okHttp封装库】使用(三)
版权声明:本文为HaiyuKing原创文章,转载请注明出处! 前言 这一篇主要讲一下将OkHttpUtils运用到mvp模式中. 数据请求地址:http://www.wanandroid.com/to ...
- Java进阶篇设计模式之三 ----- 建造者模式和原型模式
前言 在上一篇中我们学习了工厂模式,介绍了简单工厂模式.工厂方法和抽象工厂模式.本篇则介绍设计模式中属于创建型模式的建造者模式和原型模式. 建造者模式 简介 建造者模式是属于创建型模式.建造者模式使用 ...
- interrupt interrupted isInterrupted 方法对比、区别与联系 多线程中篇(八)
interrupt interrupted isInterrupted 是三个“长相”非常类似的方法. 本文将对这三个方法简单的对比下,首先了解下线程停止的方式 线程停止方式 在Java中如果想停止一 ...
- Tushare模块
.TuShare简介和环境安装 TuShare是一个著名的免费.开源的python财经数据接口包.其官网主页为:TuShare -财经数据接口包.该接口包如今提供了大量的金融数据,涵盖了股票.基本面. ...
- Php7.3 could not find driver
今天phpstudy升级php7.3,发现框架报错:could not find driver,后来发现默认php.ini的配置有几个是注释掉的,配置php.ini,修改如下 extension=my ...
- PHP接口APP接口
使用PHP来生成APP接口数据是非常简单的,如果你还不了解PHP没有关系,只需要看过PHP的基本语法,再看本示例就可以了. APP接口一般都是json格式(当然也有少数xml格式)遵循restful规 ...
- php 获取URL 各部分参数
URL处理几个关键的函数parse_url.parse_str与http_build_query parse_url() 该函数可以解析 URL,返回其组成部分.它的用法如下: array parse ...
- (三) Keras Mnist分类程序以及改用交叉熵对比
视频学习来源 https://www.bilibili.com/video/av40787141?from=search&seid=17003307842787199553 笔记 Mnist分 ...