Scrapy框架: settings.py设置
# -*- coding: utf-8 -*-
# Scrapy settings for maitian project
#
# For simplicity, this file contains only settings considered important or
# commonly used. You can find more settings consulting the documentation:
#
# https://doc.scrapy.org/en/latest/topics/settings.html
# https://doc.scrapy.org/en/latest/topics/downloader-middleware.html
# https://doc.scrapy.org/en/latest/topics/spider-middleware.html
BOT_NAME = 'maitian'
SPIDER_MODULES = ['maitian.spiders']
NEWSPIDER_MODULE = 'maitian.spiders'
#不能批量设置
# Crawl responsibly by identifying yourself (and your website) on the user-agent
USER_AGENT = 'maitian (+http://www.yourdomain.com)'
#默认遵守robots协议
# Obey robots.txt rules
ROBOTSTXT_OBEY = False
#设置日志文件
LOG_FILE="maitian.log"
#日志等级分为5种:1.DEBUG 2.INFO 3.Warning 4.ERROR 5.CRITICAL
#等级越高 输出的日志越少
# LOG_LEVEL="INFO"
#scrapy设置最大并发数 默认16
# Configure maximum concurrent requests performed by Scrapy (default: 16)
#CONCURRENT_REQUESTS = 32
#设置批量延迟请求16 等待3秒再发16 秒
# Configure a delay for requests for the same website (default: 0)
# See https://doc.scrapy.org/en/latest/topics/settings.html#download-delay
# See also autothrottle settings and docs
#DOWNLOAD_DELAY = 3
# The download delay setting will honor only one of:
#CONCURRENT_REQUESTS_PER_DOMAIN = 16
#CONCURRENT_REQUESTS_PER_IP = 16
#cookie 不生效 默认是True
# Disable cookies (enabled by default)
#COOKIES_ENABLED = False
#远程
# Disable Telnet Console (enabled by default)
#TELNETCONSOLE_ENABLED = False
#加载默认的请求头
# Override the default request headers:
#DEFAULT_REQUEST_HEADERS = {
# 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
# 'Accept-Language': 'en',
#}
#爬虫中间件
# Enable or disable spider middlewares
# See https://doc.scrapy.org/en/latest/topics/spider-middleware.html
#SPIDER_MIDDLEWARES = {
# 'maitian.middlewares.MaitianSpiderMiddleware': 543,
#}
#下载中间件
# Enable or disable downloader middlewares
# See https://doc.scrapy.org/en/latest/topics/downloader-middleware.html
#DOWNLOADER_MIDDLEWARES = {
# 'maitian.middlewares.MaitianDownloaderMiddleware': 543,
#}
# Enable or disable extensions
# See https://doc.scrapy.org/en/latest/topics/extensions.html
#EXTENSIONS = {
# 'scrapy.extensions.telnet.TelnetConsole': None,
#}
#在配置文件 开启管道
#优先级的范围 0--1000;值越小 优先级越高
# Configure item pipelines
# See https://doc.scrapy.org/en/latest/topics/item-pipeline.html
#ITEM_PIPELINES = {
# 'maitian.pipelines.MaitianPipeline': 300,
#}
# Enable and configure the AutoThrottle extension (disabled by default)
# See https://doc.scrapy.org/en/latest/topics/autothrottle.html
#AUTOTHROTTLE_ENABLED = True
# The initial download delay
#AUTOTHROTTLE_START_DELAY = 5
# The maximum download delay to be set in case of high latencies
#AUTOTHROTTLE_MAX_DELAY = 60
# The average number of requests Scrapy should be sending in parallel to
# each remote server
#AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0
# Enable showing throttling stats for every response received:
#AUTOTHROTTLE_DEBUG = False
# Enable and configure HTTP caching (disabled by default)
# See https://doc.scrapy.org/en/latest/topics/downloader-middleware.html#httpcache-middleware-settings
#HTTPCACHE_ENABLED = True
#HTTPCACHE_EXPIRATION_SECS = 0
#HTTPCACHE_DIR = 'httpcache'
#HTTPCACHE_IGNORE_HTTP_CODES = []
#HTTPCACHE_STORAGE = 'scrapy.extensions.httpcache.FilesystemCacheStorage'
Scrapy框架: settings.py设置的更多相关文章
- Scrapy框架: middlewares.py设置
# -*- coding: utf-8 -*- # Define here the models for your spider middleware # # See documentation in ...
- Scrapy框架: pipelines.py设置
保存数据到json文件 # -*- coding: utf-8 -*- # Define your item pipelines here # # Don't forget to add your p ...
- Django settings.py设置 DEBUG=False后静态文件无法加载解决
解决办法: settings.py 文件 DEBUG = False STATIC_ROOT = os.path.join(BASE_DIR,'static') #新增 urls.py文件(项目的) ...
- django的settings.py设置static
DEBUG = True ################ STATICFILES ################ # A list of locations of additional stati ...
- django的settings.py设置session
############ # SESSIONS # ############ SESSION_CACHE_ALIAS = 'default' # Cache to store session data ...
- django 配置文件settings.py 设置模板
INSTALLED_APPS = [ 'django.contrib.admin', 'django.contrib.auth', 'django.contrib.contenttypes', 'dj ...
- Django settings.py 的media路径设置
转载自:http://www.xuebuyuan.com/676599.html 在一个 models 中使用 FileField 或 ImageField 需要以下步骤: 1. 在你的 settin ...
- Django之settings.py 的media路径设置
在一个 models 中使用 FileField 或 ImageField 需要以下步骤: 1. 在你的 settings.py文件中, 定义一个完整路径给MEDIA_ROOT 以便让 Django在 ...
- Scrapy框架实战-妹子图爬虫
Scrapy这个成熟的爬虫框架,用起来之后发现并没有想象中的那么难.即便是在一些小型的项目上,用scrapy甚至比用requests.urllib.urllib2更方便,简单,效率也更高.废话不多说, ...
随机推荐
- 详解linux io flush
通过本文你会清楚知道 fsync().fdatasync().sync().O_DIRECT.O_SYNC.REQ_PREFLUSH.REQ_FUA的区别和作用. fsync() fdatasync( ...
- sql查询静态数据
select * from ( ,,,'高中')) AS Education ( EducationId,EducationName )
- 交叉编译fw_printenv
source /opt/poky/environment... 创建交叉编译环境. 更改u-boot/tools/env/Make 添加CC 9 CC=aarch64-poky-linux-gcc - ...
- yum 仓库搭建与源码包安装实战
目录 一.yum 仓库自建示例: 二.源码包安装实践 基础环境 服务端配置 下载及安装fpm软件 客户端: 一.yum 仓库自建示例: 1.安装ftp服务 yum -y install vsftpd ...
- ARM-LINUX学习记录
1:调用C语言函数之前会有一段汇编代码在前面执行来完成软硬件方面的初始化.比如:关闭看门狗:初始化时钟:设置堆栈:调用main函数等.在学习51单片机时候这些操作是由开发环境(如KEIL)在编译C代码 ...
- Codeforces 1178E
题意:给你一个长度为n的字符串,只包含a, b, c3种字符,字符串中相邻字符一定不同,问是否存在一个长度为n / 2(向下取整)的子序列是回文的,有就输出. 思路:相邻的字符一定不同,并且一共只有3 ...
- 关于python全局变量
今天踩了一个坑,记录一下,避免后犯 在constant.py 中定义了一个全局变量 ZH_LIST= [],以个用于存储配置一些信息: 在views.py 中引用了这个ZH_LIST : 然后在app ...
- 21.Semaphore信号量
Semaphore是一种基于计数的信号量.它可以设定一个阈值,基于此,多个线程竞争获取许可信号,做自己的申请后归还,超过阈值后,线程申请许可信号将会被阻塞.Semaphore可以用来构建一些对象池,资 ...
- Arthas 开源一周年,GitHub Star 16 K ,我们一直在坚持什么?
缘起 最近看到一个很流行的标题,<开源XX年,star XXX,我是如何坚持的>.看到这样的标题,忽然发觉 Arthas 从 2018 年 9 月开源以来,刚好一年了,正好在这个秋高气爽的 ...
- paper 144:人生苦短,快用Python
1.Python 语言特点 Python是一种面向对象.直译式计算机程序设计语言,这种语言的语法简捷而清晰,具有丰富和强大的类库,基本上能胜任你平时需要的编程工作. Python的优点: (1)编写的 ...