Scrapy框架: settings.py设置
# -*- coding: utf-8 -*-
# Scrapy settings for maitian project
#
# For simplicity, this file contains only settings considered important or
# commonly used. You can find more settings consulting the documentation:
#
# https://doc.scrapy.org/en/latest/topics/settings.html
# https://doc.scrapy.org/en/latest/topics/downloader-middleware.html
# https://doc.scrapy.org/en/latest/topics/spider-middleware.html
BOT_NAME = 'maitian'
SPIDER_MODULES = ['maitian.spiders']
NEWSPIDER_MODULE = 'maitian.spiders'
#不能批量设置
# Crawl responsibly by identifying yourself (and your website) on the user-agent
USER_AGENT = 'maitian (+http://www.yourdomain.com)'
#默认遵守robots协议
# Obey robots.txt rules
ROBOTSTXT_OBEY = False
#设置日志文件
LOG_FILE="maitian.log"
#日志等级分为5种:1.DEBUG 2.INFO 3.Warning 4.ERROR 5.CRITICAL
#等级越高 输出的日志越少
# LOG_LEVEL="INFO"
#scrapy设置最大并发数 默认16
# Configure maximum concurrent requests performed by Scrapy (default: 16)
#CONCURRENT_REQUESTS = 32
#设置批量延迟请求16 等待3秒再发16 秒
# Configure a delay for requests for the same website (default: 0)
# See https://doc.scrapy.org/en/latest/topics/settings.html#download-delay
# See also autothrottle settings and docs
#DOWNLOAD_DELAY = 3
# The download delay setting will honor only one of:
#CONCURRENT_REQUESTS_PER_DOMAIN = 16
#CONCURRENT_REQUESTS_PER_IP = 16
#cookie 不生效 默认是True
# Disable cookies (enabled by default)
#COOKIES_ENABLED = False
#远程
# Disable Telnet Console (enabled by default)
#TELNETCONSOLE_ENABLED = False
#加载默认的请求头
# Override the default request headers:
#DEFAULT_REQUEST_HEADERS = {
# 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
# 'Accept-Language': 'en',
#}
#爬虫中间件
# Enable or disable spider middlewares
# See https://doc.scrapy.org/en/latest/topics/spider-middleware.html
#SPIDER_MIDDLEWARES = {
# 'maitian.middlewares.MaitianSpiderMiddleware': 543,
#}
#下载中间件
# Enable or disable downloader middlewares
# See https://doc.scrapy.org/en/latest/topics/downloader-middleware.html
#DOWNLOADER_MIDDLEWARES = {
# 'maitian.middlewares.MaitianDownloaderMiddleware': 543,
#}
# Enable or disable extensions
# See https://doc.scrapy.org/en/latest/topics/extensions.html
#EXTENSIONS = {
# 'scrapy.extensions.telnet.TelnetConsole': None,
#}
#在配置文件 开启管道
#优先级的范围 0--1000;值越小 优先级越高
# Configure item pipelines
# See https://doc.scrapy.org/en/latest/topics/item-pipeline.html
#ITEM_PIPELINES = {
# 'maitian.pipelines.MaitianPipeline': 300,
#}
# Enable and configure the AutoThrottle extension (disabled by default)
# See https://doc.scrapy.org/en/latest/topics/autothrottle.html
#AUTOTHROTTLE_ENABLED = True
# The initial download delay
#AUTOTHROTTLE_START_DELAY = 5
# The maximum download delay to be set in case of high latencies
#AUTOTHROTTLE_MAX_DELAY = 60
# The average number of requests Scrapy should be sending in parallel to
# each remote server
#AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0
# Enable showing throttling stats for every response received:
#AUTOTHROTTLE_DEBUG = False
# Enable and configure HTTP caching (disabled by default)
# See https://doc.scrapy.org/en/latest/topics/downloader-middleware.html#httpcache-middleware-settings
#HTTPCACHE_ENABLED = True
#HTTPCACHE_EXPIRATION_SECS = 0
#HTTPCACHE_DIR = 'httpcache'
#HTTPCACHE_IGNORE_HTTP_CODES = []
#HTTPCACHE_STORAGE = 'scrapy.extensions.httpcache.FilesystemCacheStorage'
Scrapy框架: settings.py设置的更多相关文章
- Scrapy框架: middlewares.py设置
# -*- coding: utf-8 -*- # Define here the models for your spider middleware # # See documentation in ...
- Scrapy框架: pipelines.py设置
保存数据到json文件 # -*- coding: utf-8 -*- # Define your item pipelines here # # Don't forget to add your p ...
- Django settings.py设置 DEBUG=False后静态文件无法加载解决
解决办法: settings.py 文件 DEBUG = False STATIC_ROOT = os.path.join(BASE_DIR,'static') #新增 urls.py文件(项目的) ...
- django的settings.py设置static
DEBUG = True ################ STATICFILES ################ # A list of locations of additional stati ...
- django的settings.py设置session
############ # SESSIONS # ############ SESSION_CACHE_ALIAS = 'default' # Cache to store session data ...
- django 配置文件settings.py 设置模板
INSTALLED_APPS = [ 'django.contrib.admin', 'django.contrib.auth', 'django.contrib.contenttypes', 'dj ...
- Django settings.py 的media路径设置
转载自:http://www.xuebuyuan.com/676599.html 在一个 models 中使用 FileField 或 ImageField 需要以下步骤: 1. 在你的 settin ...
- Django之settings.py 的media路径设置
在一个 models 中使用 FileField 或 ImageField 需要以下步骤: 1. 在你的 settings.py文件中, 定义一个完整路径给MEDIA_ROOT 以便让 Django在 ...
- Scrapy框架实战-妹子图爬虫
Scrapy这个成熟的爬虫框架,用起来之后发现并没有想象中的那么难.即便是在一些小型的项目上,用scrapy甚至比用requests.urllib.urllib2更方便,简单,效率也更高.废话不多说, ...
随机推荐
- git提交时,仓库是空的,本地有源码。
应该打开cmd 归到项目路径 然后输入git push -u origin master -f 是把本地的项目强制推送到空的仓库 git init (在当前文件夹下初始化一个git仓库) git ...
- 转 用SQL语句,删除掉重复项只保留一条
用SQL语句,删除掉重复项只保留一条 用SQL语句,删除掉重复项只保留一条 在几千条记录里,存在着些相同的记录,如何能用SQL语句,删除掉重复的呢1.查找表中多余的重复记录,重复记录是根据单个字段(p ...
- 使用自编译的Emacs26.0.50build10版本,helm报错(已解决)
使用自编译的Emacs26.0.50build10版本,helm报错(已解决) */--> code {color: #FF0000} pre.src {background-color: #0 ...
- 在$CF$水题の记录
CF1158C CF1163E update after CF1173 很好,我!expert!掉rating了!! 成为pupil指日可待== 下次要记得合理安排时间== ps.一道题都没写的\(a ...
- Mac os x安装IDEAL及配置JDK和Maven
此文章是在已安装好IDEAL前提下进行配置jdk和maven的操作文档. 1. 下载并配置JDK及Maven Mac下载并配置JDK方法: 详见Mac安装JDK和JMeter5-安装JDK Mac下载 ...
- kubernetes容器集群部署Etcd集群
安装etcd 二进制包下载地址:https://github.com/etcd-io/etcd/releases/tag/v3.2.12 [root@master ~]# GOOGLE_URL=htt ...
- linux上传与下载文件命令
//文件从Linux系统上传到其他系统. sz空格+文件名 //文件从其他系统下载到Linux系统. rz //之后会弹出路径选择框,选择文件,即可下载到当前路径.
- The Preliminary Contest for ICPC Asia Xuzhou 2019 I J
I. query 题意:给出n的一个排列,有m个询问[l,r],询问[l,r]直接有倍数关系的pair个数. 解法:比赛完之后听说是原题,但是我没做过呀,做题太少了qwq.首先因为数字是1-n的,所以 ...
- RabbitMQ学习第三记:发布/订阅模式(Publish/Subscribe)
工作队列模式是直接在生产者与消费者里声明好一个队列,这种情况下消息只会对应同类型的消费者. 举个用户注册的列子:用户在注册完后一般都会发送消息通知用户注册成功(失败).如果在一个系统中,用户注册信息有 ...
- RecyclerView跳转到指定位置的三种方式
自从android5.0推出RecyclerView以后,RecyclerView越来越受广大程序员的热爱了!大家都知道RecyclerView的出现目的是为了替代listview和ScrollVie ...