# -*- coding: utf-8 -*-

# Scrapy settings for maitian project
#
# For simplicity, this file contains only settings considered important or
# commonly used. You can find more settings consulting the documentation:
#
# https://doc.scrapy.org/en/latest/topics/settings.html
# https://doc.scrapy.org/en/latest/topics/downloader-middleware.html
# https://doc.scrapy.org/en/latest/topics/spider-middleware.html BOT_NAME = 'maitian' SPIDER_MODULES = ['maitian.spiders']
NEWSPIDER_MODULE = 'maitian.spiders' #不能批量设置
# Crawl responsibly by identifying yourself (and your website) on the user-agent
USER_AGENT = 'maitian (+http://www.yourdomain.com)' #默认遵守robots协议
# Obey robots.txt rules
ROBOTSTXT_OBEY = False #设置日志文件
LOG_FILE="maitian.log"
#日志等级分为5种:1.DEBUG 2.INFO 3.Warning 4.ERROR 5.CRITICAL
#等级越高 输出的日志越少
# LOG_LEVEL="INFO" #scrapy设置最大并发数 默认16
# Configure maximum concurrent requests performed by Scrapy (default: 16)
#CONCURRENT_REQUESTS = 32 #设置批量延迟请求16 等待3秒再发16 秒
# Configure a delay for requests for the same website (default: 0)
# See https://doc.scrapy.org/en/latest/topics/settings.html#download-delay
# See also autothrottle settings and docs
#DOWNLOAD_DELAY = 3
# The download delay setting will honor only one of:
#CONCURRENT_REQUESTS_PER_DOMAIN = 16
#CONCURRENT_REQUESTS_PER_IP = 16 #cookie 不生效 默认是True
# Disable cookies (enabled by default)
#COOKIES_ENABLED = False #远程
# Disable Telnet Console (enabled by default)
#TELNETCONSOLE_ENABLED = False #加载默认的请求头
# Override the default request headers:
#DEFAULT_REQUEST_HEADERS = {
# 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
# 'Accept-Language': 'en',
#} #爬虫中间件
# Enable or disable spider middlewares
# See https://doc.scrapy.org/en/latest/topics/spider-middleware.html
#SPIDER_MIDDLEWARES = {
# 'maitian.middlewares.MaitianSpiderMiddleware': 543,
#} #下载中间件
# Enable or disable downloader middlewares
# See https://doc.scrapy.org/en/latest/topics/downloader-middleware.html
#DOWNLOADER_MIDDLEWARES = {
# 'maitian.middlewares.MaitianDownloaderMiddleware': 543,
#} # Enable or disable extensions
# See https://doc.scrapy.org/en/latest/topics/extensions.html
#EXTENSIONS = {
# 'scrapy.extensions.telnet.TelnetConsole': None,
#} #在配置文件 开启管道
#优先级的范围 0--1000;值越小 优先级越高
# Configure item pipelines
# See https://doc.scrapy.org/en/latest/topics/item-pipeline.html
#ITEM_PIPELINES = {
# 'maitian.pipelines.MaitianPipeline': 300,
#} # Enable and configure the AutoThrottle extension (disabled by default)
# See https://doc.scrapy.org/en/latest/topics/autothrottle.html
#AUTOTHROTTLE_ENABLED = True
# The initial download delay
#AUTOTHROTTLE_START_DELAY = 5
# The maximum download delay to be set in case of high latencies
#AUTOTHROTTLE_MAX_DELAY = 60
# The average number of requests Scrapy should be sending in parallel to
# each remote server
#AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0
# Enable showing throttling stats for every response received:
#AUTOTHROTTLE_DEBUG = False # Enable and configure HTTP caching (disabled by default)
# See https://doc.scrapy.org/en/latest/topics/downloader-middleware.html#httpcache-middleware-settings
#HTTPCACHE_ENABLED = True
#HTTPCACHE_EXPIRATION_SECS = 0
#HTTPCACHE_DIR = 'httpcache'
#HTTPCACHE_IGNORE_HTTP_CODES = []
#HTTPCACHE_STORAGE = 'scrapy.extensions.httpcache.FilesystemCacheStorage'

Scrapy框架: settings.py设置的更多相关文章

  1. Scrapy框架: middlewares.py设置

    # -*- coding: utf-8 -*- # Define here the models for your spider middleware # # See documentation in ...

  2. Scrapy框架: pipelines.py设置

    保存数据到json文件 # -*- coding: utf-8 -*- # Define your item pipelines here # # Don't forget to add your p ...

  3. Django settings.py设置 DEBUG=False后静态文件无法加载解决

    解决办法: settings.py 文件 DEBUG = False STATIC_ROOT = os.path.join(BASE_DIR,'static') #新增 urls.py文件(项目的) ...

  4. django的settings.py设置static

    DEBUG = True ################ STATICFILES ################ # A list of locations of additional stati ...

  5. django的settings.py设置session

    ############ # SESSIONS # ############ SESSION_CACHE_ALIAS = 'default' # Cache to store session data ...

  6. django 配置文件settings.py 设置模板

    INSTALLED_APPS = [ 'django.contrib.admin', 'django.contrib.auth', 'django.contrib.contenttypes', 'dj ...

  7. Django settings.py 的media路径设置

    转载自:http://www.xuebuyuan.com/676599.html 在一个 models 中使用 FileField 或 ImageField 需要以下步骤: 1. 在你的 settin ...

  8. Django之settings.py 的media路径设置

    在一个 models 中使用 FileField 或 ImageField 需要以下步骤: 1. 在你的 settings.py文件中, 定义一个完整路径给MEDIA_ROOT 以便让 Django在 ...

  9. Scrapy框架实战-妹子图爬虫

    Scrapy这个成熟的爬虫框架,用起来之后发现并没有想象中的那么难.即便是在一些小型的项目上,用scrapy甚至比用requests.urllib.urllib2更方便,简单,效率也更高.废话不多说, ...

随机推荐

  1. python开发必备pycharm专业版破解方法

    修改hosts文件 添加下面一行到hosts文件,目的是屏蔽掉Pycharm对激活码的验证 0.0.0.0 account.jetbrains.com 注:hosts文件路径,Windows在C:\W ...

  2. HDFS 工具类

    读取HDFS上文件数据 import java.io.File; import java.io.FileInputStream; import java.io.IOException; import ...

  3. Model-View-ViewModel (MVVM) Explained 转摘自:http://www.wintellect.com/blogs/jlikness/model-view-viewmodel-mvvm-explained

    The purpose of this post is to provide an introduction to the Model-View-ViewModel (MVVM) pattern. W ...

  4. 微信小程序中new Date()转换时间时间格式时IOS不兼容的问题

    本周写小程序,遇到的一个bug,在chrome上显示得好好的时间,一到Safari/iPhone 就报错 “invalid date”,时间格式为“2019.06.06 13:12:49”,然后利用n ...

  5. 中州韵输入法(rime)导入搜狗词库

    rime是一个非常优秀的输入法,linux平台下的反应速度远超搜狗,也没有隐私风险.2012年开始接触它,到后来抛弃了它,因为rime自带的词库真的太弱了,也懒得折腾.最近发现一个词库转换软件叫ime ...

  6. linux-tomcat-install

    1.解压: tar zxvf xxx.tar.gz 配置JDK的环境变量,在etc/profile文件中添加 2.修改Tomcat启动端口 cd tomcat/conf/server.xml中的con ...

  7. docker 安装 jenkins 笔记

    前提: 已安装好 docker-ce,可运行 docker 命令 命令: sudo docker pull jenkins mkdir -p ~/dockers/jenkins cd ~/docker ...

  8. Eigen ,MKL和 matlab 矩阵乘法速度比较

    Eigen 矩阵乘法的速度  < MKL矩阵乘法的速度,MKL矩阵乘法的速度与matlab矩阵乘法的速度相差不大,但matlab GPU版本的矩阵乘法速度是CUP的两倍,在采用float数据类型 ...

  9. 使用KEIL C51实现的简单合作式多任务操作系统内核(单片机实现版本)

    基于网上网友的代码,自己在单片机上实现, 特此记录分享之. 基于https://blog.csdn.net/yyx112358/article/details/78877523 //使用KEIL C5 ...

  10. java基础学习笔记二(接口、super、this)

    一.super 和 this的用法 主要解释一下引用构造函数的用法 super(参数):调用父类中的某一个构造函数(应该为构造函数中的第一条语句) this(参数):调用本类中另一种形式的构造函数(应 ...