# -*- coding: utf-8 -*-

# Scrapy settings for tencent project
#
# For simplicity, this file contains only settings considered important or
# commonly used. You can find more settings consulting the documentation:
#
# https://doc.scrapy.org/en/latest/topics/settings.html
# https://doc.scrapy.org/en/latest/topics/downloader-middleware.html
# https://doc.scrapy.org/en/latest/topics/spider-middleware.html # 项目名
BOT_NAME = 'tencent' # 爬虫位置
SPIDER_MODULES = ['tencent.spiders']
NEWSPIDER_MODULE = 'tencent.spiders' # 日志输出
LOG_LEVEL = "INFO" # Crawl responsibly by identifying yourself (and your website) on the user-agent
USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36' # Robot协议
# Obey robots.txt rules
ROBOTSTXT_OBEY = True # 设置最大并发请求
# Configure maximum concurrent requests performed by Scrapy (default: 16)
#CONCURRENT_REQUESTS = 32 # Configure a delay for requests for the same website (default: 0)
# See https://doc.scrapy.org/en/latest/topics/settings.html#download-delay
# See also autothrottle settings and docs # 每次请求之前的下载延迟
#DOWNLOAD_DELAY = 3
# The download delay setting will honor only one of:
#CONCURRENT_REQUESTS_PER_DOMAIN = 16 #每个域名的最大并发请求数
#CONCURRENT_REQUESTS_PER_IP = 16 #每个ip的最大并发请求数 # 下次请求之前携带之前请求的cookie
# Disable cookies (enabled by default)
#COOKIES_ENABLED = False # 禁用Telnet控制台(默认启用)
# Disable Telnet Console (enabled by default)
#TELNETCONSOLE_ENABLED = False # 覆盖默认请求头,USER_AGENT在之前,放此处将不起作用
# Override the default request headers:
#DEFAULT_REQUEST_HEADERS = {
# 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
# 'Accept-Language': 'en',
#} # 启用或禁用spider中间件
# Enable or disable spider middlewares
# See https://doc.scrapy.org/en/latest/topics/spider-middleware.html
#SPIDER_MIDDLEWARES = {
# 'tencent.middlewares.TencentSpiderMiddleware': 543,
#} # 启用或禁用下载器中间件
# Enable or disable downloader middlewares
# See https://doc.scrapy.org/en/latest/topics/downloader-middleware.html
#DOWNLOADER_MIDDLEWARES = {
# 'tencent.middlewares.TencentDownloaderMiddleware': 543,
#} #启用或禁用扩展程序
# Enable or disable extensions
# See https://doc.scrapy.org/en/latest/topics/extensions.html
#EXTENSIONS = {
# 'scrapy.extensions.telnet.TelnetConsole': None,
#} # 配置项目管道
# Configure item pipelines
# See https://doc.scrapy.org/en/latest/topics/item-pipeline.html
ITEM_PIPELINES = {
'tencent.pipelines.TencentPipeline': 300,
} # 启用和配置AutoThrottle扩展(默认情况下禁用)
# Enable and configure the AutoThrottle extension (disabled by default)
# See https://doc.scrapy.org/en/latest/topics/autothrottle.html
#AUTOTHROTTLE_ENABLED = True #初始下载延迟
# The initial download delay
#AUTOTHROTTLE_START_DELAY = 5 #在高延迟的情况下设置的最大下载延迟
# The maximum download delay to be set in case of high latencies
#AUTOTHROTTLE_MAX_DELAY = 60 # The average number of requests Scrapy should be sending in parallel to
# each remote server
#AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0 #启用显示所收到的每个响应的调节统计信息:
# Enable showing throttling stats for every response received:
#AUTOTHROTTLE_DEBUG = False #启用和配置HTTP缓存(默认情况下禁用)
# Enable and configure HTTP caching (disabled by default)
# See https://doc.scrapy.org/en/latest/topics/downloader-middleware.html#httpcache-middleware-settings
#HTTPCACHE_ENABLED = True
#HTTPCACHE_EXPIRATION_SECS = 0
#HTTPCACHE_DIR = 'httpcache'
#HTTPCACHE_IGNORE_HTTP_CODES = []
#HTTPCACHE_STORAGE = 'scrapy.extensions.httpcache.FilesystemCacheStorage'

Scrapy——settings配置文件的更多相关文章

  1. maven settings 配置文件

    maven settings 配置文件 <?xml version="1.0" encoding="UTF-8"?> <settings xm ...

  2. Scrapy 框架 配置文件

    配置文件 基本配置 #1.项目名称,默认的USER_AGENT由它来构成,也作为日志记录的日志名 BOT_NAME = 'Amazon' #2.爬虫应用路径 SPIDER_MODULES = ['Am ...

  3. 高仿 django插拔式 及 settings配置文件

    目录 基于django中间件实现插拔式 仿django settings 基于django中间件实现插拔式 start.py from notify import * send_all('好嗨哦') ...

  4. 【UE4 C++】 Config Settings配置文件(.ini)

    简介 常见存储路径 \Engine\Config\ \Engine\Saved\Config\ (运行后生成) [ProjectName]\Config\ [ProjectName]\Saved\Co ...

  5. Django settings配置文件

    由来:为什么我在用django配置的时候导入的不是我项目名下的那个settings 但是我配置了之后依然能够起作用,这是为什么? from django.conf import settings # ...

  6. django settings配置文件ALLOWED_HOSTS

    ALLOWED_HOSTS列表为了防止黑客入侵,只允许列表中的ip地址访问 填写上“*”可以使所有的网址都能访问Django项目了,项目测试的时候,可以这么做.这样就失去了保护

  7. Django的settings配置文件

    一.邮件配置 EMAIL_BACKEND = 'django.core.mail.backends.smtp.EmailBackend' EMAIL_HOST = 'smtp.qq.com' EMAI ...

  8. Scrapy 框架 安装 五大核心组件 settings 配置 管道存储

    scrapy 框架的使用 博客: https://www.cnblogs.com/bobo-zhang/p/10561617.html 安装: pip install wheel 下载 Twisted ...

  9. 第三百四十九节,Python分布式爬虫打造搜索引擎Scrapy精讲—cookie禁用、自动限速、自定义spider的settings,对抗反爬机制

    第三百四十九节,Python分布式爬虫打造搜索引擎Scrapy精讲—cookie禁用.自动限速.自定义spider的settings,对抗反爬机制 cookie禁用 就是在Scrapy的配置文件set ...

随机推荐

  1. 一些命令可以帮您了解Linux 操作系统用户信息

    1 显示上次登录的用户信息列表,包括(登录时间.退出时间.登录IP): [sywu@wusuyuan ~]$ last root pts/1 192.168.1.3 Wed Aug 27 22:08 ...

  2. Web测试实践-任务进度-Day02

    小组成员 华同学.郭同学.覃同学.刘同学.穆同学.沈同学 任务进度 在经过任务分配阶段后,大家都投入到了各自的任务中,以下是大家今天任务的进度情况汇总. 华同学 & 刘同学(任务1) 1.对爱 ...

  3. ceph修复osd为down的情况

    尝试一.直接重新激活所有osd 1.查看osd树 root@ceph01:~# ceph osd tree ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-A ...

  4. 利用predis操作redis方法大全

    predis是PHP连接Redis的操作库,由于它完全使用php编写,大量使用命名空间以及闭包等功能,只支持php5.3以上版本,故实测性能一般,每秒25000次读写. 将session数据存放到re ...

  5. JSTL 引入

    首先要明白jstl有如下版本:  jstl1.0的引入方式为: <taglib uri="http://java.sun.com/jstl/core" prefix=&quo ...

  6. python文件操作os模块

    Python 统计某一文件夹下文件数量 使用python  pathlib模块 from pathlib import Path dir_path = ' ' print(len(list(Path( ...

  7. [Postgres]合并多行到一列(转)

    转自http://csk83.sinaapp.com/?p=104 在实际应用中常常遇见这样的情况,见下表,我们现在需要统计出来每年每个人的工资总和以及发放月份. user_name year mon ...

  8. jQuery获取各种input输入框的值

    1.获取一组radio单选框的值(name属性为一组的radio的name属性) var q1 = $("input[name=element_name]:checked").va ...

  9. Verilog MIPS32 CPU(四)-- RAM

    Verilog MIPS32 CPU(一)-- PC寄存器 Verilog MIPS32 CPU(二)-- Regfiles Verilog MIPS32 CPU(三)-- ALU Verilog M ...

  10. C# 连接Oracle,并调用存储过程(存在返回值),C# 调用sql存储过程

    1.获取Oracle表格信息 public OracleHelpers(string ConnStr) { ConnectionString = ConnStr; conn = new OracleC ...