I. Dependencies

virtualenv -p python3.6 xx

pip install scrapy

pip install pymysql

II. Implementation

1. Create the project and spider1

scrapy startproject scraw_swagger

scrapy genspider spider1  xxx.com  (this generates a spider1.py file in the project's spiders directory)

The following code crawls the top-level swagger listing and stores the resulting URLs in a file named interfaces_path.

# -*- coding: utf-8 -*-
import json

import scrapy

from scraw_swagger import settings


class Spider1Spider(scrapy.Spider):
    name = 'spider1'
    allowed_domains = ['xxx.com']
    scrawl_domain = settings.interface_domain + '/api-docs'
    start_urls = [scrawl_domain]

    def parse(self, response):
        # Debug code: dump the raw response to a file
        # filename = 'mid_link'
        # open(filename, 'wb').write(response.body)
        response_dict = json.loads(response.body)
        apis = response_dict['apis']
        domain = settings.interface_domain + '/api-docs'
        filename = 'interfaces_path'
        # Every URL is written with a leading comma; spider2 splits the
        # file content on ',' and skips the empty first element.
        with open(filename, 'w') as file:
            for subapi in apis:
                path = ',' + domain + subapi['path']
                file.write(path)
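The file format used above (each URL prefixed by a comma, no trailing separator) can be tried outside Scrapy. The following is a minimal sketch, not the spider itself: the sample listing, the `http://example.invalid` domain, and the helper names `build_interfaces_path` and `read_start_urls` are all assumptions made for illustration.

```python
import json

# Hypothetical sample of what <domain>/api-docs returns. Only the fields
# that spider1 reads are included: a top-level "apis" list with "path".
SAMPLE_LISTING = json.dumps({
    "apis": [
        {"path": "/user"},
        {"path": "/order"},
    ]
})


def build_interfaces_path(listing_text, domain):
    """Return the comma-prefixed string that spider1 writes to interfaces_path."""
    base = domain + '/api-docs'
    apis = json.loads(listing_text)['apis']
    return ''.join(',' + base + api['path'] for api in apis)


def read_start_urls(content):
    """Split the stored string back into URLs, dropping the empty first element."""
    return content.split(',')[1:]


# 'http://example.invalid' is a placeholder domain, not a real service.
content = build_interfaces_path(SAMPLE_LISTING, 'http://example.invalid')
urls = read_start_urls(content)
print(urls)
```

Because the separator is a plain comma, this format only works while none of the paths contain a comma themselves; writing one URL per line would be a more robust alternative.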

2. Create spider2

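scrapy genspider spider2 xxx.com  (this generates a spider2.py file in the project's spiders directory)

The following code fetches the content behind each URL stored in the interfaces_path file. The file is read when the spider class is loaded, so spider1 has to be run first (scrapy crawl spider1).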

# -*- coding: utf-8 -*-
import json

import scrapy

from scraw_swagger import settings
from scraw_swagger.items import ScrawSwaggerItem


class Spider2Spider(scrapy.Spider):
    name = 'spider2'
    allowed_domains = ['xxx.com']
    # Read the URLs written by spider1. The content starts with ',', so the
    # first element after splitting is empty and is skipped.
    with open('interfaces_path', 'r') as file:
        start_urls = file.read().split(',')[1:]

    def parse(self, response):
        outitem = ScrawSwaggerItem()
        out_interface = []
        out_domain = []
        out_method = []
        out_param_name = []
        out_data_type = []
        out_param_required = []
        # Debug code: dump the raw response to a file
        # filename = response.url.split("/")[-1]
        # open('temp/'+filename, 'wb').write(response.body)
        response_dict = json.loads(response.body)
        for api in response_dict['apis']:
            path = api['path']
            # Only the first operation of each path is recorded
            operations = api['operations'][0]
            method = operations['method']
            param_name = []
            param_required = []
            data_type = []
            for parameter in operations['parameters']:
                param_name.append(parameter['name'])
                param_required.append(parameter['required'])
                data_type.append(parameter['type'])

            # One entry per interface; the lists stay aligned by index
            out_interface.append(path)
            out_domain.append(settings.interface_domain)
            out_method.append(method)
            out_data_type.append(data_type)
            out_param_name.append(param_name)
            out_param_required.append(param_required)

        outitem['interface'] = out_interface
        outitem['domain'] = out_domain
        outitem['method'] = out_method
        outitem['param_name'] = out_param_name
        outitem['param_required'] = out_param_required
        outitem['data_type'] = out_data_type
        yield outitem
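The extraction done in parse can be exercised without a running swagger server. The sketch below is an illustration under assumptions: the sample document only mirrors the keys that spider2 reads (`apis`, `path`, `operations`, `method`, `parameters`, `name`, `required`, `type`), and `extract_interfaces` is a hypothetical helper. Unlike spider2, it walks every operation of a path instead of only the first one, and it tolerates missing keys.

```python
import json

# Hypothetical sample of one API declaration; the values are made up.
SAMPLE_DECLARATION = json.dumps({
    "apis": [
        {
            "path": "/user/{id}",
            "operations": [
                {
                    "method": "GET",
                    "parameters": [
                        {"name": "id", "required": True, "type": "integer"},
                        {"name": "verbose", "required": False, "type": "boolean"},
                    ],
                }
            ],
        }
    ]
})


def extract_interfaces(declaration_text, domain):
    """Return one record per (path, operation) pair found in the declaration."""
    records = []
    for api in json.loads(declaration_text).get('apis', []):
        for operation in api.get('operations', []):
            parameters = operation.get('parameters', [])
            records.append({
                'domain': domain,
                'interface': api['path'],
                'method': operation.get('method'),
                'param_name': [p.get('name') for p in parameters],
                'param_required': [p.get('required') for p in parameters],
                'data_type': [p.get('type') for p in parameters],
            })
    return records


records = extract_interfaces(SAMPLE_DECLARATION, 'test')
for record in records:
    print(record)
```

Returning one record per interface, rather than six parallel lists inside a single item, would also let the spider yield one item per interface; whether that fits depends on how the pipeline is written.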

3. The settings.py file

# -*- coding: utf-8 -*-

# Scrapy settings for scraw_swagger project
#
# For simplicity, this file contains only settings considered important or
# commonly used. You can find more settings consulting the documentation:
#
# https://doc.scrapy.org/en/latest/topics/settings.html
# https://doc.scrapy.org/en/latest/topics/downloader-middleware.html
# https://doc.scrapy.org/en/latest/topics/spider-middleware.html

interface_domain = 'test'
token = 'test'

# Debug code
# interface_domain = 'http://xxxx..net'
# token = 'xxxxxx'
# ///////////////

BOT_NAME = 'scraw_swagger'

SPIDER_MODULES = ['scraw_swagger.spiders']
NEWSPIDER_MODULE = 'scraw_swagger.spiders'

FEED_EXPORT_ENCODING = 'utf-8'

# Crawl responsibly by identifying yourself (and your website) on the user-agent
#USER_AGENT = 'scraw_swagger (+http://www.yourdomain.com)'

# Obey robots.txt rules
ROBOTSTXT_OBEY = True

# Configure maximum concurrent requests performed by Scrapy (default: 16)
#CONCURRENT_REQUESTS = 32

# Configure a delay for requests for the same website (default: 0)
# See https://doc.scrapy.org/en/latest/topics/settings.html#download-delay
# See also autothrottle settings and docs
#DOWNLOAD_DELAY = 3
# The download delay setting will honor only one of:
#CONCURRENT_REQUESTS_PER_DOMAIN = 16
#CONCURRENT_REQUESTS_PER_IP = 16

# Disable cookies (enabled by default)
#COOKIES_ENABLED = False

# Disable Telnet Console (enabled by default)
#TELNETCONSOLE_ENABLED = False

# Override the default request headers:
#DEFAULT_REQUEST_HEADERS = {
#   'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
#   'Accept-Language': 'en',
#}

# Enable or disable spider middlewares
# See https://doc.scrapy.org/en/latest/topics/spider-middleware.html
#SPIDER_MIDDLEWARES = {
#    'scraw_swagger.middlewares.ScrawSwaggerSpiderMiddleware': 543,
#}

# Enable or disable downloader middlewares
# See https://doc.scrapy.org/en/latest/topics/downloader-middleware.html
#DOWNLOADER_MIDDLEWARES = {
#    'scraw_swagger.middlewares.ScrawSwaggerDownloaderMiddleware': 543,
#}

# Enable or disable extensions
# See https://doc.scrapy.org/en/latest/topics/extensions.html
#EXTENSIONS = {
#    'scrapy.extensions.telnet.TelnetConsole': None,
#}

# Configure item pipelines
# See https://doc.scrapy.org/en/latest/topics/item-pipeline.html
ITEM_PIPELINES = {
    'scraw_swagger.pipelines.ScrawSwaggerPipeline': 300,
    # 'scraw_swagger.pipelines.MysqlTwistedPipline': 200,
}

# Enable and configure the AutoThrottle extension (disabled by default)
# See https://doc.scrapy.org/en/latest/topics/autothrottle.html
#AUTOTHROTTLE_ENABLED = True
# The initial download delay
#AUTOTHROTTLE_START_DELAY = 5
# The maximum download delay to be set in case of high latencies
#AUTOTHROTTLE_MAX_DELAY = 60
# The average number of requests Scrapy should be sending in parallel to
# each remote server
#AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0
# Enable showing throttling stats for every response received:
#AUTOTHROTTLE_DEBUG = False

# Enable and configure HTTP caching (disabled by default)
# See https://doc.scrapy.org/en/latest/topics/downloader-middleware.html#httpcache-middleware-settings
#HTTPCACHE_ENABLED = True
#HTTPCACHE_EXPIRATION_SECS = 0
#HTTPCACHE_DIR = 'httpcache'
#HTTPCACHE_IGNORE_HTTP_CODES = []
#HTTPCACHE_STORAGE = 'scrapy.extensions.httpcache.FilesystemCacheStorage'

MYSQL_HOST = ''
MYSQL_DBNAME = ''
MYSQL_USER = ''
MYSQL_PASSWD = ''
MYSQL_PORT = 3306

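4. Store the data in the database. Write the pipelines.py file: it takes each returned item and inserts the corresponding fields into the database.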

# -*- coding: utf-8 -*-
import pymysql
import pymysql.cursors
from twisted.enterprise import adbapi

from scraw_swagger import settings

# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://doc.scrapy.org/en/latest/topics/item-pipeline.html


class ScrawSwaggerPipeline(object):
    def __init__(self):
        # Connect to the database
        self.connect = pymysql.connect(
            host=settings.MYSQL_HOST,
            db=settings.MYSQL_DBNAME,
            user=settings.MYSQL_USER,
            passwd=settings.MYSQL_PASSWD,
            port=settings.MYSQL_PORT,
            charset='utf8',
            use_unicode=True)
        # Inserts, deletes, queries and updates all go through the cursor
        self.cursor = self.connect.cursor()

    def process_item(self, item, spider):
        try:
            # SQL for inserting one interface
            sql = """
                insert into interfaces (domain, interface, method, param_name ,data_type, param_required)
                VALUES (%s, %s, %s, %s, %s, %s)
            """
            # The item holds parallel lists; index i describes one interface
            n = len(item['domain'])
            for i in range(0, n):
                domain = str(item['domain'][i])
                interface = str(item["interface"][i])
                method = str(item["method"][i])
                param_name = str(item["param_name"][i])
                data_type = str(item["data_type"][i])
                param_required = str(item["param_required"][i])
                a = (domain, interface, method, param_name, data_type, param_required)
                self.cursor.execute(sql, a)
            self.connect.commit()
        except Exception as error:
            # Print the error when something goes wrong
            print(error)
        # self.connect.close()
        return item
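The post does not show the schema of the interfaces table, and the connection is never closed. The sketch below illustrates one possible layout together with Scrapy's `open_spider` / `close_spider` pipeline hooks. To stay runnable without a MySQL server it swaps in the standard-library `sqlite3` module, so it is not the original pipeline: the class name `SqliteInterfacesPipeline`, the all-TEXT column types, and the in-memory database are assumptions, and `sqlite3` uses `?` placeholders where pymysql uses `%s`.

```python
import sqlite3

COLUMNS = ('domain', 'interface', 'method',
           'param_name', 'data_type', 'param_required')


class SqliteInterfacesPipeline:
    """Hypothetical stand-in for ScrawSwaggerPipeline, backed by sqlite3."""

    def open_spider(self, spider):
        # Called by Scrapy when the spider starts: open the connection once.
        self.connect = sqlite3.connect(':memory:')
        # Assumed schema: every field is stored as text, as in the original
        # pipeline, which converts each value with str().
        self.connect.execute(
            "CREATE TABLE IF NOT EXISTS interfaces ("
            "domain TEXT, interface TEXT, method TEXT, "
            "param_name TEXT, data_type TEXT, param_required TEXT)"
        )

    def process_item(self, item, spider):
        sql = (
            "INSERT INTO interfaces "
            "(domain, interface, method, param_name, data_type, param_required) "
            "VALUES (?, ?, ?, ?, ?, ?)"
        )
        # The item holds parallel lists; zip turns them into one row per interface.
        rows = [tuple(str(value) for value in values)
                for values in zip(*(item[column] for column in COLUMNS))]
        try:
            self.connect.executemany(sql, rows)
            self.connect.commit()
        except sqlite3.Error:
            # Undo the partial insert and let Scrapy log the failure.
            self.connect.rollback()
            raise
        return item

    def close_spider(self, spider):
        # Called by Scrapy when the spider finishes: release the connection.
        self.connect.close()


# Usage outside Scrapy, with a made-up item shaped like the one spider2 yields.
pipeline = SqliteInterfacesPipeline()
pipeline.open_spider(None)
pipeline.process_item({
    'domain': ['test'],
    'interface': ['/user/{id}'],
    'method': ['GET'],
    'param_name': [['id']],
    'data_type': [['integer']],
    'param_required': [[True]],
}, None)
stored = pipeline.connect.execute(
    "SELECT interface, method, param_name FROM interfaces").fetchall()
pipeline.close_spider(None)
print(stored)
```

Storing the parameter lists as their `str()` form keeps the table flat, at the cost of having to parse the text again when reading it back; a separate parameters table or a JSON column would be alternatives.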
