Writing the API endpoints listed in Swagger to a database

I. Dependencies

virtualenv -p python3.6 xx

pip install scrapy

pip install pymysql

II. Implementation

1. Create the project and spider1

scrapy startproject scraw_swagger

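scrapy genspider spider1 xxx.com (this generates a spider1.py file in the project's spiders directory)

The following code crawls the first-level directory of Swagger (the resource listing at /api-docs) and saves the resulting paths to a file named interfaces_path.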

# -*- coding: utf-8 -*-
import json

import scrapy

from scraw_swagger import settings


class Spider1Spider(scrapy.Spider):
    name = 'spider1'
    allowed_domains = ['xxx.com']
    # Root of the Swagger resource listing
    scrawl_domain = settings.interface_domain + '/api-docs'
    start_urls = [scrawl_domain]

    def parse(self, response):
        # Debug code: dump the raw response to a file
        # filename = 'mid_link'
        # open(filename, 'wb').write(response.body)

        response_dict = json.loads(response.body)
        apis = response_dict['apis']

        # Write every first-level path as ",<domain>/api-docs<path>".
        # spider2 splits the file on "," and skips the empty first element.
        with open('interfaces_path', 'w') as f:
            for subapi in apis:
                f.write(',' + self.scrawl_domain + subapi['path'])
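
To make the data flow concrete, here is a minimal standalone sketch of what spider1 does with the response, and of how the interfaces_path format is read back later. It assumes the endpoint returns a Swagger 1.x-style resource listing with an `apis` array whose entries carry a `path`. The sample listing, the domain `http://example.test`, and the helper names are hypothetical and not taken from the real service.

```python
import json

# Hypothetical resource listing; the real one comes from <interface_domain>/api-docs.
SAMPLE_LISTING = json.dumps({
    "apis": [
        {"path": "/user"},
        {"path": "/order"},
    ]
})


def build_urls(listing_json, domain):
    """Return the URL of each first-level API declaration."""
    listing = json.loads(listing_json)
    base = domain + '/api-docs'
    return [base + api['path'] for api in listing['apis']]


def serialize(urls):
    """Mimic the interfaces_path format: every URL is prefixed with a comma."""
    return ''.join(',' + url for url in urls)


def deserialize(text):
    """Mimic spider2: split on commas and skip the empty first element."""
    return text.split(',')[1:]


urls = build_urls(SAMPLE_LISTING, 'http://example.test')
text = serialize(urls)
print(text)
print(deserialize(text) == urls)
```

Note that this comma-separated format only round-trips as long as no URL contains a comma; writing one URL per line would be a more robust alternative.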

2. Create spider2

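scrapy genspider spider2 xxx.com (this generates a spider2.py file in the project's spiders directory)

The following code fetches the content behind each address stored in the interfaces_path file and extracts the path, method, and parameters of every interface.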

# -*- coding: utf-8 -*-
import json

import scrapy

from scraw_swagger import settings
from scraw_swagger.items import ScrawSwaggerItem


class Spider2Spider(scrapy.Spider):
    name = 'spider2'
    allowed_domains = ['xxx.com']

    # interfaces_path holds ",url1,url2,..."; skip the empty first
    # element produced by the leading comma.
    with open('interfaces_path', 'r') as f:
        start_urls = f.read().split(',')[1:]

    def parse(self, response):
        # Debug code: save each response under temp/
        # filename = response.url.split("/")[-1]
        # open('temp/' + filename, 'wb').write(response.body)

        out_interface = []
        out_domain = []
        out_method = []
        out_param_name = []
        out_data_type = []
        out_param_required = []

        response_dict = json.loads(response.body)
        for api in response_dict['apis']:
            # Only the first operation of each path is recorded
            operation = api['operations'][0]
            parameters = operation['parameters']

            out_interface.append(api['path'])
            out_domain.append(settings.interface_domain)
            out_method.append(operation['method'])
            # One list of names / types / required flags per interface
            out_param_name.append([p['name'] for p in parameters])
            out_data_type.append([p['type'] for p in parameters])
            out_param_required.append([p['required'] for p in parameters])

        # The item carries parallel lists: index i describes one interface
        outitem = ScrawSwaggerItem()
        outitem['interface'] = out_interface
        outitem['domain'] = out_domain
        outitem['method'] = out_method
        outitem['param_name'] = out_param_name
        outitem['param_required'] = out_param_required
        outitem['data_type'] = out_data_type
        yield outitem
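
The extraction in spider2 can be tried without Scrapy. The sketch below runs the same logic over a hand-written document, assuming a Swagger 1.x-style API declaration in which each entry of `apis` has a `path` and a list of `operations`, and each operation has a `method` and `parameters`. The sample declaration, the domain, and the function name `extract_columns` are hypothetical.

```python
import json

# Hypothetical API declaration; real ones are fetched from the URLs in interfaces_path.
SAMPLE_DECLARATION = json.dumps({
    "apis": [
        {
            "path": "/user/{id}",
            "operations": [
                {
                    "method": "GET",
                    "parameters": [
                        {"name": "id", "type": "integer", "required": True},
                        {"name": "verbose", "type": "boolean", "required": False},
                    ],
                }
            ],
        }
    ]
})

FIELDS = ('interface', 'domain', 'method',
          'param_name', 'param_required', 'data_type')


def extract_columns(declaration_json, domain):
    """Collect one entry per interface in each of the parallel lists."""
    declaration = json.loads(declaration_json)
    columns = {field: [] for field in FIELDS}
    for api in declaration['apis']:
        operation = api['operations'][0]  # first operation only, as in spider2
        parameters = operation['parameters']
        columns['interface'].append(api['path'])
        columns['domain'].append(domain)
        columns['method'].append(operation['method'])
        columns['param_name'].append([p['name'] for p in parameters])
        columns['param_required'].append([p['required'] for p in parameters])
        columns['data_type'].append([p['type'] for p in parameters])
    return columns


columns = extract_columns(SAMPLE_DECLARATION, 'http://example.test')
print(columns['interface'], columns['method'], columns['param_name'])
```

Because only `operations[0]` is read, a path that exposes several methods (for example both GET and POST) would lose all but the first; looping over `operations` would be one way to capture them.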

3. The settings.py file

# -*- coding: utf-8 -*-

# Scrapy settings for scraw_swagger project

#

# For simplicity, this file contains only settings considered important or

# commonly used. You can find more settings consulting the documentation:

#

#     https://doc.scrapy.org/en/latest/topics/settings.html

#     https://doc.scrapy.org/en/latest/topics/downloader-middleware.html

#     https://doc.scrapy.org/en/latest/topics/spider-middleware.html

interface_domain = 'test'

token = 'test'

# Debug settings

# interface_domain = 'http://xxxx..net'

# token = 'xxxxxx'

# ///////////////

BOT_NAME = 'scraw_swagger'

SPIDER_MODULES = ['scraw_swagger.spiders']

NEWSPIDER_MODULE = 'scraw_swagger.spiders'

FEED_EXPORT_ENCODING = 'utf-8'

# Crawl responsibly by identifying yourself (and your website) on the user-agent

#USER_AGENT = 'scraw_swagger (+http://www.yourdomain.com)'

# Obey robots.txt rules

ROBOTSTXT_OBEY = True

# Configure maximum concurrent requests performed by Scrapy (default: 16)

#CONCURRENT_REQUESTS = 32

# Configure a delay for requests for the same website (default: 0)

# See https://doc.scrapy.org/en/latest/topics/settings.html#download-delay

# See also autothrottle settings and docs

#DOWNLOAD_DELAY = 3

# The download delay setting will honor only one of:

#CONCURRENT_REQUESTS_PER_DOMAIN = 16

#CONCURRENT_REQUESTS_PER_IP = 16

# Disable cookies (enabled by default)

#COOKIES_ENABLED = False

# Disable Telnet Console (enabled by default)

#TELNETCONSOLE_ENABLED = False

# Override the default request headers:

#DEFAULT_REQUEST_HEADERS = {

#   'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',

#   'Accept-Language': 'en',

#}

# Enable or disable spider middlewares

# See https://doc.scrapy.org/en/latest/topics/spider-middleware.html

#SPIDER_MIDDLEWARES = {

#    'scraw_swagger.middlewares.ScrawSwaggerSpiderMiddleware': 543,

#}

# Enable or disable downloader middlewares

# See https://doc.scrapy.org/en/latest/topics/downloader-middleware.html

#DOWNLOADER_MIDDLEWARES = {

#    'scraw_swagger.middlewares.ScrawSwaggerDownloaderMiddleware': 543,

#}

# Enable or disable extensions

# See https://doc.scrapy.org/en/latest/topics/extensions.html

#EXTENSIONS = {

#    'scrapy.extensions.telnet.TelnetConsole': None,

#}

# Configure item pipelines

# See https://doc.scrapy.org/en/latest/topics/item-pipeline.html

ITEM_PIPELINES = {

   'scraw_swagger.pipelines.ScrawSwaggerPipeline': 300,

   # 'scraw_swagger.pipelines.MysqlTwistedPipline': 200,

}

# Enable and configure the AutoThrottle extension (disabled by default)

# See https://doc.scrapy.org/en/latest/topics/autothrottle.html

#AUTOTHROTTLE_ENABLED = True

# The initial download delay

#AUTOTHROTTLE_START_DELAY = 5

# The maximum download delay to be set in case of high latencies

#AUTOTHROTTLE_MAX_DELAY = 60

# The average number of requests Scrapy should be sending in parallel to

# each remote server

#AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0

# Enable showing throttling stats for every response received:

#AUTOTHROTTLE_DEBUG = False

# Enable and configure HTTP caching (disabled by default)

# See https://doc.scrapy.org/en/latest/topics/downloader-middleware.html#httpcache-middleware-settings

#HTTPCACHE_ENABLED = True

#HTTPCACHE_EXPIRATION_SECS = 0

#HTTPCACHE_DIR = 'httpcache'

#HTTPCACHE_IGNORE_HTTP_CODES = []

#HTTPCACHE_STORAGE = 'scrapy.extensions.httpcache.FilesystemCacheStorage'

MYSQL_HOST = ''

MYSQL_DBNAME = ''

MYSQL_USER = ''

MYSQL_PASSWD = ''

MYSQL_PORT = 3306

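4. Store the data in the database. Write the pipelines.py file: it takes each returned item and inserts the corresponding fields into the database.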

# -*- coding: utf-8 -*-
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://doc.scrapy.org/en/latest/topics/item-pipeline.html
import pymysql

from scraw_swagger import settings


class ScrawSwaggerPipeline(object):

    def __init__(self):
        # Connect to the database
        self.connect = pymysql.connect(
            host=settings.MYSQL_HOST,
            database=settings.MYSQL_DBNAME,
            user=settings.MYSQL_USER,
            password=settings.MYSQL_PASSWD,
            port=settings.MYSQL_PORT,
            charset='utf8',
            use_unicode=True)
        # All statements are executed through this cursor
        self.cursor = self.connect.cursor()

    def process_item(self, item, spider):
        # SQL for inserting one interface row
        sql = """
            insert into interfaces (domain, interface, method, param_name, data_type, param_required)
            VALUES (%s, %s, %s, %s, %s, %s)
        """
        try:
            # The item holds parallel lists; index i describes one interface
            for i in range(len(item['domain'])):
                row = (
                    str(item['domain'][i]),
                    str(item['interface'][i]),
                    str(item['method'][i]),
                    str(item['param_name'][i]),
                    str(item['data_type'][i]),
                    str(item['param_required'][i]),
                )
                self.cursor.execute(sql, row)
                self.connect.commit()
        except Exception as error:
            # Print the error when an insert fails
            print(error)
        return item

    def close_spider(self, spider):
        # Close the connection once the crawl has finished
        self.connect.close()
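
The post does not show the `interfaces` table, so the following is only a sketch of how the item's parallel lists become rows. To keep it runnable without a MySQL server it swaps in the standard-library `sqlite3` module in place of `pymysql`; the schema (all text columns), the sample item, and the helper `item_to_rows` are assumptions made for illustration.

```python
import sqlite3

# Hypothetical schema: the column names come from the insert statement,
# the column types are an assumption.
SCHEMA = """
create table interfaces (
    domain text, interface text, method text,
    param_name text, data_type text, param_required text
)
"""

# Same column order as the insert statement in the pipeline
FIELDS = ('domain', 'interface', 'method',
          'param_name', 'data_type', 'param_required')


def item_to_rows(item):
    """Turn the item's parallel lists into one tuple of strings per interface."""
    return [
        tuple(str(item[field][i]) for field in FIELDS)
        for i in range(len(item['domain']))
    ]


# Hypothetical item shaped like the one spider2 yields
item = {
    'domain': ['http://example.test'],
    'interface': ['/user/{id}'],
    'method': ['GET'],
    'param_name': [['id', 'verbose']],
    'data_type': [['integer', 'boolean']],
    'param_required': [[True, False]],
}

conn = sqlite3.connect(':memory:')
conn.execute(SCHEMA)
# sqlite3 uses "?" placeholders where pymysql uses "%s"
conn.executemany('insert into interfaces values (?, ?, ?, ?, ?, ?)',
                 item_to_rows(item))
conn.commit()
rows = conn.execute(
    'select interface, method, param_name from interfaces').fetchall()
print(rows)
conn.close()
```

Since every field goes through `str()`, list-valued fields such as the parameter names are stored as their Python representation (for example `['id', 'verbose']`); serializing them with `json.dumps` instead would make the stored values easier to parse later.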
