Scrapy Pipeline Practice

class WeatherPipeline(object):
    """Debug pipeline: print each item and pass it on unchanged."""

    def process_item(self, item, spider):
        print(item)
        return item

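# Insert into Redis
import redis
import json  # only needed for the commented-out lpush/sadd variants below

class RedisPipeline(object):

    def __init__(self, host, port, password):
        self.host = host
        self.port = port
        self.password = password

    @classmethod
    def from_crawler(cls, crawler):
        # Read the connection parameters from settings.py
        return cls(
            host=crawler.settings.get('RE_HOST', 'localhost'),
            port=crawler.settings.getint('RE_PORT', 6379),  # the port must be an int, not ''
            password=crawler.settings.get('RE_PASS')  # None means no password is sent
        )

    def open_spider(self, spider):
        self.pool = redis.ConnectionPool(host=self.host, password=self.password,
                                         port=self.port, db=3)
        self.client = redis.Redis(connection_pool=self.pool)

    def close_spider(self, spider):
        # Release the pooled connections when the spider finishes
        self.pool.disconnect()

    def process_item(self, item, spider):
        # Store one hash per city. hmset() is deprecated in redis-py;
        # hset(name, mapping=...) is its replacement (redis-py 3.5+).
        self.client.hset(item['city'], mapping=dict(item))
        # Alternatives: push the item as JSON onto a list, or add it to a set
        # self.client.lpush('weather', json.dumps(dict(item)))
        # self.client.sadd('weathers', json.dumps(dict(item)))
        return item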
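
A note on the Redis hash write above: redis-py only accepts bytes, str, int and float as hash values, so an item that carries a list, a dict, a bool or None would raise an error. The helper below is a minimal sketch of one way to flatten an item first. The name `to_redis_mapping` and the `tags` and `note` fields are hypothetical and do not come from the original pipeline.

```python
import json


def to_redis_mapping(item):
    """Return a flat dict whose values can be stored as Redis hash fields."""
    mapping = {}
    for key, value in dict(item).items():
        if value is None:
            # Redis has no null value, so missing fields are skipped
            continue
        is_plain = isinstance(value, (str, bytes, int, float))
        if is_plain and not isinstance(value, bool):
            mapping[key] = value
        else:
            # Lists, dicts and bools are stored as JSON text
            mapping[key] = json.dumps(value, ensure_ascii=False)
    return mapping


# 'tags' and 'note' are made-up fields used only for illustration
example = to_redis_mapping(
    {'city': '北京', 'hqiwen': 31, 'tags': ['sunny', 'dry'], 'note': None}
)
print(example)
```

Inside `process_item` the call would then read `self.client.hset(item['city'], mapping=to_redis_mapping(item))`, assuming the helper lives in the same module.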

# Insert into MongoDB
import pymongo

class MongoPipeline(object):

    collection_name = 'tianqi'

    def __init__(self, mongo_host, mongo_db):
        self.mongo_host = mongo_host
        self.mongo_db = mongo_db

    @classmethod
    def from_crawler(cls, crawler):
        return cls(
            mongo_host=crawler.settings.get('MO_HOST'),
            mongo_db=crawler.settings.get('MO_DB', 'weather')
        )

    def open_spider(self, spider):
        self.client = pymongo.MongoClient(host=self.mongo_host)
        self.db = self.client[self.mongo_db]

    def close_spider(self, spider):
        self.client.close()

    def process_item(self, item, spider):
        # dict(item) is a copy, so the _id that insert_one adds does not leak into the item
        self.db[self.collection_name].insert_one(dict(item))
        return item

# Insert into a MySQL database
import pymysql

class MysqlPipeline(object):

    def __init__(self, host, username, password, database, port, charset):
        self.host = host
        self.username = username
        self.password = password
        self.database = database
        self.port = port
        self.charset = charset

    @classmethod
    def from_crawler(cls, crawler):
        return cls(
            host=crawler.settings.get('MY_HOST'),
            username=crawler.settings.get('MY_USER'),
            password=crawler.settings.get('MY_PASS'),
            database=crawler.settings.get('MY_DATA'),
            port=crawler.settings.getint('MY_PORT', 3306),  # pymysql expects an int port
            charset=crawler.settings.get('MY_CHARSET', 'utf8mb4'),
        )

    def open_spider(self, spider):
        self.client = pymysql.connect(host=self.host, user=self.username,
                                      password=self.password, database=self.database,
                                      port=self.port, charset=self.charset)
        self.cursor = self.client.cursor()

    def close_spider(self, spider):
        self.cursor.close()
        self.client.close()

    def process_item(self, item, spider):
        # Parameterized query: pymysql escapes the values itself
        sql = ("INSERT INTO weather (`sheng`,`city`,`hqiwen`,`lqiwen`) "
               "VALUES (%s,%s,%s,%s)")
        params = (item['sheng'], item['city'], item['hqiwen'], item['lqiwen'])
        try:
            self.cursor.execute(sql, params)
            self.client.commit()
        except pymysql.MySQLError:
            # Undo the failed statement, then let Scrapy log the error
            self.client.rollback()
            raise
        return item
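
The pipelines above read their connection parameters through `crawler.settings`, so the matching keys have to exist in the project's `settings.py`, and each pipeline has to be enabled in `ITEM_PIPELINES`. The fragment below is a sketch of what that file could contain. The module path `weather.pipelines`, the priority numbers and every host, user and password value are placeholders and assumptions, not values from the original project.

```python
# settings.py (sketch; all values below are placeholders)

# Lower numbers run first. 'weather.pipelines' is an assumed module path.
ITEM_PIPELINES = {
    'weather.pipelines.WeatherPipeline': 100,
    'weather.pipelines.RedisPipeline': 200,
    'weather.pipelines.MongoPipeline': 300,
    'weather.pipelines.MysqlPipeline': 400,
}

# Redis (read by RedisPipeline.from_crawler)
RE_HOST = 'localhost'
RE_PORT = 6379
RE_PASS = None  # set a string here if the server requires a password

# MongoDB (read by MongoPipeline.from_crawler)
MO_HOST = 'localhost'
MO_DB = 'weather'

# MySQL (read by MysqlPipeline.from_crawler)
MY_HOST = 'localhost'
MY_USER = 'root'
MY_PASS = 'change-me'
MY_DATA = 'weather'
MY_PORT = 3306
MY_CHARSET = 'utf8mb4'
```

Keeping the ports as integers matters because both `redis.ConnectionPool` and `pymysql.connect` expect a numeric port.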
