Scrapy: storing scraped data in MySQL

#spider.py
from scrapy.linkextractors import LinkExtractor

from scrapy.spiders import CrawlSpider, Rule

from Cwpjt.items import CwpjtItem

class FulongSpider(CrawlSpider):

    name = 'fulong'

    allowed_domains = ['sina.com.cn']

    start_urls = ['http://sina.com.cn/']

    # Example of an article URL the rule below should match:
    # http://news.sina.com.cn/c/2017-05-09/doc-ifyeycte9324112.shtml

    rules = (

        Rule(LinkExtractor(allow=(r'.*?/[0-9]{4}-[0-9]{2}-[0-9]{2}/doc-.*?\.shtml',), allow_domains=('sina.com.cn',)),

             callback='parse_item', follow=True),

    )

    def parse_item(self, response):

        i = CwpjtItem()

        i['name'] = response.xpath('/html/head/title/text()').extract()

        i['kws'] = response.xpath('/html/head/meta[@name="keywords"]/@content').extract()

        #i['domain_id'] = response.xpath('//input[@id="sid"]/@value').extract()

        #i['name'] = response.xpath('//div[@id="name"]').extract()

        #i['description'] = response.xpath('//div[@id="description"]').extract()

        return i
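As a quick sanity check, the link pattern can be tried against the sample article URL outside of Scrapy. The sketch below assumes that LinkExtractor applies each `allow` pattern as a regex search on the absolute URL. `is_article_url` is a hypothetical helper, and the pattern is a stricter variant of the rule that spells out the `-` and `/` separators and escapes the dot before `shtml`.

```python
import re

# Stricter variant of the spider's allow pattern (an assumption for illustration).
ARTICLE_PATTERN = re.compile(r'/[0-9]{4}-[0-9]{2}-[0-9]{2}/doc-.*?\.shtml')


def is_article_url(url):
    """Return True when the URL looks like a dated Sina article page."""
    return ARTICLE_PATTERN.search(url) is not None


if __name__ == '__main__':
    print(is_article_url('http://news.sina.com.cn/c/2017-05-09/doc-ifyeycte9324112.shtml'))
    print(is_article_url('http://sina.com.cn/'))
```

Only URLs for which the check succeeds would be passed to `parse_item`; the start page itself is followed but not parsed as an item.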

#pipelines.py

import pymysql

class CwpjtPipeline(object):

    def __init__(self):

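        # charset='utf8mb4' so that Chinese titles and keywords are stored correctly
        self.conn = pymysql.connect(host='127.0.0.1', user='root', password='123456', database='mydb', charset='utf8mb4')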

        self.cursor = self.conn.cursor()

    def process_item(self, item, spider):

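        # extract() returns a list; fall back to '' when the page lacks the element
        name = item['name'][0] if item['name'] else ''

        kws = item['kws'][0] if item['kws'] else ''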

        sql ="insert into hehe(title,kws) VALUES(%s,%s)"

        self.cursor.execute(sql,(name,kws,))

        self.conn.commit()

        return item

    def close_spider(self,spider):

        self.conn.close()
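The pipeline assumes that a table named `hehe` with `title` and `kws` columns already exists in `mydb`; the post does not show its schema. Below is a minimal sketch of one possible schema. The `id` column, the column types, and the `ensure_table` helper are assumptions for illustration, not taken from the original project.

```python
# Hypothetical schema for the table the pipeline writes to.
CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS hehe (
    id INT AUTO_INCREMENT PRIMARY KEY,
    title VARCHAR(255),
    kws TEXT
) DEFAULT CHARSET = utf8mb4
"""


def ensure_table(conn):
    """Create the target table if it is missing.

    `conn` is any DB-API style connection, for example the object
    returned by pymysql.connect() in the pipeline.
    """
    cursor = conn.cursor()
    try:
        cursor.execute(CREATE_TABLE_SQL)
        conn.commit()
    finally:
        cursor.close()
```

It could be called once from the pipeline's `__init__`, for example as `ensure_table(self.conn)`, so that the first `insert` does not fail on a fresh database.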

#items.py

import scrapy

class CwpjtItem(scrapy.Item):

    # define the fields for your item here like:

    name = scrapy.Field()

    kws = scrapy.Field()
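Scrapy only runs a pipeline that is listed in `ITEM_PIPELINES` in the project's settings. Assuming the default project layout, with the project package named `Cwpjt` (as the spider's import suggests) and the pipeline class living in `pipelines.py`, the entry might look like the fragment below. The value 300 is an arbitrary priority in the customary 0 to 1000 range.

```python
# settings.py (fragment); the module path is an assumption based on the default layout
ITEM_PIPELINES = {
    'Cwpjt.pipelines.CwpjtPipeline': 300,
}
```

With this in place, the spider can be started with `scrapy crawl fulong`, and each parsed item is passed to `process_item`.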
