Python crawler: scraping 5i5j housing listings and storing them in a database

 from lxml import etree

 from selenium import webdriver

 import pymysql

 def Geturl(fullurl):  # Collect the detail-page URL of every rental listing on one index page
     browser.get(fullurl)
     shouye_html_text = browser.page_source
     shouye_ele = etree.HTML(shouye_html_text)
     # href of each listing's detail page (relative to the site root)
     zf_list = shouye_ele.xpath('/html/body/div[4]/div[1]/div[2]/ul/li/div/h3/a/@href')
     zf_url_list = []
     for zf_url_lost in zf_list:
         zf_url = 'https://bj.5i5j.com' + zf_url_lost
         zf_url_list.append(zf_url)
     return zf_url_list

 def Getinfo(zp_url_list):  # Visit each listing page, extract its fields, and insert one row into MySQL
     for zp_url in zp_url_list:
         browser.get(zp_url)
         zp_info_html = browser.page_source
         zp_ele = etree.HTML(zp_info_html)
         # Listing title
         zp_info_title = str(zp_ele.xpath('/html/body/div[3]/div[1]/div[1]/h1/text()')[0])
         # Price; the suffix '元/月' means "yuan per month"
         zp_info_num = str(zp_ele.xpath('/html/body/div[3]/div[2]/div[2]/div[1]/div[1]/div/p[1]/text()')[0]) + '元/月'
         # Layout (number of rooms)
         zp_info_type = str(zp_ele.xpath('/html/body/div[3]/div[2]/div[2]/div[1]/div[2]/div/p[1]/text()')[0])
         # Floor area; the suffix '平米' means "square meters"
         zp_info_zone = str(zp_ele.xpath('/html/body/div[3]/div[2]/div[2]/div[1]/div[3]/div/p[1]/text()')[0]) + '平米'
         # Additional listing details: a label (span) followed by its linked value (a)
         zp_info_need_1 = str(zp_ele.xpath('/html/body/div[3]/div[2]/div[2]/div[2]/ul/li[1]/span/text()')[0])
         zp_info_need_2 = str(zp_ele.xpath('/html/body/div[3]/div[2]/div[2]/div[2]/ul/li[1]/a/text()')[0])
         zp_info_need = zp_info_need_1 + zp_info_need_2
         # utf8mb4 so that the Chinese text is stored correctly
         connection = pymysql.connect(host='localhost', user='root', password='', db='5i5j', charset='utf8mb4')
         try:
             with connection.cursor() as cursor:
                 sql = "INSERT INTO `5i5j_info` (`title`,`num`,`type`, `zone`,`need`) VALUES (%s,%s,%s,%s, %s)"
                 cursor.execute(sql, (zp_info_title, zp_info_num, zp_info_type, zp_info_zone, zp_info_need))
             connection.commit()
         finally:
             connection.close()
         print(zp_info_title, zp_info_num, zp_info_type, zp_info_zone, zp_info_need)

 if __name__ == '__main__':
     browser = webdriver.Chrome()
     pages = int(input('How many pages? '))
     for i in range(1, pages + 1):
         url = 'https://bj.5i5j.com/zufang/huilongguan/n{}/'
         fullurl = url.format(i)
         zf_url_list = Geturl(fullurl)
         print(fullurl)
         # print(zf_url_list)
         Getinfo(zf_url_list)
     browser.quit()  # quit() ends the whole WebDriver session, not just the current window
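
The script builds each index-page URL from a template and turns relative listing links into absolute ones by string concatenation. A minimal sketch of the same two steps using the standard library's `urllib.parse.urljoin`, which also copes with links that are already absolute. The helper names `page_url` and `absolute_links` are hypothetical and not part of the original script.

```python
from urllib.parse import urljoin

# Site root and index-page template taken from the script above.
BASE_URL = 'https://bj.5i5j.com'
INDEX_TEMPLATE = 'https://bj.5i5j.com/zufang/huilongguan/n{}/'


def page_url(page_number):
    """Return the URL of the given index page (hypothetical helper)."""
    return INDEX_TEMPLATE.format(page_number)


def absolute_links(hrefs, base=BASE_URL):
    """Resolve each href against the site root (hypothetical helper).

    Relative hrefs are joined onto `base`; absolute hrefs pass through unchanged.
    """
    return [urljoin(base, href) for href in hrefs]
```

With these helpers, the list returned by the XPath query in `Geturl` could be passed straight to `absolute_links` instead of being concatenated in a loop.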

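`Getinfo` indexes every XPath result with `[0]`, which raises `IndexError` as soon as a page lacks one of the expected nodes, and it opens a new MySQL connection for each listing. One possible alternative, sketched under the assumption that the `5i5j_info` table keeps the same five columns: read fields defensively, collect the rows, and write them through a single connection with `cursor.executemany`. The names `first_text` and `save_listings` are hypothetical helpers, not part of the original script.

```python
# Same statement as in the script, with pymysql's %s placeholders.
INSERT_SQL = (
    "INSERT INTO `5i5j_info` (`title`, `num`, `type`, `zone`, `need`) "
    "VALUES (%s, %s, %s, %s, %s)"
)


def first_text(nodes, default=''):
    """Return the first XPath result as a stripped string.

    Falls back to `default` when the XPath query matched nothing,
    instead of raising IndexError. (Hypothetical helper.)
    """
    return str(nodes[0]).strip() if nodes else default


def save_listings(connection, rows):
    """Insert all rows in one transaction and return how many were sent.

    `connection` is assumed to be an open pymysql connection; each row is a
    (title, num, type, zone, need) tuple. (Hypothetical helper.)
    """
    rows = list(rows)
    if not rows:
        return 0
    with connection.cursor() as cursor:
        cursor.executemany(INSERT_SQL, rows)
    connection.commit()
    return len(rows)
```

In this arrangement the caller would create the connection once with `pymysql.connect(...)`, call `save_listings(connection, rows)` after scraping a page, and close the connection at the end of the run. Note that `first_text` strips surrounding whitespace, which the original script does not do.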