Scraping 5i5j (我爱我家) Rental Listings with BeautifulSoup

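I had never been very comfortable with BeautifulSoup, and some friends and colleagues happened to be looking for apartments, so I wondered whether I could write a crawler to collect the listing data myself. This script is the result. It walks list pages 1 through 80 of the Shanghai rental section, opens each listing's detail page to read the agent's contact information, and inserts the fields into a MySQL table named zufang. I wrote most of it while following along with a book and there is not much else to explain, so here is the code.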

# coding=utf-8
import time

import pymysql
import requests
from bs4 import BeautifulSoup

BASE_URL = "https://sh.5i5j.com"
# %s placeholders are filled in by the driver, which escapes quotes in scraped text
INSERT_SQL = ("insert into zufang(area,xiaoqu,community,hot,price,lxrphone) "
              "VALUES (%s,%s,%s,%s,%s,%s)")

db = pymysql.connect(host="127.0.0.1", port=3306, user="root", passwd="root",
                     db="woaiwojia", charset="utf8")
cursor = db.cursor()

for num in range(1, 81):  # list pages 1 through 80
    url = BASE_URL + "/zufang/o8r1u1n" + str(num) + "/"
    time.sleep(10)  # pause between list pages
    strhtml = requests.get(url)
    fanlist = BeautifulSoup(strhtml.text, "lxml")
    for ul in fanlist.find_all("ul", {"class": "pList"}):
        for li in ul.find_all(name="li"):
            for div in li.find_all("div", {"class": "listCon"}):
                xiaoqu = div.h3.a.string  # text of the listing's title link
                # The detail page holds the agent's contact block
                detailUrl = BASE_URL + div.h3.a.attrs['href']
                detailhtml = requests.get(detailUrl)
                detail = BeautifulSoup(detailhtml.text, "lxml")
                # Agent name + phone; stays empty if the broker block is missing,
                # so a stale value from the previous listing is never reused
                lxrphone = ""
                for uldiv in detail.find_all("div", {"id": "housebroker"}):
                    for broker_ul in uldiv.find_all("ul"):
                        lxrphone = (broker_ul.h3.string or "") + (broker_ul.label.string or "")
                for div1 in div.find_all("div", {"class": "listX"}):
                    paras = div1.find_all("p")
                    area = paras[0].text
                    community = paras[1].text
                    hot = paras[2].text
                    price = div1.find_all("div", {"class": "jia"})[0].p.strong.string
                    try:
                        cursor.execute(INSERT_SQL, (area, xiaoqu, community, hot, price, lxrphone))
                        db.commit()
                    except pymysql.MySQLError as e:
                        db.rollback()
                        print('Insert failed:', e)

cursor.close()
db.close()
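
The insert step can be tried on its own, without a MySQL server or any network access. The sketch below is not part of the original script. It swaps in Python's built-in sqlite3 module for pymysql, creates an in-memory zufang table with the same six columns (all typed TEXT here, which is an assumption, since the post does not show the real schema), and writes one row through placeholders so that quotes in scraped text cannot break the statement. Note that sqlite3 uses `?` as its placeholder where pymysql uses `%s`. The helper names `create_table` and `save_listing` and the sample values are hypothetical.

```python
import sqlite3

# Column names taken from the INSERT statement in the script above
COLUMNS = ("area", "xiaoqu", "community", "hot", "price", "lxrphone")


def create_table(conn):
    # Assumed schema: every column is TEXT because the real definition is not shown
    column_defs = ", ".join(name + " TEXT" for name in COLUMNS)
    conn.execute("CREATE TABLE IF NOT EXISTS zufang (%s)" % column_defs)


def save_listing(conn, row):
    """Insert one listing dict; return True on success, False on a database error."""
    placeholders = ", ".join("?" for _ in COLUMNS)
    sql = "INSERT INTO zufang (%s) VALUES (%s)" % (", ".join(COLUMNS), placeholders)
    try:
        # Values are passed separately from the SQL text, so quotes are escaped
        conn.execute(sql, tuple(row[name] for name in COLUMNS))
        conn.commit()
        return True
    except sqlite3.Error as exc:
        conn.rollback()
        print("Insert failed:", exc)
        return False


conn = sqlite3.connect(":memory:")
create_table(conn)

# Made-up sample row; the apostrophe would break a string-formatted statement
sample = {
    "area": "2 rooms, 80 sqm",
    "xiaoqu": "Tom's Garden",
    "community": "Pudong",
    "hot": "3 views",
    "price": "5000",
    "lxrphone": "Agent Li 000-0000",
}
ok = save_listing(conn, sample)
stored = conn.execute("SELECT xiaoqu, price FROM zufang").fetchall()
print(ok, stored)
```

Only the column names and the SQL text are joined into the statement; the scraped values never are, which is why the apostrophe in the sample row is stored intact.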

If you have any questions or suggestions, leave a comment and we can discuss them.
