Scraping the latest Dungeon & Fighter (DNF) Cross-Region 5 listings from dd373 with requests + pyquery

Without further ado, here is the code:

The scraped data can be written to an Excel spreadsheet with the openpyxl library; I have not included that code here.
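
Since that export code is not shown, here is a minimal sketch of one way to do it, assuming openpyxl is installed. It is not the author's code: products_to_workbook is a hypothetical helper that takes a list of dicts shaped like the product dicts built in doc_page, writes the keys of the first dict as a header row, and then writes one row per dict.

```python
from openpyxl import Workbook


def products_to_workbook(products, sheet_title="listings"):
    """Build an openpyxl Workbook from a list of dicts (hypothetical helper).

    The header row uses the keys of the first dict, so every dict is
    assumed to share the same keys.
    """
    wb = Workbook()
    ws = wb.active
    ws.title = sheet_title
    if products:
        headers = list(products[0].keys())
        ws.append(headers)
        for product in products:
            # .get() leaves the cell empty if a dict is missing one of the keys
            ws.append([product.get(key) for key in headers])
    return wb
```

To use it, collect the product dicts into a list (for example by appending each one inside the loop in doc_page rather than only printing it), then call `products_to_workbook(rows).save("dd373_listings.xlsx")`. Returning the Workbook instead of saving inside the helper keeps the file name and location up to the caller.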

import requests
from urllib.parse import urlencode
from requests import RequestException
from pyquery import PyQuery as pq

def open_sh():
    # Fetch the first results page from dd373 and return its HTML (None on failure)
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36'
    }
    # Price filter sent in the query string: minimum 333, no upper bound
    data = {
        "minPrice": 333,
        "maxPrice": ""
    }
    url = "https://www.dd373.com/s/rbg22w-x9kjbs-wwf11b-0-0-0-qquvn4-0-0-0-0-0-0-0-0.html?" + urlencode(data)
    try:
        # A timeout keeps the script from hanging on an unresponsive server
        response = requests.get(url, headers=headers, timeout=10)
        if response.status_code == 200:
            return response.text
        return None
    except RequestException:
        print("Request failed:", url)
        return None

def doc_page(html):
    # Extract the DNF account listings from one results page
    doc = pq(html)
    content = doc("div.content")
    products = []
    for item in content.find(".box.money_ner").items():
        product = {
            "url": item.find("a.titleText").attr("href"),
            "account_info": item.find("a.titleText").text(),
            "price": item.find("div.money_text strong span").text() + " yuan",
            "in_stock": item.find("div.num.left").text()
        }
        print(product)
        products.append(product)
    # Also return the list so the results can be exported, e.g. to Excel
    return products

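def page_sh(pagebox):
    # Loop over every results page and parse the ones that loaded correctly
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36'
    }
    data = {
        "minPrice": 333,
        "maxPrice": ""
    }
    for page in range(1, pagebox + 1):
        # The page number is the last hyphen-separated segment of the listing path
        url = "https://www.dd373.com/s/rbg22w-x9kjbs-wwf11b-0-0-0-qquvn4-0-0-0-0-0-0-0-%s.html?%s" % (page, urlencode(data))
        try:
            response = requests.get(url, headers=headers, timeout=10)
        except RequestException:
            print("Request failed:", url)
            continue
        if response.status_code != 200:
            continue
        # pyquery needs the page HTML (not the URL) to read the highlighted page number;
        # only parse the page if it really is the one that was requested
        if page_currentpage(response.text) == page:
            doc_page(response.text)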

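def page_currentpage(html):
    # Read the highlighted page number in the pager, to confirm which page was actually served
    doc = pq(html)
    currentpage = doc("a.nb.currentpage").text() or ""
    # Return None instead of raising ValueError when the pager is missing
    return int(currentpage) if currentpage.isdigit() else None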

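def page_box(html):
    # Get the total number of results pages from the pager
    doc = pq(html)
    # [9:-1] assumes the page count follows a fixed 9-character prefix and precedes a 1-character suffix
    pagebox = doc(".pagebox.clear ul li.yeshu").text()[9:-1]
    return int(pagebox)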

def main():
    html = open_sh()
    if html is None:
        # The first request failed, so there is no pager to read
        return
    page = page_box(html)
    page_sh(page)


if __name__ == "__main__":
    main()

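The pagination in page_sh relies on two details: the page number is the last hyphen-separated segment of the listing path (the first request in open_sh uses 0 there), and the page count comes from slicing the pager text with [9:-1], which breaks if the length of the surrounding text changes. Below is a sketch of both pieces as small, testable functions. build_page_url and parse_page_count are hypothetical helper names, and taking the last number in the pager text is an assumption about its format (for example text like "共12页", Chinese for "12 pages in total"), not something confirmed against dd373.

```python
import re
from urllib.parse import urlencode

# Listing path copied from the script; only the final segment changes per page
BASE = "https://www.dd373.com/s/rbg22w-x9kjbs-wwf11b-0-0-0-qquvn4-0-0-0-0-0-0-0-{page}.html"


def build_page_url(page, min_price=333, max_price=""):
    """Build the URL of one results page with the same price filter as the script."""
    query = urlencode({"minPrice": min_price, "maxPrice": max_price})
    return BASE.format(page=page) + "?" + query


def parse_page_count(pager_text):
    """Hypothetical, more tolerant alternative to the [9:-1] slice in page_box.

    Assumes the total page count is the last integer in the pager text;
    falls back to 1 page when the text contains no digits.
    """
    numbers = re.findall(r"\d+", pager_text)
    return int(numbers[-1]) if numbers else 1
```

With these helpers, the loop in page_sh could read `for page in range(1, parse_page_count(pager_text) + 1)` and build each address with `build_page_url(page)`.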