1.安装方法

pip install pyquery

2.引用方法

from pyquery import PyQuery as pq

3.简介

 pyquery 是类型jquery 的一个专供python使用的html解析的库,使用方法类似bs4。

4.使用方法

  4.1 初始化方法:

from pyquery import PyQuery as pq
doc =pq(html) #解析html字符串
doc =pq("http://news.baidu.com/") #解析网页
doc =pq("./a.html") #解析html 文本

4.2 基本CSS选择器

from pyquery import PyQuery as pq
html = '''
<div id="wrap">
<ul class="s_from">
asdasd
<link href="http://asda.com">asdadasdad12312</link>
<link href="http://asda1.com">asdadasdad12312</link>
<link href="http://asda2.com">asdadasdad12312</link>
</ul>
</div>
'''
doc = pq(html)
print doc("#wrap .s_from link")

  运行结果:

<link href="http://asda.com">asdadasdad12312</link>
<link href="http://asda1.com">asdadasdad12312</link>
<link href="http://asda2.com">asdadasdad12312</link>

  #是查找id的标签  .是查找class 的标签  link 是查找link 标签 中间的空格表示里层

  4.3 查找子元素

from pyquery import PyQuery as pq
html = '''
<div id="wrap">
<ul class="s_from">
asdasd
<link href="http://asda.com">asdadasdad12312</link>
<link href="http://asda1.com">asdadasdad12312</link>
<link href="http://asda2.com">asdadasdad12312</link>
</ul>
</div>
'''
#查找子元素
doc = pq(html)
items=doc("#wrap")
print(items)
print("类型为:%s"%type(items))
link = items.find('.s_from')
print(link)
link = items.children()
print(link)

  运行结果:

<div id="wrap">
<ul class="s_from">
asdasd
<link href="http://asda.com">asdadasdad12312</link>
<link href="http://asda1.com">asdadasdad12312</link>
<link href="http://asda2.com">asdadasdad12312</link>
</ul>
</div>
类型为:<class 'pyquery.pyquery.PyQuery'>
<ul class="s_from">
asdasd
<link href="http://asda.com">asdadasdad12312</link>
<link href="http://asda1.com">asdadasdad12312</link>
<link href="http://asda2.com">asdadasdad12312</link>
</ul> <ul class="s_from">
asdasd
<link href="http://asda.com">asdadasdad12312</link>
<link href="http://asda1.com">asdadasdad12312</link>
<link href="http://asda2.com">asdadasdad12312</link>
</ul>

  根据运行结果可以发现返回结果类型为pyquery,并且find方法和children 方法都可以获取里层标签

  4.4查找父元素

from pyquery import PyQuery as pq
html = '''
<div href="wrap">
hello nihao
<ul class="s_from">
asdasd
<link href="http://asda.com">asdadasdad12312</link>
<link href="http://asda1.com">asdadasdad12312</link>
<link href="http://asda2.com">asdadasdad12312</link>
</ul>
</div>
''' doc = pq(html)
items=doc(".s_from")
print(items)
#查找父元素
parent_href=items.parent()
print(parent_href)

  运行结果:

<ul class="s_from">
asdasd
<link href="http://asda.com">asdadasdad12312</link>
<link href="http://asda1.com">asdadasdad12312</link>
<link href="http://asda2.com">asdadasdad12312</link>
</ul> <div href="wrap">
hello nihao
<ul class="s_from">
asdasd
<link href="http://asda.com">asdadasdad12312</link>
<link href="http://asda1.com">asdadasdad12312</link>
<link href="http://asda2.com">asdadasdad12312</link>
</ul>
</div>

  parent可以查找出外层标签包括的内容,与之类似的还有parents,可以获取所有外层节点

  4.5 查找兄弟元素

from pyquery import PyQuery as pq
html = '''
<div href="wrap">
hello nihao
<ul class="s_from">
asdasd
<link class='active1 a123' href="http://asda.com">asdadasdad12312</link>
<link class='active2' href="http://asda1.com">asdadasdad12312</link>
<link class='movie1' href="http://asda2.com">asdadasdad12312</link>
</ul>
</div>
''' doc = pq(html)
items=doc("link.active1.a123")
print(items)
#查找兄弟元素
siblings_href=items.siblings()
print(siblings_href)

  运行结果:

<link class="active1 a123" href="http://asda.com">asdadasdad12312</link>

<link class="active2" href="http://asda1.com">asdadasdad12312</link>
<link class="movie1" href="http://asda2.com">asdadasdad12312</link>

  根据运行结果可以看出,siblings 返回了同级的其他标签

  结论:子元素查找,父元素查找,兄弟元素查找,这些方法返回的结果类型都是pyquery类型,可以针对结果再次进行选择

  4.6 遍历查找结果

from pyquery import PyQuery as pq
html = '''
<div href="wrap">
hello nihao
<ul class="s_from">
asdasd
<link class='active1 a123' href="http://asda.com">asdadasdad12312</link>
<link class='active2' href="http://asda1.com">asdadasdad12312</link>
<link class='movie1' href="http://asda2.com">asdadasdad12312</link>
</ul>
</div>
''' doc = pq(html)
its=doc("link").items()
for it in its:
print(it)

  运行结果:

<link class="active1 a123" href="http://asda.com">asdadasdad12312</link>

<link class="active2" href="http://asda1.com">asdadasdad12312</link>

<link class="movie1" href="http://asda2.com">asdadasdad12312</link>

  4.7获取属性信息

from pyquery import PyQuery as pq
html = '''
<div href="wrap">
hello nihao
<ul class="s_from">
asdasd
<link class='active1 a123' href="http://asda.com">asdadasdad12312</link>
<link class='active2' href="http://asda1.com">asdadasdad12312</link>
<link class='movie1' href="http://asda2.com">asdadasdad12312</link>
</ul>
</div>
''' doc = pq(html)
its=doc("link").items()
for it in its:
print(it.attr('href'))
print(it.attr.href)

  运行结果:

http://asda.com
http://asda.com
http://asda1.com
http://asda1.com
http://asda2.com
http://asda2.com

  4.8 获取文本

from pyquery import PyQuery as pq
html = '''
<div href="wrap">
hello nihao
<ul class="s_from">
asdasd
<link class='active1 a123' href="http://asda.com">asdadasdad12312</link>
<link class='active2' href="http://asda1.com">asdadasdad12312</link>
<link class='movie1' href="http://asda2.com">asdadasdad12312</link>
</ul>
</div>
''' doc = pq(html)
its=doc("link").items()
for it in its:
print(it.text())

  运行结果

asdadasdad12312
asdadasdad12312
asdadasdad12312

  4.9 获取 HTML信息

from pyquery import PyQuery as pq
html = '''
<div href="wrap">
hello nihao
<ul class="s_from">
asdasd
<link class='active1 a123' href="http://asda.com"><a>asdadasdad12312</a></link>
<link class='active2' href="http://asda1.com">asdadasdad12312</link>
<link class='movie1' href="http://asda2.com">asdadasdad12312</link>
</ul>
</div>
''' doc = pq(html)
its=doc("link").items()
for it in its:
print(it.html())

  运行结果:

<a>asdadasdad12312</a>
asdadasdad12312
asdadasdad12312

5.常用DOM操作

  5.1 addClass removeClass

  添加,移除class标签

from pyquery import PyQuery as pq
html = '''
<div href="wrap">
hello nihao
<ul class="s_from">
asdasd
<link class='active1 a123' href="http://asda.com"><a>asdadasdad12312</a></link>
<link class='active2' href="http://asda1.com">asdadasdad12312</link>
<link class='movie1' href="http://asda2.com">asdadasdad12312</link>
</ul>
</div>
''' doc = pq(html)
its=doc("link").items()
for it in its:
print("添加:%s"%it.addClass('active1'))
print("移除:%s"%it.removeClass('active1'))

  运行结果

添加:<link class="active1 a123" href="http://asda.com"><a>asdadasdad12312</a></link>

移除:<link class="a123" href="http://asda.com"><a>asdadasdad12312</a></link>

添加:<link class="active2 active1" href="http://asda1.com">asdadasdad12312</link>

移除:<link class="active2" href="http://asda1.com">asdadasdad12312</link>

添加:<link class="movie1 active1" href="http://asda2.com">asdadasdad12312</link>

移除:<link class="movie1" href="http://asda2.com">asdadasdad12312</link>

  需要注意的是已经存在的class标签不会继续添加

  5.2 attr css

  attr 为获取/修改属性 css 添加style属性

from pyquery import PyQuery as pq
html = '''
<div href="wrap">
hello nihao
<ul class="s_from">
asdasd
<link class='active1 a123' href="http://asda.com"><a>asdadasdad12312</a></link>
<link class='active2' href="http://asda1.com">asdadasdad12312</link>
<link class='movie1' href="http://asda2.com">asdadasdad12312</link>
</ul>
</div>
''' doc = pq(html)
its=doc("link").items()
for it in its:
print("修改:%s"%it.attr('class','active'))
print("添加:%s"%it.css('font-size','14px'))

  运行结果

C:\Python27\python.exe D:/test_his/test_re_1.py
修改:<link class="active" href="http://asda.com"><a>asdadasdad12312</a></link> 添加:<link class="active" href="http://asda.com" style="font-size: 14px"><a>asdadasdad12312</a></link> 修改:<link class="active" href="http://asda1.com">asdadasdad12312</link> 添加:<link class="active" href="http://asda1.com" style="font-size: 14px">asdadasdad12312</link> 修改:<link class="active" href="http://asda2.com">asdadasdad12312</link> 添加:<link class="active" href="http://asda2.com" style="font-size: 14px">asdadasdad12312</link>

  attr css操作直接修改对象的

  5.3 remove

  remove 移除标签

from pyquery import PyQuery as pq
html = '''
<div href="wrap">
hello nihao
<ul class="s_from">
asdasd
<link class='active1 a123' href="http://asda.com"><a>asdadasdad12312</a></link>
<link class='active2' href="http://asda1.com">asdadasdad12312</link>
<link class='movie1' href="http://asda2.com">asdadasdad12312</link>
</ul>
</div>
''' doc = pq(html)
its=doc("div")
print('移除前获取文本结果:\n%s'%its.text())
it=its.remove('ul')
print('移除后获取文本结果:\n%s'%it.text())

  运行结果

移除前获取文本结果:
hello nihao
asdasd
asdadasdad12312
asdadasdad12312
asdadasdad12312
移除后获取文本结果:
hello nihao

  其他DOM方法参考:

  http://pyquery.readthedocs.io/en/latest/api.html

6.伪类选择器

from pyquery import PyQuery as pq
html = '''
<div href="wrap">
hello nihao
<ul class="s_from">
asdasd
<link class='active1 a123' href="http://asda.com"><a>helloasdadasdad12312</a></link>
<link class='active2' href="http://asda1.com">asdadasdad12312</link>
<link class='movie1' href="http://asda2.com">asdadasdad12312</link>
</ul>
</div>
''' doc = pq(html)
its=doc("link:first-child")
print('第一个标签:%s'%its)
its=doc("link:last-child")
print('最后一个标签:%s'%its)
its=doc("link:nth-child(2)")
print('第二个标签:%s'%its)
its=doc("link:gt(0)") #从零开始
print("获取0以后的标签:%s"%its)
its=doc("link:nth-child(2n-1)")
print("获取奇数标签:%s"%its)
its=doc("link:contains('hello')")
print("获取文本包含hello的标签:%s"%its)

  运行结果

第一个标签:<link class="active1 a123" href="http://asda.com"><a>helloasdadasdad12312</a></link>

最后一个标签:<link class="movie1" href="http://asda2.com">asdadasdad12312</link>

第二个标签:<link class="active2" href="http://asda1.com">asdadasdad12312</link>

获取0以后的标签:<link class="active2" href="http://asda1.com">asdadasdad12312</link>
<link class="movie1" href="http://asda2.com">asdadasdad12312</link> 获取奇数标签:<link class="active1 a123" href="http://asda.com"><a>helloasdadasdad12312</a></link>
<link class="movie1" href="http://asda2.com">asdadasdad12312</link> 获取文本包含hello的标签:<link class="active1 a123" href="http://asda.com"><a>helloasdadasdad12312</a></link>

  更多css选择器可以查看:

  http://www.w3school.com.cn/css/index.asp

  

python pyquery 基本用法的更多相关文章

  1. Python爬虫利器六之PyQuery的用法

    前言 你是否觉得 XPath 的用法多少有点晦涩难记呢? 你是否觉得 BeautifulSoup 的语法多少有些悭吝难懂呢? 你是否甚至还在苦苦研究正则表达式却因为少些了一个点而抓狂呢? 你是否已经有 ...

  2. Python回调函数用法实例详解

    本文实例讲述了Python回调函数用法.分享给大家供大家参考.具体分析如下: 一.百度百科上对回调函数的解释: 回调函数就是一个通过函数指针调用的函数.如果你把函数的指针(地址)作为参数传递给另一个函 ...

  3. day01-day04总结- Python 数据类型及其用法

    Python 数据类型及其用法: 本文总结一下Python中用到的各种数据类型,以及如何使用可以使得我们的代码变得简洁. 基本结构 我们首先要看的是几乎任何语言都具有的数据类型,包括字符串.整型.浮点 ...

  4. 【Python】关于Python有意思的用法

    开一篇文章,记录关于Python有意思的用法,不断更新 1.Python树的遍历 def sum(t): tmp=0 for k in t: if not isinstance(k,list): tm ...

  5. python中xrange用法分析

    本文实例讲述了python中xrange用法.分享给大家供大家参考.具体如下: 先来看如下示例: >>> x=xrange(0,8) >>> print x xra ...

  6. 浅谈Python在信息学竞赛中的运用及Python的基本用法

    浅谈Python在信息学竞赛中的运用及Python的基本用法 前言 众所周知,Python是一种非常实用的语言.但是由于其运算时的低效和解释型编译,在信息学竞赛中并不用于完成算法程序.但正如LRJ在& ...

  7. python scapy的用法之ARP主机扫描和ARP欺骗

    python scapy的用法之ARP主机扫描和ARP欺骗 目录: 1.scapy介绍 2.安装scapy 3.scapy常用 4.ARP主机扫描 5.ARP欺骗 一.scapy介绍 scapy是一个 ...

  8. python函数的用法

    python函数的用法 目录: 1.定义.使用函数 1.函数定义:def 2.函数调用:例:myprint() 3.函数可以当作一个值赋值给一个变量 例:a=myprint()    a() 4.写r ...

  9. python 中@ 的用法【转】

    这只是我的个人理解: 在Python的函数中偶尔会看到函数定义的上一行有@functionName的修饰,当解释器读到@的这样的修饰符之后,会先解析@后的内容,直接就把@下一行的函数或者类作为@后边的 ...

随机推荐

  1. .Net Core 使用 NPOI 导入Excel

    由于之前在网上查阅一些资料发现总是不能编译通过,不能正常使用,现把能正常使用的代码贴出: /// <summary> /// Excel导入帮助类 /// </summary> ...

  2. pandas 初识(五)

    1. 如何实现把一个属性(列)拆分成多列,产生pivot,形成向量信息,计算相关性? 例: class_ timestamp count 0 10 2019-01-20 13:23:00 1 1 10 ...

  3. Git实战指南----跟着haibiscuit学Git(第九篇)

    笔名:  haibiscuit 博客园: https://www.cnblogs.com/haibiscuit/ Git地址: https://github.com/haibiscuit?tab=re ...

  4. Windows系统Git安装教程(详解Git安装过程)

    Windows系统Git安装教程(详解Git安装过程)   今天更换电脑系统,需要重新安装Git,正好做个记录,希望对第一次使用的博友能有所帮助! 获取Git安装程序   到Git官网下载,网站地址: ...

  5. http模块

    1.引入http模块 const http = require('http') 2.创建node服务器 在创建node服务器的时候需要使用http模块中的http.creatServer()方法来进行 ...

  6. 强大的django-debug-toolbar,django项目性能分析工具

    强大的django-debug-toolbar,django项目性能分析工具 给大家介绍一个用于django中debug模式下查看网站性能等其他信息的插件django-debug-toolbar 首先 ...

  7. element form 对单个字段做验证

    this.$refs[formName].validateField('phone', phoneError => { //验证手机号码是否正确 if (!phoneError) { conso ...

  8. Feign Ribbon Hystrix 三者关系 | 史上最全, 深度解析

    史上最全: Feign Ribbon Hystrix 三者关系 | 深度解析 疯狂创客圈 Java 分布式聊天室[ 亿级流量]实战系列之 -25[ 博客园 总入口 ] 前言 疯狂创客圈(笔者尼恩创建的 ...

  9. LeetCode 49: 字母异位词分组 Group Anagrams

    LeetCode 49: 字母异位词分组 Group Anagrams 题目: 给定一个字符串数组,将字母异位词组合在一起.字母异位词指字母相同,但排列不同的字符串. Given an array o ...

  10. Java题库——Chapter1 计算机、程序和Java概述

    1)________ is the physical aspect of the computer that can be seen. A)Hardware B) Operating system C ...