python pyquery 基本用法
1.安装方法
pip install pyquery
2.引用方法
from pyquery import PyQuery as pq
3.简介
pyquery 是类型jquery 的一个专供python使用的html解析的库,使用方法类似bs4。
4.使用方法
4.1 初始化方法:
from pyquery import PyQuery as pq
doc =pq(html) #解析html字符串
doc =pq("http://news.baidu.com/") #解析网页
doc =pq("./a.html") #解析html 文本
4.2 基本CSS选择器
from pyquery import PyQuery as pq
html = '''
<div id="wrap">
<ul class="s_from">
asdasd
<link href="http://asda.com">asdadasdad12312</link>
<link href="http://asda1.com">asdadasdad12312</link>
<link href="http://asda2.com">asdadasdad12312</link>
</ul>
</div>
'''
doc = pq(html)
print doc("#wrap .s_from link")
运行结果:
<link href="http://asda.com">asdadasdad12312</link>
<link href="http://asda1.com">asdadasdad12312</link>
<link href="http://asda2.com">asdadasdad12312</link>
#是查找id的标签 .是查找class 的标签 link 是查找link 标签 中间的空格表示里层
4.3 查找子元素
from pyquery import PyQuery as pq
html = '''
<div id="wrap">
<ul class="s_from">
asdasd
<link href="http://asda.com">asdadasdad12312</link>
<link href="http://asda1.com">asdadasdad12312</link>
<link href="http://asda2.com">asdadasdad12312</link>
</ul>
</div>
'''
#查找子元素
doc = pq(html)
items=doc("#wrap")
print(items)
print("类型为:%s"%type(items))
link = items.find('.s_from')
print(link)
link = items.children()
print(link)
运行结果:
<div id="wrap">
<ul class="s_from">
asdasd
<link href="http://asda.com">asdadasdad12312</link>
<link href="http://asda1.com">asdadasdad12312</link>
<link href="http://asda2.com">asdadasdad12312</link>
</ul>
</div>
类型为:<class 'pyquery.pyquery.PyQuery'>
<ul class="s_from">
asdasd
<link href="http://asda.com">asdadasdad12312</link>
<link href="http://asda1.com">asdadasdad12312</link>
<link href="http://asda2.com">asdadasdad12312</link>
</ul> <ul class="s_from">
asdasd
<link href="http://asda.com">asdadasdad12312</link>
<link href="http://asda1.com">asdadasdad12312</link>
<link href="http://asda2.com">asdadasdad12312</link>
</ul>
根据运行结果可以发现返回结果类型为pyquery,并且find方法和children 方法都可以获取里层标签
4.4查找父元素
from pyquery import PyQuery as pq
html = '''
<div href="wrap">
hello nihao
<ul class="s_from">
asdasd
<link href="http://asda.com">asdadasdad12312</link>
<link href="http://asda1.com">asdadasdad12312</link>
<link href="http://asda2.com">asdadasdad12312</link>
</ul>
</div>
''' doc = pq(html)
items=doc(".s_from")
print(items)
#查找父元素
parent_href=items.parent()
print(parent_href)
运行结果:
<ul class="s_from">
asdasd
<link href="http://asda.com">asdadasdad12312</link>
<link href="http://asda1.com">asdadasdad12312</link>
<link href="http://asda2.com">asdadasdad12312</link>
</ul> <div href="wrap">
hello nihao
<ul class="s_from">
asdasd
<link href="http://asda.com">asdadasdad12312</link>
<link href="http://asda1.com">asdadasdad12312</link>
<link href="http://asda2.com">asdadasdad12312</link>
</ul>
</div>
parent可以查找出外层标签包括的内容,与之类似的还有parents,可以获取所有外层节点
4.5 查找兄弟元素
from pyquery import PyQuery as pq
html = '''
<div href="wrap">
hello nihao
<ul class="s_from">
asdasd
<link class='active1 a123' href="http://asda.com">asdadasdad12312</link>
<link class='active2' href="http://asda1.com">asdadasdad12312</link>
<link class='movie1' href="http://asda2.com">asdadasdad12312</link>
</ul>
</div>
''' doc = pq(html)
items=doc("link.active1.a123")
print(items)
#查找兄弟元素
siblings_href=items.siblings()
print(siblings_href)
运行结果:
<link class="active1 a123" href="http://asda.com">asdadasdad12312</link> <link class="active2" href="http://asda1.com">asdadasdad12312</link>
<link class="movie1" href="http://asda2.com">asdadasdad12312</link>
根据运行结果可以看出,siblings 返回了同级的其他标签
结论:子元素查找,父元素查找,兄弟元素查找,这些方法返回的结果类型都是pyquery类型,可以针对结果再次进行选择
4.6 遍历查找结果
from pyquery import PyQuery as pq
html = '''
<div href="wrap">
hello nihao
<ul class="s_from">
asdasd
<link class='active1 a123' href="http://asda.com">asdadasdad12312</link>
<link class='active2' href="http://asda1.com">asdadasdad12312</link>
<link class='movie1' href="http://asda2.com">asdadasdad12312</link>
</ul>
</div>
''' doc = pq(html)
its=doc("link").items()
for it in its:
print(it)
运行结果:
<link class="active1 a123" href="http://asda.com">asdadasdad12312</link> <link class="active2" href="http://asda1.com">asdadasdad12312</link> <link class="movie1" href="http://asda2.com">asdadasdad12312</link>
4.7获取属性信息
from pyquery import PyQuery as pq
html = '''
<div href="wrap">
hello nihao
<ul class="s_from">
asdasd
<link class='active1 a123' href="http://asda.com">asdadasdad12312</link>
<link class='active2' href="http://asda1.com">asdadasdad12312</link>
<link class='movie1' href="http://asda2.com">asdadasdad12312</link>
</ul>
</div>
''' doc = pq(html)
its=doc("link").items()
for it in its:
print(it.attr('href'))
print(it.attr.href)
运行结果:
http://asda.com
http://asda.com
http://asda1.com
http://asda1.com
http://asda2.com
http://asda2.com
4.8 获取文本
from pyquery import PyQuery as pq
html = '''
<div href="wrap">
hello nihao
<ul class="s_from">
asdasd
<link class='active1 a123' href="http://asda.com">asdadasdad12312</link>
<link class='active2' href="http://asda1.com">asdadasdad12312</link>
<link class='movie1' href="http://asda2.com">asdadasdad12312</link>
</ul>
</div>
''' doc = pq(html)
its=doc("link").items()
for it in its:
print(it.text())
运行结果
asdadasdad12312
asdadasdad12312
asdadasdad12312
4.9 获取 HTML信息
from pyquery import PyQuery as pq
html = '''
<div href="wrap">
hello nihao
<ul class="s_from">
asdasd
<link class='active1 a123' href="http://asda.com"><a>asdadasdad12312</a></link>
<link class='active2' href="http://asda1.com">asdadasdad12312</link>
<link class='movie1' href="http://asda2.com">asdadasdad12312</link>
</ul>
</div>
''' doc = pq(html)
its=doc("link").items()
for it in its:
print(it.html())
运行结果:
<a>asdadasdad12312</a>
asdadasdad12312
asdadasdad12312
5.常用DOM操作
5.1 addClass removeClass
添加,移除class标签
from pyquery import PyQuery as pq
html = '''
<div href="wrap">
hello nihao
<ul class="s_from">
asdasd
<link class='active1 a123' href="http://asda.com"><a>asdadasdad12312</a></link>
<link class='active2' href="http://asda1.com">asdadasdad12312</link>
<link class='movie1' href="http://asda2.com">asdadasdad12312</link>
</ul>
</div>
''' doc = pq(html)
its=doc("link").items()
for it in its:
print("添加:%s"%it.addClass('active1'))
print("移除:%s"%it.removeClass('active1'))
运行结果
添加:<link class="active1 a123" href="http://asda.com"><a>asdadasdad12312</a></link> 移除:<link class="a123" href="http://asda.com"><a>asdadasdad12312</a></link> 添加:<link class="active2 active1" href="http://asda1.com">asdadasdad12312</link> 移除:<link class="active2" href="http://asda1.com">asdadasdad12312</link> 添加:<link class="movie1 active1" href="http://asda2.com">asdadasdad12312</link> 移除:<link class="movie1" href="http://asda2.com">asdadasdad12312</link>
需要注意的是已经存在的class标签不会继续添加
5.2 attr css
attr 为获取/修改属性 css 添加style属性
from pyquery import PyQuery as pq
html = '''
<div href="wrap">
hello nihao
<ul class="s_from">
asdasd
<link class='active1 a123' href="http://asda.com"><a>asdadasdad12312</a></link>
<link class='active2' href="http://asda1.com">asdadasdad12312</link>
<link class='movie1' href="http://asda2.com">asdadasdad12312</link>
</ul>
</div>
''' doc = pq(html)
its=doc("link").items()
for it in its:
print("修改:%s"%it.attr('class','active'))
print("添加:%s"%it.css('font-size','14px'))
运行结果
C:\Python27\python.exe D:/test_his/test_re_1.py
修改:<link class="active" href="http://asda.com"><a>asdadasdad12312</a></link> 添加:<link class="active" href="http://asda.com" style="font-size: 14px"><a>asdadasdad12312</a></link> 修改:<link class="active" href="http://asda1.com">asdadasdad12312</link> 添加:<link class="active" href="http://asda1.com" style="font-size: 14px">asdadasdad12312</link> 修改:<link class="active" href="http://asda2.com">asdadasdad12312</link> 添加:<link class="active" href="http://asda2.com" style="font-size: 14px">asdadasdad12312</link>
attr css操作直接修改对象的
5.3 remove
remove 移除标签
from pyquery import PyQuery as pq
html = '''
<div href="wrap">
hello nihao
<ul class="s_from">
asdasd
<link class='active1 a123' href="http://asda.com"><a>asdadasdad12312</a></link>
<link class='active2' href="http://asda1.com">asdadasdad12312</link>
<link class='movie1' href="http://asda2.com">asdadasdad12312</link>
</ul>
</div>
''' doc = pq(html)
its=doc("div")
print('移除前获取文本结果:\n%s'%its.text())
it=its.remove('ul')
print('移除后获取文本结果:\n%s'%it.text())
运行结果
移除前获取文本结果:
hello nihao
asdasd
asdadasdad12312
asdadasdad12312
asdadasdad12312
移除后获取文本结果:
hello nihao
其他DOM方法参考:
http://pyquery.readthedocs.io/en/latest/api.html
6.伪类选择器
from pyquery import PyQuery as pq
html = '''
<div href="wrap">
hello nihao
<ul class="s_from">
asdasd
<link class='active1 a123' href="http://asda.com"><a>helloasdadasdad12312</a></link>
<link class='active2' href="http://asda1.com">asdadasdad12312</link>
<link class='movie1' href="http://asda2.com">asdadasdad12312</link>
</ul>
</div>
''' doc = pq(html)
its=doc("link:first-child")
print('第一个标签:%s'%its)
its=doc("link:last-child")
print('最后一个标签:%s'%its)
its=doc("link:nth-child(2)")
print('第二个标签:%s'%its)
its=doc("link:gt(0)") #从零开始
print("获取0以后的标签:%s"%its)
its=doc("link:nth-child(2n-1)")
print("获取奇数标签:%s"%its)
its=doc("link:contains('hello')")
print("获取文本包含hello的标签:%s"%its)
运行结果
第一个标签:<link class="active1 a123" href="http://asda.com"><a>helloasdadasdad12312</a></link> 最后一个标签:<link class="movie1" href="http://asda2.com">asdadasdad12312</link> 第二个标签:<link class="active2" href="http://asda1.com">asdadasdad12312</link> 获取0以后的标签:<link class="active2" href="http://asda1.com">asdadasdad12312</link>
<link class="movie1" href="http://asda2.com">asdadasdad12312</link> 获取奇数标签:<link class="active1 a123" href="http://asda.com"><a>helloasdadasdad12312</a></link>
<link class="movie1" href="http://asda2.com">asdadasdad12312</link> 获取文本包含hello的标签:<link class="active1 a123" href="http://asda.com"><a>helloasdadasdad12312</a></link>
更多css选择器可以查看:
http://www.w3school.com.cn/css/index.asp
python pyquery 基本用法的更多相关文章
- Python爬虫利器六之PyQuery的用法
前言 你是否觉得 XPath 的用法多少有点晦涩难记呢? 你是否觉得 BeautifulSoup 的语法多少有些悭吝难懂呢? 你是否甚至还在苦苦研究正则表达式却因为少些了一个点而抓狂呢? 你是否已经有 ...
- Python回调函数用法实例详解
本文实例讲述了Python回调函数用法.分享给大家供大家参考.具体分析如下: 一.百度百科上对回调函数的解释: 回调函数就是一个通过函数指针调用的函数.如果你把函数的指针(地址)作为参数传递给另一个函 ...
- day01-day04总结- Python 数据类型及其用法
Python 数据类型及其用法: 本文总结一下Python中用到的各种数据类型,以及如何使用可以使得我们的代码变得简洁. 基本结构 我们首先要看的是几乎任何语言都具有的数据类型,包括字符串.整型.浮点 ...
- 【Python】关于Python有意思的用法
开一篇文章,记录关于Python有意思的用法,不断更新 1.Python树的遍历 def sum(t): tmp=0 for k in t: if not isinstance(k,list): tm ...
- python中xrange用法分析
本文实例讲述了python中xrange用法.分享给大家供大家参考.具体如下: 先来看如下示例: >>> x=xrange(0,8) >>> print x xra ...
- 浅谈Python在信息学竞赛中的运用及Python的基本用法
浅谈Python在信息学竞赛中的运用及Python的基本用法 前言 众所周知,Python是一种非常实用的语言.但是由于其运算时的低效和解释型编译,在信息学竞赛中并不用于完成算法程序.但正如LRJ在& ...
- python scapy的用法之ARP主机扫描和ARP欺骗
python scapy的用法之ARP主机扫描和ARP欺骗 目录: 1.scapy介绍 2.安装scapy 3.scapy常用 4.ARP主机扫描 5.ARP欺骗 一.scapy介绍 scapy是一个 ...
- python函数的用法
python函数的用法 目录: 1.定义.使用函数 1.函数定义:def 2.函数调用:例:myprint() 3.函数可以当作一个值赋值给一个变量 例:a=myprint() a() 4.写r ...
- python 中@ 的用法【转】
这只是我的个人理解: 在Python的函数中偶尔会看到函数定义的上一行有@functionName的修饰,当解释器读到@的这样的修饰符之后,会先解析@后的内容,直接就把@下一行的函数或者类作为@后边的 ...
随机推荐
- MFC程序出现uafxcwd.lib(afxmem.obj) : error LNK2005: "void * __cdecl operator new(unsigned int)解决办法
在同一个地方摔倒两次之后,决定记录下来这个东西. 问题 1>uafxcwd.lib(afxmem.obj) : error LNK2005: "void * __cdecl opera ...
- How to: Initialize Business Objects with Default Property Values in Entity Framework 如何:在EF中用默认属性值初始化业务对象
When designing business classes, a common task is to ensure that a newly created business object is ...
- idea使用maven中的tomcat插件开启服务出现java.net.BindException: Address already in use: JVM_Bind :8080错误原因
[INFO] create webapp with contextPath: /maven_web 五月 11, 2019 6:05:26 下午 org.apache.coyote.AbstractP ...
- CSS基础属性介绍
css属性分类介绍 css属性分类介绍 CSS分类目录 文本/字体/颜色 文本相关 字体相关 颜色相关 背景相关 大小/布局 大小属性 margin 外边距 padding 内边距 border 边框 ...
- LeetCode刷题191124
博主渣渣一枚,刷刷leetcode给自己瞅瞅,大神们由更好方法还望不吝赐教.题目及解法来自于力扣(LeetCode),传送门. 算法: 给出一个无重叠的 ,按照区间起始端点排序的区间列表. 在列表中插 ...
- 配置 yum 源的两种方法
配置 yum 源的两种方法 由于 redhat的yum在线更新是收费的,如果没有注册的话不能使用,如果要使用,需将redhat的yum卸载后,重启安装,再配置其他源,以下为详细过程: 1.删除red ...
- SQLSERVER预读逻辑读物理读
预读:用估计信息,去硬盘读取数据到缓存.预读100次,也就是估计将要从硬盘中读取了100页数据到缓存. 物理读:查询计划生成好以后,如果缓存缺少所需要的数据,让缓存再次去读硬盘.物理读10页,从硬盘中 ...
- C# 类库项目 无法创建 “资源字典” 文件
1.接触WPF有两个月时间了,准备自己写一个样式库,在vs新建 类库项目后无法创建资源字典. 2.解决办法: 打开项目工程文件 ( project.csproj) 在 <Proper ...
- java.lang.IllegalStateException: getOutputStream() has already been called 解决办法
因为在使用的时候没有使用@ResponseBody这个注解,所以才会报上面的异常
- Linux下安装Redis以及遇到的问题
参考链接:https://www.cnblogs.com/zdd-java/p/10288734.html https://www.cnblogs.com/uncleyong/p/9882843.ht ...