Using the pyquery library

Selecting tags with pyquery

Get all img tags (this is a CSS selector; you can also select by a class or an id instead)

import requests
from pyquery import PyQuery as pq

headers = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
    "Accept-Encoding": "gzip, deflate",
    "Accept-Language": "zh-CN,zh;q=0.9",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.108 Safari/537.36",
}

response = requests.get('https://www.zhihu.com/question/35239964/answer/66644148', headers=headers, timeout=9)
doc = pq(response.content)  # parse the raw HTML bytes of the response

# CSS selector
a = doc('img')  # a PyQuery object: <class 'pyquery.pyquery.PyQuery'>
print(a)

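URL initialization (getting the HTML by requesting a URL)

With pyquery you don't even need to call requests yourself to fetch the page; pass the URL straight to pq: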

from pyquery import PyQuery as pq

headers = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
    "Accept-Encoding": "gzip, deflate",
    "Accept-Language": "zh-CN,zh;q=0.9",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.108 Safari/537.36",
}

doc = pq(url='https://www.zhihu.com/question/35239964/answer/66644148', headers=headers)  # initialize directly from the URL to get the page source
print(doc('title').text())

File initialization (loading the HTML from a file)

# File initialization
from pyquery import PyQuery as pq
import requests

# Write the file (uncomment to save the page locally first)
# headers = {
#     "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
#     "Accept-Encoding": "gzip, deflate",
#     "Accept-Language": "zh-CN,zh;q=0.9",
#     "Upgrade-Insecure-Requests": "1",
#     "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.108 Safari/537.36",
# }
# result = requests.get('https://www.zhihu.com/question/35239964/answer/66644148', headers=headers)
# with open("zhihu.html", 'wb') as f:
#     f.write(result.content)

# Read the file
doc = pq(filename='test.html')  # path to a local HTML file, e.g. the zhihu.html saved above
print(doc('title'))

CSS selectors

doc('#content .list li')  # every li matching this hierarchy: inside a .list that is itself inside #content

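Finding elements

Child elements

find (searches all descendants)

from pyquery import PyQuery as pq
html = '.....'
doc = pq(html)
a = doc('.content')
b = a.find('img')  # all img tags anywhere inside .content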

children (like find, but only looks at direct children)

html = '.....'
doc = pq(html)
a = doc('.content')
b = a.children('img')  # only the img tags that are direct children of .content

Parent elements

html = '.....'
doc = pq(html)
a = doc('.content')
b = a.parent()  # the direct parent element of .content

html = '.....'
doc = pq(html)
a = doc('.content')
b = a.parents()  # all ancestor elements of .content

You can also pass a CSS selector to filter the result:

html = '.....'
doc = pq(html)
a = doc('.content')
b = a.parents('.wrap')  # only the ancestors of .content that match .wrap

Sibling elements

html = '.....'
doc = pq(html)
a = doc('.content')
b = a.siblings()  # the sibling elements of .content; a CSS selector can also be passed

Iterating over matched elements

from pyquery import PyQuery as pq

headers = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
    "Accept-Encoding": "gzip, deflate",
    "Accept-Language": "zh-CN,zh;q=0.9",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.108 Safari/537.36",
}

doc = pq(url='https://www.zhihu.com/question/35239964/answer/66644148', headers=headers)
a = doc('img').items()  # a generator yielding each match as a PyQuery object
for i in a:
    print(i)

One of the printed elements looks like this:

<img src="data:image/svg+xml;utf8,&lt;svg%20xmlns='http://www.w3.org/2000/svg'%20width='503'%20height='410'&gt;&lt;/svg&gt;" data-rawwidth="503" data-rawheight="410" class="origin_image zh-lightbox-thumb lazy" width="503" data-original="https://pic4.zhimg.com/d8d9743a16e6db3f36b19a3397469b1d_r.jpg" data-actualsrc="https://pic4.zhimg.com/50/d8d9743a16e6db3f36b19a3397469b1d_hd.jpg"/>

Suppose i is the element shown above.

Get an attribute: jpgurl = i.attr('data-original')

Get the text: text = i.text()

Get the HTML: html = i.html()  # the HTML inside i

DOM manipulation

addClass (add a class), removeClass (remove a class)

attr: .attr('name', 'userid')  # add or replace the name attribute

css: .css('height', '500px')  # add or replace a property in the style attribute

remove:

from pyquery import PyQuery as pq

html = '''
<div class='wrap'>
helloworld
<p>this is p</p>
</div>'''

doc = pq(html)
wrap = doc('.wrap')
print(wrap.text())  # helloworld this is p (exact whitespace depends on the pyquery version)
wrap.find('p').remove()  # delete the <p> element from the document
print(wrap.text())  # helloworld
