BeautifulSoup库children(),descendants()方法的使用
BeautifulSoup库children(),descendants()方法的使用
示例网站:http://www.pythonscraping.com/pages/page3.html
网站内容:

网站部分重要源代码:
<table id="giftList">
<tr><th>
Item Title
</th><th>
Description
</th><th>
Cost
</th><th>
Image
</th></tr> <tr id="gift1" class="gift"><td>
Vegetable Basket
</td><td>
This vegetable basket is the perfect gift for your health conscious (or overweight) friends!
<span class="excitingNote">Now with super-colorful bell peppers!</span>
</td><td>
$15.00
</td><td>
<img src="../img/gifts/img1.jpg">
</td></tr> <tr id="gift2" class="gift"><td>
Russian Nesting Dolls
</td><td>
Hand-painted by trained monkeys, these exquisite dolls are priceless! And by "priceless," we mean "extremely expensive"! <span class="excitingNote">8 entire dolls per set! Octuple the presents!</span>
</td><td>
$10,000.52
</td><td>
<img src="../img/gifts/img2.jpg">
</td></tr> <tr id="gift3" class="gift"><td>
Fish Painting
</td><td>
If something seems fishy about this painting, it's because it's a fish! <span class="excitingNote">Also hand-painted by trained monkeys!</span>
</td><td>
$10,005.00
</td><td>
<img src="../img/gifts/img3.jpg">
</td></tr> <tr id="gift4" class="gift"><td>
Dead Parrot
</td><td>
This is an ex-parrot! <span class="excitingNote">Or maybe he's only resting?</span>
</td><td>
$0.50
</td><td>
<img src="../img/gifts/img4.jpg">
</td></tr> <tr id="gift5" class="gift"><td>
Mystery Box
</td><td>
If you love suprises, this mystery box is for you! Do not place on light-colored surfaces. May cause oil staining. <span class="excitingNote">Keep your friends guessing!</span>
</td><td>
$1.50
</td><td>
<img src="../img/gifts/img6.jpg">
</td></tr>
</table>
1.children()方法的使用
# -*- coding: utf-8 -*-
from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen("http://www.pythonscraping.com/pages/page3.html")
bsObj = BeautifulSoup(html,"lxml") for child in bsObj.find("table",{"id":"giftList"}).children:
print(child)
运行得到的结果为:
<tr><th>
Item Title
</th><th>
Description
</th><th>
Cost
</th><th>
Image
</th></tr> <tr class="gift" id="gift1"><td>
Vegetable Basket
</td><td>
This vegetable basket is the perfect gift for your health conscious (or overweight) friends!
<span class="excitingNote">Now with super-colorful bell peppers!</span>
</td><td>
$15.00
</td><td>
<img src="../img/gifts/img1.jpg"/>
</td></tr> <tr class="gift" id="gift2"><td>
Russian Nesting Dolls
</td><td>
Hand-painted by trained monkeys, these exquisite dolls are priceless! And by "priceless," we mean "extremely expensive"! <span class="excitingNote">8 entire dolls per set! Octuple the presents!</span>
</td><td>
$10,000.52
</td><td>
<img src="../img/gifts/img2.jpg"/>
</td></tr> <tr class="gift" id="gift3"><td>
Fish Painting
</td><td>
If something seems fishy about this painting, it's because it's a fish! <span class="excitingNote">Also hand-painted by trained monkeys!</span>
</td><td>
$10,005.00
</td><td>
<img src="../img/gifts/img3.jpg"/>
</td></tr> <tr class="gift" id="gift4"><td>
Dead Parrot
</td><td>
This is an ex-parrot! <span class="excitingNote">Or maybe he's only resting?</span>
</td><td>
$0.50
</td><td>
<img src="../img/gifts/img4.jpg"/>
</td></tr> <tr class="gift" id="gift5"><td>
Mystery Box
</td><td>
If you love suprises, this mystery box is for you! Do not place on light-colored surfaces. May cause oil staining. <span class="excitingNote">Keep your friends guessing!</span>
</td><td>
$1.50
</td><td>
<img src="../img/gifts/img6.jpg"/>
</td></tr>
根据文章中的字面意思来分析:

children()方法指代的是与parent离得最近(也就是下一个)标签,程序中的children指代的是tr这个标签。
实验:将children用tr替换掉会得到与以上相同的结果吗?
# -*- coding: utf-8 -*-
from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen("http://www.pythonscraping.com/pages/page3.html")
bsObj = BeautifulSoup(html,"lxml") for child in bsObj.find("table",{"id":"giftList"}).tr:
print(child)
运行结果为:
<th>
Item Title
</th>
<th>
Description
</th>
<th>
Cost
</th>
<th>
Image
</th>
对以上实验结果进行分析得到:children可以列出所有的子类,而直接指定标签,则不行。
2.descendants()方法的使用
# -*- coding: utf-8 -*-
from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen("http://www.pythonscraping.com/pages/page3.html")
bsObj = BeautifulSoup(html,"lxml") for child in bsObj.find("table",{"id":"giftList"}).descendants:
print(child)
运行结果为:
<tr><th>
Item Title
</th><th>
Description
</th><th>
Cost
</th><th>
Image
</th></tr>
<th>
Item Title
</th> Item Title <th>
Description
</th> Description <th>
Cost
</th> Cost <th>
Image
</th> Image <tr class="gift" id="gift1"><td>
Vegetable Basket
</td><td>
This vegetable basket is the perfect gift for your health conscious (or overweight) friends!
<span class="excitingNote">Now with super-colorful bell peppers!</span>
</td><td>
$15.00
</td><td>
<img src="../img/gifts/img1.jpg"/>
</td></tr>
<td>
Vegetable Basket
</td> Vegetable Basket <td>
This vegetable basket is the perfect gift for your health conscious (or overweight) friends!
<span class="excitingNote">Now with super-colorful bell peppers!</span>
</td> This vegetable basket is the perfect gift for your health conscious (or overweight) friends! <span class="excitingNote">Now with super-colorful bell peppers!</span>
Now with super-colorful bell peppers! <td>
$15.00
</td> $15.00 <td>
<img src="../img/gifts/img1.jpg"/>
</td> <img src="../img/gifts/img1.jpg"/> <tr class="gift" id="gift2"><td>
Russian Nesting Dolls
</td><td>
Hand-painted by trained monkeys, these exquisite dolls are priceless! And by "priceless," we mean "extremely expensive"! <span class="excitingNote">8 entire dolls per set! Octuple the presents!</span>
</td><td>
$10,000.52
</td><td>
<img src="../img/gifts/img2.jpg"/>
</td></tr>
<td>
Russian Nesting Dolls
</td> Russian Nesting Dolls <td>
Hand-painted by trained monkeys, these exquisite dolls are priceless! And by "priceless," we mean "extremely expensive"! <span class="excitingNote">8 entire dolls per set! Octuple the presents!</span>
</td> Hand-painted by trained monkeys, these exquisite dolls are priceless! And by "priceless," we mean "extremely expensive"!
<span class="excitingNote">8 entire dolls per set! Octuple the presents!</span>
8 entire dolls per set! Octuple the presents! <td>
$10,000.52
</td> $10,000.52 <td>
<img src="../img/gifts/img2.jpg"/>
</td> <img src="../img/gifts/img2.jpg"/> <tr class="gift" id="gift3"><td>
Fish Painting
</td><td>
If something seems fishy about this painting, it's because it's a fish! <span class="excitingNote">Also hand-painted by trained monkeys!</span>
</td><td>
$10,005.00
</td><td>
<img src="../img/gifts/img3.jpg"/>
</td></tr>
<td>
Fish Painting
</td> Fish Painting <td>
If something seems fishy about this painting, it's because it's a fish! <span class="excitingNote">Also hand-painted by trained monkeys!</span>
</td> If something seems fishy about this painting, it's because it's a fish!
<span class="excitingNote">Also hand-painted by trained monkeys!</span>
Also hand-painted by trained monkeys! <td>
$10,005.00
</td> $10,005.00 <td>
<img src="../img/gifts/img3.jpg"/>
</td> <img src="../img/gifts/img3.jpg"/> <tr class="gift" id="gift4"><td>
Dead Parrot
</td><td>
This is an ex-parrot! <span class="excitingNote">Or maybe he's only resting?</span>
</td><td>
$0.50
</td><td>
<img src="../img/gifts/img4.jpg"/>
</td></tr>
<td>
Dead Parrot
</td> Dead Parrot <td>
This is an ex-parrot! <span class="excitingNote">Or maybe he's only resting?</span>
</td> This is an ex-parrot!
<span class="excitingNote">Or maybe he's only resting?</span>
Or maybe he's only resting? <td>
$0.50
</td> $0.50 <td>
<img src="../img/gifts/img4.jpg"/>
</td> <img src="../img/gifts/img4.jpg"/> <tr class="gift" id="gift5"><td>
Mystery Box
</td><td>
If you love suprises, this mystery box is for you! Do not place on light-colored surfaces. May cause oil staining. <span class="excitingNote">Keep your friends guessing!</span>
</td><td>
$1.50
</td><td>
<img src="../img/gifts/img6.jpg"/>
</td></tr>
<td>
Mystery Box
</td> Mystery Box <td>
If you love suprises, this mystery box is for you! Do not place on light-colored surfaces. May cause oil staining. <span class="excitingNote">Keep your friends guessing!</span>
</td> If you love suprises, this mystery box is for you! Do not place on light-colored surfaces. May cause oil staining.
<span class="excitingNote">Keep your friends guessing!</span>
Keep your friends guessing! <td>
$1.50
</td> $1.50 <td>
<img src="../img/gifts/img6.jpg"/>
</td> <img src="../img/gifts/img6.jpg"/>
BeautifulSoup库children(),descendants()方法的使用的更多相关文章
- BeautifulSoup库的使用方法
from bs4 import BeautifulSoup import lxml html = ''' <html><head><title>The Dormou ...
- 【Python】在Pycharm中安装爬虫库requests , BeautifulSoup , lxml 的解决方法
BeautifulSoup在学习Python过程中可能需要用到一些爬虫库 例如:requests BeautifulSoup和lxml库 前面的两个库,用Pychram都可以通过 File--> ...
- python网络爬虫学习笔记(二)BeautifulSoup库
Beautiful Soup库也称为beautiful4库.bs4库,它可用于解析HTML/XML,并将所有文件.字符串转换为'utf-8'编码.HTML/XML文档是与“标签树一一对应的.具体地说, ...
- BeautifulSoup库
'''灵活又方便的网页解析库,处理高效,支持多种解析器.利用它不用编写正则表达式即可方便的实现网页信息的提取.''' BeautifulSoup库包含的一些解析库: 解析库 使用方法 优势 劣势 py ...
- python BeautifulSoup库的基本使用
Beautiful Soup 是用Python写的一个HTML/XML的解析器,它可以很好的处理不规范标记并生成剖析树(parse tree). 它提供简单又常用的导航(navigating),搜索以 ...
- Python爬虫利器:BeautifulSoup库
Beautiful Soup parses anything you give it, and does the tree traversal stuff for you. BeautifulSoup ...
- PYTHON 爬虫笔记五:BeautifulSoup库基础用法
知识点一:BeautifulSoup库详解及其基本使用方法 什么是BeautifulSoup 灵活又方便的网页解析库,处理高效,支持多种解析器.利用它不用编写正则表达式即可方便实现网页信息的提取库. ...
- python爬虫入门四:BeautifulSoup库(转)
正则表达式可以从html代码中提取我们想要的数据信息,它比较繁琐复杂,编写的时候效率不高,但我们又最好是能够学会使用正则表达式. 我在网络上发现了一篇关于写得很好的教程,如果需要使用正则表达式的话,参 ...
- python之BeautifulSoup库
1. BeautifulSoup库简介 和 lxml 一样,Beautiful Soup 也是一个HTML/XML的解析器,主要的功能也是如何解析和提取 HTML/XML 数据.lxml 只会局部遍历 ...
随机推荐
- 玩转 iOS 10 推送 —— UserNotifications Framework(合集)
iOS 10 came 在今年 6月14号 苹果开发者大会 WWDC 2016 之后,笔者赶紧就去 apple 的开发者网站下载了最新的 Xcode 8 beta 和 iOS 10 beta,然后在自 ...
- PetaPoco 快速上手
今天来给大家分享一个好用的轻型的.net框架的ORM——PetaPoco 本着快速上手的原则,我们通过和EF对比,让大家能快速使用PetaPoco PetaPoco大家可能没有听说过,但大家一定听说过 ...
- 解决https证书验证不通过的问题
1.报错信息 java.security.cert.CertificateException: No name matching api.weibo.com found; nested excepti ...
- Selenium Web 自动化 - 项目持续集成
Selenium Web 自动化 - 项目持续集成 2017-02-13 目录 1环境准备 1.1 安装git 1.2 安装jenkins 1.3 安装jenkins插件 1.4 jekins ...
- js相关小实例——div实现下拉菜单
代码如下: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w ...
- strtok、strtok_s、strtok_r 字符串分割函数
1.strtok函数 函数原型:char * strtok (char *str, const char * delimiters); 参数:str,待分割的字符串(c-string):delimit ...
- 【前端】ACE Editor 简易使用示例
身为一个早已退役的Oier,当然忘不了当年一个个OJ页面上的代码显示和代码编辑器. 其中,洛谷使用的ACE Editor就是之一,非常的简洁美观.以及实际上在前端页面上搭建一个ACE Editor也是 ...
- js动态加载的蒙板弹框
我们访问一些网站时总会遇到这种点击后,背景像被打上一层模板一样,这个是怎么做到的呢? 它是将这个弹框div独立于页面容器wrap,设置position为absolute,将其水平垂直之后都居中,设置弹 ...
- spring循环依赖问题分析
新搞了一个单点登录的项目,用的cas,要把源码的cas-webapp改造成适合我们业务场景的项目,于是新加了一些spring的配置文件. 但是在项目启动时报错了,错误日志如下: 一月 , :: 下午 ...
- ios 个推推送集成
个推推送总结: 个推第三方平台官网地址:http://www.getui.com/cn/index.html 首先去官网注册账号,创建应用,应用的配置信息,创建APNs推送证书上传 P12证书(开发对 ...