BeautifulSoup库children(),descendants()方法的使用

示例网站:http://www.pythonscraping.com/pages/page3.html

网站内容:

网站部分重要源代码:

<table id="giftList">
<tr><th>
Item Title
</th><th>
Description
</th><th>
Cost
</th><th>
Image
</th></tr> <tr id="gift1" class="gift"><td>
Vegetable Basket
</td><td>
This vegetable basket is the perfect gift for your health conscious (or overweight) friends!
<span class="excitingNote">Now with super-colorful bell peppers!</span>
</td><td>
$15.00
</td><td>
<img src="../img/gifts/img1.jpg">
</td></tr> <tr id="gift2" class="gift"><td>
Russian Nesting Dolls
</td><td>
Hand-painted by trained monkeys, these exquisite dolls are priceless! And by "priceless," we mean "extremely expensive"! <span class="excitingNote">8 entire dolls per set! Octuple the presents!</span>
</td><td>
$10,000.52
</td><td>
<img src="../img/gifts/img2.jpg">
</td></tr> <tr id="gift3" class="gift"><td>
Fish Painting
</td><td>
If something seems fishy about this painting, it's because it's a fish! <span class="excitingNote">Also hand-painted by trained monkeys!</span>
</td><td>
$10,005.00
</td><td>
<img src="../img/gifts/img3.jpg">
</td></tr> <tr id="gift4" class="gift"><td>
Dead Parrot
</td><td>
This is an ex-parrot! <span class="excitingNote">Or maybe he's only resting?</span>
</td><td>
$0.50
</td><td>
<img src="../img/gifts/img4.jpg">
</td></tr> <tr id="gift5" class="gift"><td>
Mystery Box
</td><td>
If you love suprises, this mystery box is for you! Do not place on light-colored surfaces. May cause oil staining. <span class="excitingNote">Keep your friends guessing!</span>
</td><td>
$1.50
</td><td>
<img src="../img/gifts/img6.jpg">
</td></tr>
</table>

1.children()方法的使用

# -*- coding: utf-8 -*-
from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen("http://www.pythonscraping.com/pages/page3.html")
bsObj = BeautifulSoup(html,"lxml") for child in bsObj.find("table",{"id":"giftList"}).children:
print(child)

运行得到的结果为:

<tr><th>
Item Title
</th><th>
Description
</th><th>
Cost
</th><th>
Image
</th></tr> <tr class="gift" id="gift1"><td>
Vegetable Basket
</td><td>
This vegetable basket is the perfect gift for your health conscious (or overweight) friends!
<span class="excitingNote">Now with super-colorful bell peppers!</span>
</td><td>
$15.00
</td><td>
<img src="../img/gifts/img1.jpg"/>
</td></tr> <tr class="gift" id="gift2"><td>
Russian Nesting Dolls
</td><td>
Hand-painted by trained monkeys, these exquisite dolls are priceless! And by "priceless," we mean "extremely expensive"! <span class="excitingNote">8 entire dolls per set! Octuple the presents!</span>
</td><td>
$10,000.52
</td><td>
<img src="../img/gifts/img2.jpg"/>
</td></tr> <tr class="gift" id="gift3"><td>
Fish Painting
</td><td>
If something seems fishy about this painting, it's because it's a fish! <span class="excitingNote">Also hand-painted by trained monkeys!</span>
</td><td>
$10,005.00
</td><td>
<img src="../img/gifts/img3.jpg"/>
</td></tr> <tr class="gift" id="gift4"><td>
Dead Parrot
</td><td>
This is an ex-parrot! <span class="excitingNote">Or maybe he's only resting?</span>
</td><td>
$0.50
</td><td>
<img src="../img/gifts/img4.jpg"/>
</td></tr> <tr class="gift" id="gift5"><td>
Mystery Box
</td><td>
If you love suprises, this mystery box is for you! Do not place on light-colored surfaces. May cause oil staining. <span class="excitingNote">Keep your friends guessing!</span>
</td><td>
$1.50
</td><td>
<img src="../img/gifts/img6.jpg"/>
</td></tr>

根据文章中的字面意思来分析:

children()方法指代的是与parent离得最近(也就是下一个)标签,程序中的children指代的是tr这个标签。

实验:将children用tr替换掉会得到与以上相同的结果吗?

# -*- coding: utf-8 -*-
from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen("http://www.pythonscraping.com/pages/page3.html")
bsObj = BeautifulSoup(html,"lxml") for child in bsObj.find("table",{"id":"giftList"}).tr:
print(child)

运行结果为:

<th>
Item Title
</th>
<th>
Description
</th>
<th>
Cost
</th>
<th>
Image
</th>

对以上实验结果进行分析得到:children可以列出所有的子类,而直接指定标签,则不行。

2.descendants()方法的使用

# -*- coding: utf-8 -*-
from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen("http://www.pythonscraping.com/pages/page3.html")
bsObj = BeautifulSoup(html,"lxml") for child in bsObj.find("table",{"id":"giftList"}).descendants:
print(child)

运行结果为:

<tr><th>
Item Title
</th><th>
Description
</th><th>
Cost
</th><th>
Image
</th></tr>
<th>
Item Title
</th> Item Title <th>
Description
</th> Description <th>
Cost
</th> Cost <th>
Image
</th> Image <tr class="gift" id="gift1"><td>
Vegetable Basket
</td><td>
This vegetable basket is the perfect gift for your health conscious (or overweight) friends!
<span class="excitingNote">Now with super-colorful bell peppers!</span>
</td><td>
$15.00
</td><td>
<img src="../img/gifts/img1.jpg"/>
</td></tr>
<td>
Vegetable Basket
</td> Vegetable Basket <td>
This vegetable basket is the perfect gift for your health conscious (or overweight) friends!
<span class="excitingNote">Now with super-colorful bell peppers!</span>
</td> This vegetable basket is the perfect gift for your health conscious (or overweight) friends! <span class="excitingNote">Now with super-colorful bell peppers!</span>
Now with super-colorful bell peppers! <td>
$15.00
</td> $15.00 <td>
<img src="../img/gifts/img1.jpg"/>
</td> <img src="../img/gifts/img1.jpg"/> <tr class="gift" id="gift2"><td>
Russian Nesting Dolls
</td><td>
Hand-painted by trained monkeys, these exquisite dolls are priceless! And by "priceless," we mean "extremely expensive"! <span class="excitingNote">8 entire dolls per set! Octuple the presents!</span>
</td><td>
$10,000.52
</td><td>
<img src="../img/gifts/img2.jpg"/>
</td></tr>
<td>
Russian Nesting Dolls
</td> Russian Nesting Dolls <td>
Hand-painted by trained monkeys, these exquisite dolls are priceless! And by "priceless," we mean "extremely expensive"! <span class="excitingNote">8 entire dolls per set! Octuple the presents!</span>
</td> Hand-painted by trained monkeys, these exquisite dolls are priceless! And by "priceless," we mean "extremely expensive"!
<span class="excitingNote">8 entire dolls per set! Octuple the presents!</span>
8 entire dolls per set! Octuple the presents! <td>
$10,000.52
</td> $10,000.52 <td>
<img src="../img/gifts/img2.jpg"/>
</td> <img src="../img/gifts/img2.jpg"/> <tr class="gift" id="gift3"><td>
Fish Painting
</td><td>
If something seems fishy about this painting, it's because it's a fish! <span class="excitingNote">Also hand-painted by trained monkeys!</span>
</td><td>
$10,005.00
</td><td>
<img src="../img/gifts/img3.jpg"/>
</td></tr>
<td>
Fish Painting
</td> Fish Painting <td>
If something seems fishy about this painting, it's because it's a fish! <span class="excitingNote">Also hand-painted by trained monkeys!</span>
</td> If something seems fishy about this painting, it's because it's a fish!
<span class="excitingNote">Also hand-painted by trained monkeys!</span>
Also hand-painted by trained monkeys! <td>
$10,005.00
</td> $10,005.00 <td>
<img src="../img/gifts/img3.jpg"/>
</td> <img src="../img/gifts/img3.jpg"/> <tr class="gift" id="gift4"><td>
Dead Parrot
</td><td>
This is an ex-parrot! <span class="excitingNote">Or maybe he's only resting?</span>
</td><td>
$0.50
</td><td>
<img src="../img/gifts/img4.jpg"/>
</td></tr>
<td>
Dead Parrot
</td> Dead Parrot <td>
This is an ex-parrot! <span class="excitingNote">Or maybe he's only resting?</span>
</td> This is an ex-parrot!
<span class="excitingNote">Or maybe he's only resting?</span>
Or maybe he's only resting? <td>
$0.50
</td> $0.50 <td>
<img src="../img/gifts/img4.jpg"/>
</td> <img src="../img/gifts/img4.jpg"/> <tr class="gift" id="gift5"><td>
Mystery Box
</td><td>
If you love suprises, this mystery box is for you! Do not place on light-colored surfaces. May cause oil staining. <span class="excitingNote">Keep your friends guessing!</span>
</td><td>
$1.50
</td><td>
<img src="../img/gifts/img6.jpg"/>
</td></tr>
<td>
Mystery Box
</td> Mystery Box <td>
If you love suprises, this mystery box is for you! Do not place on light-colored surfaces. May cause oil staining. <span class="excitingNote">Keep your friends guessing!</span>
</td> If you love suprises, this mystery box is for you! Do not place on light-colored surfaces. May cause oil staining.
<span class="excitingNote">Keep your friends guessing!</span>
Keep your friends guessing! <td>
$1.50
</td> $1.50 <td>
<img src="../img/gifts/img6.jpg"/>
</td> <img src="../img/gifts/img6.jpg"/>

BeautifulSoup库children(),descendants()方法的使用的更多相关文章

  1. BeautifulSoup库的使用方法

    from bs4 import BeautifulSoup import lxml html = ''' <html><head><title>The Dormou ...

  2. 【Python】在Pycharm中安装爬虫库requests , BeautifulSoup , lxml 的解决方法

    BeautifulSoup在学习Python过程中可能需要用到一些爬虫库 例如:requests BeautifulSoup和lxml库 前面的两个库,用Pychram都可以通过 File--> ...

  3. python网络爬虫学习笔记(二)BeautifulSoup库

    Beautiful Soup库也称为beautiful4库.bs4库,它可用于解析HTML/XML,并将所有文件.字符串转换为'utf-8'编码.HTML/XML文档是与“标签树一一对应的.具体地说, ...

  4. BeautifulSoup库

    '''灵活又方便的网页解析库,处理高效,支持多种解析器.利用它不用编写正则表达式即可方便的实现网页信息的提取.''' BeautifulSoup库包含的一些解析库: 解析库 使用方法 优势 劣势 py ...

  5. python BeautifulSoup库的基本使用

    Beautiful Soup 是用Python写的一个HTML/XML的解析器,它可以很好的处理不规范标记并生成剖析树(parse tree). 它提供简单又常用的导航(navigating),搜索以 ...

  6. Python爬虫利器:BeautifulSoup库

    Beautiful Soup parses anything you give it, and does the tree traversal stuff for you. BeautifulSoup ...

  7. PYTHON 爬虫笔记五:BeautifulSoup库基础用法

    知识点一:BeautifulSoup库详解及其基本使用方法 什么是BeautifulSoup 灵活又方便的网页解析库,处理高效,支持多种解析器.利用它不用编写正则表达式即可方便实现网页信息的提取库. ...

  8. python爬虫入门四:BeautifulSoup库(转)

    正则表达式可以从html代码中提取我们想要的数据信息,它比较繁琐复杂,编写的时候效率不高,但我们又最好是能够学会使用正则表达式. 我在网络上发现了一篇关于写得很好的教程,如果需要使用正则表达式的话,参 ...

  9. python之BeautifulSoup库

    1. BeautifulSoup库简介 和 lxml 一样,Beautiful Soup 也是一个HTML/XML的解析器,主要的功能也是如何解析和提取 HTML/XML 数据.lxml 只会局部遍历 ...

随机推荐

  1. 玩转 iOS 10 推送 —— UserNotifications Framework(合集)

    iOS 10 came 在今年 6月14号 苹果开发者大会 WWDC 2016 之后,笔者赶紧就去 apple 的开发者网站下载了最新的 Xcode 8 beta 和 iOS 10 beta,然后在自 ...

  2. PetaPoco 快速上手

    今天来给大家分享一个好用的轻型的.net框架的ORM——PetaPoco 本着快速上手的原则,我们通过和EF对比,让大家能快速使用PetaPoco PetaPoco大家可能没有听说过,但大家一定听说过 ...

  3. 解决https证书验证不通过的问题

    1.报错信息 java.security.cert.CertificateException: No name matching api.weibo.com found; nested excepti ...

  4. Selenium Web 自动化 - 项目持续集成

    Selenium Web 自动化 - 项目持续集成 2017-02-13 目录 1环境准备  1.1 安装git  1.2 安装jenkins  1.3 安装jenkins插件  1.4 jekins ...

  5. js相关小实例——div实现下拉菜单

    代码如下: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w ...

  6. strtok、strtok_s、strtok_r 字符串分割函数

    1.strtok函数 函数原型:char * strtok (char *str, const char * delimiters); 参数:str,待分割的字符串(c-string):delimit ...

  7. 【前端】ACE Editor 简易使用示例

    身为一个早已退役的Oier,当然忘不了当年一个个OJ页面上的代码显示和代码编辑器. 其中,洛谷使用的ACE Editor就是之一,非常的简洁美观.以及实际上在前端页面上搭建一个ACE Editor也是 ...

  8. js动态加载的蒙板弹框

    我们访问一些网站时总会遇到这种点击后,背景像被打上一层模板一样,这个是怎么做到的呢? 它是将这个弹框div独立于页面容器wrap,设置position为absolute,将其水平垂直之后都居中,设置弹 ...

  9. spring循环依赖问题分析

    新搞了一个单点登录的项目,用的cas,要把源码的cas-webapp改造成适合我们业务场景的项目,于是新加了一些spring的配置文件. 但是在项目启动时报错了,错误日志如下: 一月 , :: 下午 ...

  10. ios 个推推送集成

    个推推送总结: 个推第三方平台官网地址:http://www.getui.com/cn/index.html 首先去官网注册账号,创建应用,应用的配置信息,创建APNs推送证书上传 P12证书(开发对 ...