BeautifulSoup库children(),descendants()方法的使用

示例网站:http://www.pythonscraping.com/pages/page3.html

网站内容:

网站部分重要源代码:

<table id="giftList">
<tr><th>
Item Title
</th><th>
Description
</th><th>
Cost
</th><th>
Image
</th></tr> <tr id="gift1" class="gift"><td>
Vegetable Basket
</td><td>
This vegetable basket is the perfect gift for your health conscious (or overweight) friends!
<span class="excitingNote">Now with super-colorful bell peppers!</span>
</td><td>
$15.00
</td><td>
<img src="../img/gifts/img1.jpg">
</td></tr> <tr id="gift2" class="gift"><td>
Russian Nesting Dolls
</td><td>
Hand-painted by trained monkeys, these exquisite dolls are priceless! And by "priceless," we mean "extremely expensive"! <span class="excitingNote">8 entire dolls per set! Octuple the presents!</span>
</td><td>
$10,000.52
</td><td>
<img src="../img/gifts/img2.jpg">
</td></tr> <tr id="gift3" class="gift"><td>
Fish Painting
</td><td>
If something seems fishy about this painting, it's because it's a fish! <span class="excitingNote">Also hand-painted by trained monkeys!</span>
</td><td>
$10,005.00
</td><td>
<img src="../img/gifts/img3.jpg">
</td></tr> <tr id="gift4" class="gift"><td>
Dead Parrot
</td><td>
This is an ex-parrot! <span class="excitingNote">Or maybe he's only resting?</span>
</td><td>
$0.50
</td><td>
<img src="../img/gifts/img4.jpg">
</td></tr> <tr id="gift5" class="gift"><td>
Mystery Box
</td><td>
If you love suprises, this mystery box is for you! Do not place on light-colored surfaces. May cause oil staining. <span class="excitingNote">Keep your friends guessing!</span>
</td><td>
$1.50
</td><td>
<img src="../img/gifts/img6.jpg">
</td></tr>
</table>

1.children()方法的使用

# -*- coding: utf-8 -*-
from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen("http://www.pythonscraping.com/pages/page3.html")
bsObj = BeautifulSoup(html,"lxml") for child in bsObj.find("table",{"id":"giftList"}).children:
print(child)

运行得到的结果为:

<tr><th>
Item Title
</th><th>
Description
</th><th>
Cost
</th><th>
Image
</th></tr> <tr class="gift" id="gift1"><td>
Vegetable Basket
</td><td>
This vegetable basket is the perfect gift for your health conscious (or overweight) friends!
<span class="excitingNote">Now with super-colorful bell peppers!</span>
</td><td>
$15.00
</td><td>
<img src="../img/gifts/img1.jpg"/>
</td></tr> <tr class="gift" id="gift2"><td>
Russian Nesting Dolls
</td><td>
Hand-painted by trained monkeys, these exquisite dolls are priceless! And by "priceless," we mean "extremely expensive"! <span class="excitingNote">8 entire dolls per set! Octuple the presents!</span>
</td><td>
$10,000.52
</td><td>
<img src="../img/gifts/img2.jpg"/>
</td></tr> <tr class="gift" id="gift3"><td>
Fish Painting
</td><td>
If something seems fishy about this painting, it's because it's a fish! <span class="excitingNote">Also hand-painted by trained monkeys!</span>
</td><td>
$10,005.00
</td><td>
<img src="../img/gifts/img3.jpg"/>
</td></tr> <tr class="gift" id="gift4"><td>
Dead Parrot
</td><td>
This is an ex-parrot! <span class="excitingNote">Or maybe he's only resting?</span>
</td><td>
$0.50
</td><td>
<img src="../img/gifts/img4.jpg"/>
</td></tr> <tr class="gift" id="gift5"><td>
Mystery Box
</td><td>
If you love suprises, this mystery box is for you! Do not place on light-colored surfaces. May cause oil staining. <span class="excitingNote">Keep your friends guessing!</span>
</td><td>
$1.50
</td><td>
<img src="../img/gifts/img6.jpg"/>
</td></tr>

根据文章中的字面意思来分析:

children()方法指代的是与parent离得最近(也就是下一个)标签,程序中的children指代的是tr这个标签。

实验:将children用tr替换掉会得到与以上相同的结果吗?

# -*- coding: utf-8 -*-
from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen("http://www.pythonscraping.com/pages/page3.html")
bsObj = BeautifulSoup(html,"lxml") for child in bsObj.find("table",{"id":"giftList"}).tr:
print(child)

运行结果为:

<th>
Item Title
</th>
<th>
Description
</th>
<th>
Cost
</th>
<th>
Image
</th>

对以上实验结果进行分析得到:children可以列出所有的子类,而直接指定标签,则不行。

2.descendants()方法的使用

# -*- coding: utf-8 -*-
from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen("http://www.pythonscraping.com/pages/page3.html")
bsObj = BeautifulSoup(html,"lxml") for child in bsObj.find("table",{"id":"giftList"}).descendants:
print(child)

运行结果为:

<tr><th>
Item Title
</th><th>
Description
</th><th>
Cost
</th><th>
Image
</th></tr>
<th>
Item Title
</th> Item Title <th>
Description
</th> Description <th>
Cost
</th> Cost <th>
Image
</th> Image <tr class="gift" id="gift1"><td>
Vegetable Basket
</td><td>
This vegetable basket is the perfect gift for your health conscious (or overweight) friends!
<span class="excitingNote">Now with super-colorful bell peppers!</span>
</td><td>
$15.00
</td><td>
<img src="../img/gifts/img1.jpg"/>
</td></tr>
<td>
Vegetable Basket
</td> Vegetable Basket <td>
This vegetable basket is the perfect gift for your health conscious (or overweight) friends!
<span class="excitingNote">Now with super-colorful bell peppers!</span>
</td> This vegetable basket is the perfect gift for your health conscious (or overweight) friends! <span class="excitingNote">Now with super-colorful bell peppers!</span>
Now with super-colorful bell peppers! <td>
$15.00
</td> $15.00 <td>
<img src="../img/gifts/img1.jpg"/>
</td> <img src="../img/gifts/img1.jpg"/> <tr class="gift" id="gift2"><td>
Russian Nesting Dolls
</td><td>
Hand-painted by trained monkeys, these exquisite dolls are priceless! And by "priceless," we mean "extremely expensive"! <span class="excitingNote">8 entire dolls per set! Octuple the presents!</span>
</td><td>
$10,000.52
</td><td>
<img src="../img/gifts/img2.jpg"/>
</td></tr>
<td>
Russian Nesting Dolls
</td> Russian Nesting Dolls <td>
Hand-painted by trained monkeys, these exquisite dolls are priceless! And by "priceless," we mean "extremely expensive"! <span class="excitingNote">8 entire dolls per set! Octuple the presents!</span>
</td> Hand-painted by trained monkeys, these exquisite dolls are priceless! And by "priceless," we mean "extremely expensive"!
<span class="excitingNote">8 entire dolls per set! Octuple the presents!</span>
8 entire dolls per set! Octuple the presents! <td>
$10,000.52
</td> $10,000.52 <td>
<img src="../img/gifts/img2.jpg"/>
</td> <img src="../img/gifts/img2.jpg"/> <tr class="gift" id="gift3"><td>
Fish Painting
</td><td>
If something seems fishy about this painting, it's because it's a fish! <span class="excitingNote">Also hand-painted by trained monkeys!</span>
</td><td>
$10,005.00
</td><td>
<img src="../img/gifts/img3.jpg"/>
</td></tr>
<td>
Fish Painting
</td> Fish Painting <td>
If something seems fishy about this painting, it's because it's a fish! <span class="excitingNote">Also hand-painted by trained monkeys!</span>
</td> If something seems fishy about this painting, it's because it's a fish!
<span class="excitingNote">Also hand-painted by trained monkeys!</span>
Also hand-painted by trained monkeys! <td>
$10,005.00
</td> $10,005.00 <td>
<img src="../img/gifts/img3.jpg"/>
</td> <img src="../img/gifts/img3.jpg"/> <tr class="gift" id="gift4"><td>
Dead Parrot
</td><td>
This is an ex-parrot! <span class="excitingNote">Or maybe he's only resting?</span>
</td><td>
$0.50
</td><td>
<img src="../img/gifts/img4.jpg"/>
</td></tr>
<td>
Dead Parrot
</td> Dead Parrot <td>
This is an ex-parrot! <span class="excitingNote">Or maybe he's only resting?</span>
</td> This is an ex-parrot!
<span class="excitingNote">Or maybe he's only resting?</span>
Or maybe he's only resting? <td>
$0.50
</td> $0.50 <td>
<img src="../img/gifts/img4.jpg"/>
</td> <img src="../img/gifts/img4.jpg"/> <tr class="gift" id="gift5"><td>
Mystery Box
</td><td>
If you love suprises, this mystery box is for you! Do not place on light-colored surfaces. May cause oil staining. <span class="excitingNote">Keep your friends guessing!</span>
</td><td>
$1.50
</td><td>
<img src="../img/gifts/img6.jpg"/>
</td></tr>
<td>
Mystery Box
</td> Mystery Box <td>
If you love suprises, this mystery box is for you! Do not place on light-colored surfaces. May cause oil staining. <span class="excitingNote">Keep your friends guessing!</span>
</td> If you love suprises, this mystery box is for you! Do not place on light-colored surfaces. May cause oil staining.
<span class="excitingNote">Keep your friends guessing!</span>
Keep your friends guessing! <td>
$1.50
</td> $1.50 <td>
<img src="../img/gifts/img6.jpg"/>
</td> <img src="../img/gifts/img6.jpg"/>

BeautifulSoup库children(),descendants()方法的使用的更多相关文章

  1. BeautifulSoup库的使用方法

    from bs4 import BeautifulSoup import lxml html = ''' <html><head><title>The Dormou ...

  2. 【Python】在Pycharm中安装爬虫库requests , BeautifulSoup , lxml 的解决方法

    BeautifulSoup在学习Python过程中可能需要用到一些爬虫库 例如:requests BeautifulSoup和lxml库 前面的两个库,用Pychram都可以通过 File--> ...

  3. python网络爬虫学习笔记(二)BeautifulSoup库

    Beautiful Soup库也称为beautiful4库.bs4库,它可用于解析HTML/XML,并将所有文件.字符串转换为'utf-8'编码.HTML/XML文档是与“标签树一一对应的.具体地说, ...

  4. BeautifulSoup库

    '''灵活又方便的网页解析库,处理高效,支持多种解析器.利用它不用编写正则表达式即可方便的实现网页信息的提取.''' BeautifulSoup库包含的一些解析库: 解析库 使用方法 优势 劣势 py ...

  5. python BeautifulSoup库的基本使用

    Beautiful Soup 是用Python写的一个HTML/XML的解析器,它可以很好的处理不规范标记并生成剖析树(parse tree). 它提供简单又常用的导航(navigating),搜索以 ...

  6. Python爬虫利器:BeautifulSoup库

    Beautiful Soup parses anything you give it, and does the tree traversal stuff for you. BeautifulSoup ...

  7. PYTHON 爬虫笔记五:BeautifulSoup库基础用法

    知识点一:BeautifulSoup库详解及其基本使用方法 什么是BeautifulSoup 灵活又方便的网页解析库,处理高效,支持多种解析器.利用它不用编写正则表达式即可方便实现网页信息的提取库. ...

  8. python爬虫入门四:BeautifulSoup库(转)

    正则表达式可以从html代码中提取我们想要的数据信息,它比较繁琐复杂,编写的时候效率不高,但我们又最好是能够学会使用正则表达式. 我在网络上发现了一篇关于写得很好的教程,如果需要使用正则表达式的话,参 ...

  9. python之BeautifulSoup库

    1. BeautifulSoup库简介 和 lxml 一样,Beautiful Soup 也是一个HTML/XML的解析器,主要的功能也是如何解析和提取 HTML/XML 数据.lxml 只会局部遍历 ...

随机推荐

  1. 基于nodejs模拟浏览器post请求爬取json数据

    今天想爬取某网站的后台传来的数据,中间遇到了很多阻碍,花了2个小时才请求到数据,所以我在此总结了一些经验. 首先,放上我所爬取的请求地址http://api.chuchujie.com/api/?v= ...

  2. Triangle Problems

    Triangle Problem songxiuhuan 宋修寰 Import the Junit and eclemma Choose the project and right click, ch ...

  3. ACM 比大小

    比大小 时间限制:3000 ms  |  内存限制:65535 KB 难度:2   描述 给你两个很大的数,你能不能判断出他们两个数的大小呢? 比如123456789123456789要大于-1234 ...

  4. 2953: [Poi2002]商务旅行

    2953: [Poi2002]商务旅行 Time Limit: 3 Sec  Memory Limit: 128 MBSubmit: 8  Solved: 8[Submit][Status] Desc ...

  5. 用php+mysql+ajax+jquery做省市区三级联动

    要求:写一个省市区(或者年月日)的三级联动,实现地区或时间的下拉选择. 实现技术:php ajax 实现:省级下拉变化时市下拉区下拉跟着变化,市级下拉变化时区下拉跟着变化. 使用chinastates ...

  6. Android 启动模式--任务(Task)--桟 的误区

    Android 启动模式--任务(Task)--桟 的误区 写这篇文章是因为前几天的一次面试,面试官说SingleInstance模式会新建一个桟,而SingleTask不会.首先不说这个对不对(非要 ...

  7. 用C写一个web服务器(一) 基础功能

    .container { margin-right: auto; margin-left: auto; padding-left: 15px; padding-right: 15px } .conta ...

  8. .NET Core中的包、元包与框架

    本文为翻译文章,原文:Packages, Metapackages and Frameworks .NET Core是一个由NuGet包组成的平台.一些产品受益于细粒度包的定义,也有一些受益于粗粒度包 ...

  9. 享受release版本发布的好处的同时也应该警惕release可能给你引入一些莫名其妙的大bug

    一般我们发布项目的时候通常都会采用release版本,因为release会在jit层面对我们的il代码进行了优化,比如在迭代和内存操作的性能提升方面,废话不多说, 我先用一个简单的“冒泡排序”体验下r ...

  10. keepalive配置文件详解

    第一部分:全局定义块 1.email通知.作用:有故障,发邮件报警. 2.Lvs负载均衡器标识(lvs_id).在一个网络内,它应该是唯一的. 3.花括号“{}”.用来分隔定义块,因此必须成对出现.如 ...