BeautifulSoup库children(),descendants()方法的使用

示例网站:http://www.pythonscraping.com/pages/page3.html

网站内容:

网站部分重要源代码:

<table id="giftList">
<tr><th>
Item Title
</th><th>
Description
</th><th>
Cost
</th><th>
Image
</th></tr> <tr id="gift1" class="gift"><td>
Vegetable Basket
</td><td>
This vegetable basket is the perfect gift for your health conscious (or overweight) friends!
<span class="excitingNote">Now with super-colorful bell peppers!</span>
</td><td>
$15.00
</td><td>
<img src="../img/gifts/img1.jpg">
</td></tr> <tr id="gift2" class="gift"><td>
Russian Nesting Dolls
</td><td>
Hand-painted by trained monkeys, these exquisite dolls are priceless! And by "priceless," we mean "extremely expensive"! <span class="excitingNote">8 entire dolls per set! Octuple the presents!</span>
</td><td>
$10,000.52
</td><td>
<img src="../img/gifts/img2.jpg">
</td></tr> <tr id="gift3" class="gift"><td>
Fish Painting
</td><td>
If something seems fishy about this painting, it's because it's a fish! <span class="excitingNote">Also hand-painted by trained monkeys!</span>
</td><td>
$10,005.00
</td><td>
<img src="../img/gifts/img3.jpg">
</td></tr> <tr id="gift4" class="gift"><td>
Dead Parrot
</td><td>
This is an ex-parrot! <span class="excitingNote">Or maybe he's only resting?</span>
</td><td>
$0.50
</td><td>
<img src="../img/gifts/img4.jpg">
</td></tr> <tr id="gift5" class="gift"><td>
Mystery Box
</td><td>
If you love suprises, this mystery box is for you! Do not place on light-colored surfaces. May cause oil staining. <span class="excitingNote">Keep your friends guessing!</span>
</td><td>
$1.50
</td><td>
<img src="../img/gifts/img6.jpg">
</td></tr>
</table>

1.children()方法的使用

# -*- coding: utf-8 -*-
from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen("http://www.pythonscraping.com/pages/page3.html")
bsObj = BeautifulSoup(html,"lxml") for child in bsObj.find("table",{"id":"giftList"}).children:
print(child)

运行得到的结果为:

<tr><th>
Item Title
</th><th>
Description
</th><th>
Cost
</th><th>
Image
</th></tr> <tr class="gift" id="gift1"><td>
Vegetable Basket
</td><td>
This vegetable basket is the perfect gift for your health conscious (or overweight) friends!
<span class="excitingNote">Now with super-colorful bell peppers!</span>
</td><td>
$15.00
</td><td>
<img src="../img/gifts/img1.jpg"/>
</td></tr> <tr class="gift" id="gift2"><td>
Russian Nesting Dolls
</td><td>
Hand-painted by trained monkeys, these exquisite dolls are priceless! And by "priceless," we mean "extremely expensive"! <span class="excitingNote">8 entire dolls per set! Octuple the presents!</span>
</td><td>
$10,000.52
</td><td>
<img src="../img/gifts/img2.jpg"/>
</td></tr> <tr class="gift" id="gift3"><td>
Fish Painting
</td><td>
If something seems fishy about this painting, it's because it's a fish! <span class="excitingNote">Also hand-painted by trained monkeys!</span>
</td><td>
$10,005.00
</td><td>
<img src="../img/gifts/img3.jpg"/>
</td></tr> <tr class="gift" id="gift4"><td>
Dead Parrot
</td><td>
This is an ex-parrot! <span class="excitingNote">Or maybe he's only resting?</span>
</td><td>
$0.50
</td><td>
<img src="../img/gifts/img4.jpg"/>
</td></tr> <tr class="gift" id="gift5"><td>
Mystery Box
</td><td>
If you love suprises, this mystery box is for you! Do not place on light-colored surfaces. May cause oil staining. <span class="excitingNote">Keep your friends guessing!</span>
</td><td>
$1.50
</td><td>
<img src="../img/gifts/img6.jpg"/>
</td></tr>

根据文章中的字面意思来分析:

children()方法指代的是与parent离得最近(也就是下一个)标签,程序中的children指代的是tr这个标签。

实验:将children用tr替换掉会得到与以上相同的结果吗?

# -*- coding: utf-8 -*-
from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen("http://www.pythonscraping.com/pages/page3.html")
bsObj = BeautifulSoup(html,"lxml") for child in bsObj.find("table",{"id":"giftList"}).tr:
print(child)

运行结果为:

<th>
Item Title
</th>
<th>
Description
</th>
<th>
Cost
</th>
<th>
Image
</th>

对以上实验结果进行分析得到:children可以列出所有的子类,而直接指定标签,则不行。

2.descendants()方法的使用

# -*- coding: utf-8 -*-
from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen("http://www.pythonscraping.com/pages/page3.html")
bsObj = BeautifulSoup(html,"lxml") for child in bsObj.find("table",{"id":"giftList"}).descendants:
print(child)

运行结果为:

<tr><th>
Item Title
</th><th>
Description
</th><th>
Cost
</th><th>
Image
</th></tr>
<th>
Item Title
</th> Item Title <th>
Description
</th> Description <th>
Cost
</th> Cost <th>
Image
</th> Image <tr class="gift" id="gift1"><td>
Vegetable Basket
</td><td>
This vegetable basket is the perfect gift for your health conscious (or overweight) friends!
<span class="excitingNote">Now with super-colorful bell peppers!</span>
</td><td>
$15.00
</td><td>
<img src="../img/gifts/img1.jpg"/>
</td></tr>
<td>
Vegetable Basket
</td> Vegetable Basket <td>
This vegetable basket is the perfect gift for your health conscious (or overweight) friends!
<span class="excitingNote">Now with super-colorful bell peppers!</span>
</td> This vegetable basket is the perfect gift for your health conscious (or overweight) friends! <span class="excitingNote">Now with super-colorful bell peppers!</span>
Now with super-colorful bell peppers! <td>
$15.00
</td> $15.00 <td>
<img src="../img/gifts/img1.jpg"/>
</td> <img src="../img/gifts/img1.jpg"/> <tr class="gift" id="gift2"><td>
Russian Nesting Dolls
</td><td>
Hand-painted by trained monkeys, these exquisite dolls are priceless! And by "priceless," we mean "extremely expensive"! <span class="excitingNote">8 entire dolls per set! Octuple the presents!</span>
</td><td>
$10,000.52
</td><td>
<img src="../img/gifts/img2.jpg"/>
</td></tr>
<td>
Russian Nesting Dolls
</td> Russian Nesting Dolls <td>
Hand-painted by trained monkeys, these exquisite dolls are priceless! And by "priceless," we mean "extremely expensive"! <span class="excitingNote">8 entire dolls per set! Octuple the presents!</span>
</td> Hand-painted by trained monkeys, these exquisite dolls are priceless! And by "priceless," we mean "extremely expensive"!
<span class="excitingNote">8 entire dolls per set! Octuple the presents!</span>
8 entire dolls per set! Octuple the presents! <td>
$10,000.52
</td> $10,000.52 <td>
<img src="../img/gifts/img2.jpg"/>
</td> <img src="../img/gifts/img2.jpg"/> <tr class="gift" id="gift3"><td>
Fish Painting
</td><td>
If something seems fishy about this painting, it's because it's a fish! <span class="excitingNote">Also hand-painted by trained monkeys!</span>
</td><td>
$10,005.00
</td><td>
<img src="../img/gifts/img3.jpg"/>
</td></tr>
<td>
Fish Painting
</td> Fish Painting <td>
If something seems fishy about this painting, it's because it's a fish! <span class="excitingNote">Also hand-painted by trained monkeys!</span>
</td> If something seems fishy about this painting, it's because it's a fish!
<span class="excitingNote">Also hand-painted by trained monkeys!</span>
Also hand-painted by trained monkeys! <td>
$10,005.00
</td> $10,005.00 <td>
<img src="../img/gifts/img3.jpg"/>
</td> <img src="../img/gifts/img3.jpg"/> <tr class="gift" id="gift4"><td>
Dead Parrot
</td><td>
This is an ex-parrot! <span class="excitingNote">Or maybe he's only resting?</span>
</td><td>
$0.50
</td><td>
<img src="../img/gifts/img4.jpg"/>
</td></tr>
<td>
Dead Parrot
</td> Dead Parrot <td>
This is an ex-parrot! <span class="excitingNote">Or maybe he's only resting?</span>
</td> This is an ex-parrot!
<span class="excitingNote">Or maybe he's only resting?</span>
Or maybe he's only resting? <td>
$0.50
</td> $0.50 <td>
<img src="../img/gifts/img4.jpg"/>
</td> <img src="../img/gifts/img4.jpg"/> <tr class="gift" id="gift5"><td>
Mystery Box
</td><td>
If you love suprises, this mystery box is for you! Do not place on light-colored surfaces. May cause oil staining. <span class="excitingNote">Keep your friends guessing!</span>
</td><td>
$1.50
</td><td>
<img src="../img/gifts/img6.jpg"/>
</td></tr>
<td>
Mystery Box
</td> Mystery Box <td>
If you love suprises, this mystery box is for you! Do not place on light-colored surfaces. May cause oil staining. <span class="excitingNote">Keep your friends guessing!</span>
</td> If you love suprises, this mystery box is for you! Do not place on light-colored surfaces. May cause oil staining.
<span class="excitingNote">Keep your friends guessing!</span>
Keep your friends guessing! <td>
$1.50
</td> $1.50 <td>
<img src="../img/gifts/img6.jpg"/>
</td> <img src="../img/gifts/img6.jpg"/>

BeautifulSoup库children(),descendants()方法的使用的更多相关文章

  1. BeautifulSoup库的使用方法

    from bs4 import BeautifulSoup import lxml html = ''' <html><head><title>The Dormou ...

  2. 【Python】在Pycharm中安装爬虫库requests , BeautifulSoup , lxml 的解决方法

    BeautifulSoup在学习Python过程中可能需要用到一些爬虫库 例如:requests BeautifulSoup和lxml库 前面的两个库,用Pychram都可以通过 File--> ...

  3. python网络爬虫学习笔记(二)BeautifulSoup库

    Beautiful Soup库也称为beautiful4库.bs4库,它可用于解析HTML/XML,并将所有文件.字符串转换为'utf-8'编码.HTML/XML文档是与“标签树一一对应的.具体地说, ...

  4. BeautifulSoup库

    '''灵活又方便的网页解析库,处理高效,支持多种解析器.利用它不用编写正则表达式即可方便的实现网页信息的提取.''' BeautifulSoup库包含的一些解析库: 解析库 使用方法 优势 劣势 py ...

  5. python BeautifulSoup库的基本使用

    Beautiful Soup 是用Python写的一个HTML/XML的解析器,它可以很好的处理不规范标记并生成剖析树(parse tree). 它提供简单又常用的导航(navigating),搜索以 ...

  6. Python爬虫利器:BeautifulSoup库

    Beautiful Soup parses anything you give it, and does the tree traversal stuff for you. BeautifulSoup ...

  7. PYTHON 爬虫笔记五:BeautifulSoup库基础用法

    知识点一:BeautifulSoup库详解及其基本使用方法 什么是BeautifulSoup 灵活又方便的网页解析库,处理高效,支持多种解析器.利用它不用编写正则表达式即可方便实现网页信息的提取库. ...

  8. python爬虫入门四:BeautifulSoup库(转)

    正则表达式可以从html代码中提取我们想要的数据信息,它比较繁琐复杂,编写的时候效率不高,但我们又最好是能够学会使用正则表达式. 我在网络上发现了一篇关于写得很好的教程,如果需要使用正则表达式的话,参 ...

  9. python之BeautifulSoup库

    1. BeautifulSoup库简介 和 lxml 一样,Beautiful Soup 也是一个HTML/XML的解析器,主要的功能也是如何解析和提取 HTML/XML 数据.lxml 只会局部遍历 ...

随机推荐

  1. SQL Sever数据库的基本操作和它的建立

    SQL数据库: 1.数据库概述 (1) 用自定义文件格式保存数据的劣势. (2) DBMS(DataBase Management System,数据库管理系统)和数据库,平时谈到"数据库& ...

  2. ThinkPhp框架:父类及表单验证

    这个知识点,就可以通过"登录"和"注册"的页面来学习这个知识点了首先先做一个"登录"功能一.登录功能(父类)(1)登录的控制器在我的控制器文 ...

  3. 深入理解ajax系列第五篇——进度事件

    前面的话 一般地,使用readystatechange事件探测HTTP请求的完成.XHR2规范草案定义了进度事件Progress Events规范,XMLHttpRequest对象在请求的不同阶段触发 ...

  4. H5 内联 SVG

    HTML5 内联 SVG HTML5 画布 HTML5 画布 vs SVG HTML5 支持内联 SVG. 什么是SVG? SVG 指可伸缩矢量图形 (Scalable Vector Graphics ...

  5. System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt".

    昨天遇到一个比较奇怪的问题,运行VS2010调试程序的时候,总是会报一个错,然后程序就挂掉了:无可用源….,弹出一个窗口提示:System.AccessViolationException: Atte ...

  6. cuda编程学习1——hello world!

    将c程序最简单的hello world用cuda编写在GPU上执行,以下为代码: #include<iostream>using namespace std;__global__ void ...

  7. 配置uwsgi

    首先要明确的是,如果你喜欢用命令行的方式(如shell)敲命令,那可以省去任何配置. 但是,绝大多数人,还是不愿意记那么长的命令,反复敲的.所以uwsgi里,就给大家提供了多种配置,省去你启动时候,需 ...

  8. 查看apache,mysql,nginx,php的编译参数

    查看nginx编译参数:/usr/local/nginx/sbin/nginx -V 查看apache编译参数:cat /usr/local/apache2/build/config.nice 查看m ...

  9. AutoMapper.RegExtension[.NET Core版本] 介绍

    Technorati 标签: AutoMapper.RegExtension,AutoMapper.RegExtension .NET CORE AutoMapper.RegExtension 为一个 ...

  10. javaWEB与JSP指令

    JSP三大指令  一个jsp页面中,可以有0~N个指令的定义!1. page --> 最复杂:<%@page language="java" info="xx ...