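Advanced BeautifulSoup: .parent, .parents, .next_sibling, .previous_sibling, .next_siblings, .previous_siblings

The previous post in this series on advanced BeautifulSoup usage covered contents, children, descendants, string, strings and stripped_strings. This post covers .parent, .parents, .next_sibling, .previous_sibling, .next_siblings and .previous_siblings, followed by the parse-order attributes .next_element, .previous_element, .next_elements and .previous_elements.

This post reuses the HTML document from the previous one: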

from bs4 import BeautifulSoup

html_doc = """
<html>
<head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>; and they lived at the bottom of a well.</p>
<p class="story">...</p>
</body></html>"""

# Parse the document once; every example below uses this soup object.
soup = BeautifulSoup(html_doc, "html.parser")

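Continuing with the document tree: every tag and every string has a parent, namely the tag that contains it.

.parent:

Use the .parent attribute to get an element's parent. In the sample HTML document, the <head> tag is the parent of the <title> tag: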

title_tag = soup.title

title_tag

# <title>The Dormouse's story</title>

title_tag.parent

# <head><title>The Dormouse's story</title></head>

The title string also has a parent, the <title> tag that contains it:

title_tag.string.parent

# <title>The Dormouse's story</title>

The parent of a top-level tag such as <html> is the BeautifulSoup object itself:

html_tag = soup.html

type(html_tag.parent)

# <class 'bs4.BeautifulSoup'>

The .parent of the BeautifulSoup object itself is None.
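As a minimal sketch of the rules above, the following follows .parent one step at a time until it reaches None. The shortened HTML snippet and the helper name chain_to_root are assumptions made for illustration, not part of the original example.

```python
from bs4 import BeautifulSoup

html = ("<html><head><title>The Dormouse's story</title></head>"
        "<body><p><b>bold</b></p></body></html>")
soup = BeautifulSoup(html, "html.parser")


def chain_to_root(node):
    """Follow .parent links upward and return the names of the nodes visited."""
    names = []
    current = node.parent
    while current is not None:  # the BeautifulSoup object's .parent is None
        names.append(current.name)
        current = current.parent
    return names


title_text = soup.title.string  # a NavigableString, not a Tag
parent_chain = chain_to_root(title_text)
print(parent_chain)  # ['title', 'head', 'html', '[document]']
print(soup.parent)   # None
```

The loop works for strings as well as tags, because both kinds of node carry a .parent attribute.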

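.parents:

An element's .parents attribute iterates over all of its ancestors, from the nearest one outward. The following example uses .parents to walk from the first <a> tag up to the root of the document: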

link = soup.a

link

# <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>

# .parents stops after the BeautifulSoup object; it never yields None.
for parent in link.parents:
    print(parent.name)

# p
# body
# html
# [document]
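Because .parents is a generator, it can feed a comprehension or a search directly. The sketch below builds a root-to-leaf path and finds the nearest ancestor carrying a given class; the <div class="content"> wrapper and the names tag_path and nearest_content are assumptions introduced for illustration.

```python
from bs4 import BeautifulSoup

html = ('<html><body><div class="content"><p class="story">Names: '
        '<a id="link1" href="http://example.com/elsie">Elsie</a>'
        '</p></div></body></html>')
soup = BeautifulSoup(html, "html.parser")
link = soup.find("a", id="link1")

# Ancestors are yielded nearest first.
ancestor_names = [parent.name for parent in link.parents]

# Reverse the chain to get a root-to-leaf path.
tag_path = " > ".join(reversed([link.name] + ancestor_names))

# First ancestor whose class list contains "content", or None if there is none.
nearest_content = next(
    (parent for parent in link.parents if "content" in parent.get("class", [])),
    None,
)

print(ancestor_names)        # ['p', 'div', 'body', 'html', '[document]']
print(tag_path)              # [document] > html > body > div > p > a
print(nearest_content.name)  # div
```

For the common case of searching upward by tag name or attributes, Beautiful Soup also provides find_parent() and find_parents(), which avoid writing the loop by hand.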

Sibling nodes:

For example:

<a>

    <b>text1</b>

    <c>text2</c>

</a>

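Here the <b> and <c> tags are siblings: both are direct children of the same <a> tag.

.next_sibling and .previous_sibling:

Use the .next_sibling and .previous_sibling attributes to move between sibling nodes in the document tree: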

sibling_soup = BeautifulSoup("<a><b>text1</b><c>text2</c></a>", "html.parser")

sibling_soup.b.next_sibling

# <c>text2</c>

sibling_soup.c.previous_sibling

# <b>text1</b>

The <b> tag has a .next_sibling but no .previous_sibling (the attribute is None), because it is the first of its siblings. Likewise, the <c> tag has a .previous_sibling, but its .next_sibling is None.
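A short sketch of that behaviour, under the assumption that the same markup is parsed twice: once written on a single line and once indented as in the illustration earlier. The variable names compact and pretty are hypothetical. In the indented version, the whitespace between the tags becomes a text node, so the tags are no longer directly adjacent siblings.

```python
from bs4 import BeautifulSoup

compact = BeautifulSoup("<a><b>text1</b><c>text2</c></a>", "html.parser")
pretty = BeautifulSoup("<a>\n    <b>text1</b>\n    <c>text2</c>\n</a>", "html.parser")

# With no whitespace between tags, the first and last children have no
# sibling on one side, and the attribute is simply None.
print(compact.b.previous_sibling)  # None
print(compact.c.next_sibling)      # None
print(compact.b.next_sibling.name)  # c

# With indentation, a whitespace-only string sits between <b> and <c>.
gap = pretty.b.next_sibling
print(gap is pretty.c)    # False
print(gap.strip() == "")  # True
```

Checking for None before chaining another attribute access avoids an AttributeError at the ends of a sibling list.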

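link = soup.a

link

# <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>

link.next_sibling

# ',\n'

Note: the .next_sibling of the first <a> tag is the string ',\n' (a comma followed by a newline), not the second <a> tag, because the text that separates the two tags is a node in its own right.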

link.next_sibling.next_sibling

# <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>

The .next_sibling of that string, that is, the .next_sibling of the first <a> tag's .next_sibling, is the second <a> tag (Lacie).
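Since text nodes sit between the tags, a common need is to skip them and land on the next tag. The following is a minimal sketch of one way to do that; next_tag_sibling is a hypothetical helper written for this post, and the shortened HTML is an assumption. Beautiful Soup's own find_next_sibling() gives the same result for this input.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

html = ('<p><a id="link1">Elsie</a>,\n'
        '<a id="link2">Lacie</a> and\n'
        '<a id="link3">Tillie</a></p>')
soup = BeautifulSoup(html, "html.parser")
first = soup.find("a", id="link1")
last = soup.find("a", id="link3")


def next_tag_sibling(node):
    """Return the next sibling that is a Tag, skipping text nodes, or None."""
    sibling = node.next_sibling
    while sibling is not None and not isinstance(sibling, Tag):
        sibling = sibling.next_sibling
    return sibling


raw = first.next_sibling                # the text node between the tags
second = next_tag_sibling(first)        # hand-written skip over text nodes
builtin = first.find_next_sibling("a")  # built-in equivalent

print(repr(raw))               # ',\n'
print(second["id"])            # link2
print(builtin is second)       # True
print(next_tag_sibling(last))  # None
```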

.next_siblings and .previous_siblings:

Use the .next_siblings and .previous_siblings attributes to iterate over the siblings that follow or precede the current node:

for sibling in soup.a.next_siblings:
    print(repr(sibling))

# ',\n'
# <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>
# ' and\n'
# <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>
# '; and they lived at the bottom of a well.'

for sibling in soup.find(id="link3").previous_siblings:
    print(repr(sibling))

# ' and\n'
# <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>
# ',\n'
# <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>
# 'Once upon a time there were three little sisters; and their names were\n'
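The two iterators mix tags and strings, so in practice they are often filtered. The sketch below keeps only the tag siblings and shows that .previous_siblings yields the nearest sibling first; the shortened HTML and the variable names are assumptions for illustration.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

html = ('<p class="story">Their names were\n'
        '<a id="link1">Elsie</a>,\n'
        '<a id="link2">Lacie</a> and\n'
        '<a id="link3">Tillie</a>; and they lived at the bottom of a well.</p>')
soup = BeautifulSoup(html, "html.parser")
first = soup.find("a", id="link1")
last = soup.find("a", id="link3")

# Keep only Tag siblings; text nodes such as ',\n' are skipped.
following_ids = [s["id"] for s in first.next_siblings if isinstance(s, Tag)]

# .previous_siblings walks backwards, so the closest sibling comes first.
preceding_ids = [s["id"] for s in last.previous_siblings if isinstance(s, Tag)]

# Keep only the text nodes that precede the last link.
preceding_text = [str(s) for s in last.previous_siblings if not isinstance(s, Tag)]

print(following_ids)   # ['link2', 'link3']
print(preceding_ids)   # ['link2', 'link1']
print(preceding_text)  # [' and\n', ',\n', 'Their names were\n']
```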

Going back and forth:

Consider the following HTML:

<html><head><title>The Dormouse's story</title></head> <p class="title"><b>The Dormouse's story</b></p>

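An HTML parser turns this string into a series of events: "open an <html> tag", "open a <head> tag", "open a <title> tag", "add a string", "close the <title> tag", "open a <p> tag", and so on. Beautiful Soup provides tools for reconstructing this initial parse of the document.

.next_element and .previous_element:

The .next_element attribute points to whatever was parsed immediately after the current object (a string or a tag). The result can be the same as .next_sibling, but it usually differs.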

last_a_tag = soup.find("a", id="link3")

last_a_tag

# <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>

last_a_tag.next_sibling

# '; and they lived at the bottom of a well.'

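The .next_element of this <a> tag, however, is the thing that was parsed immediately after the opening <a> tag, not the rest of the sentence that follows the tag. That is the string "Tillie":

last_a_tag.next_element

# 'Tillie'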

The .previous_element attribute is the exact opposite of .next_element: it points to whatever was parsed immediately before the current object:

last_a_tag.previous_element

# ' and\n'

last_a_tag.previous_element.next_element

# <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>
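To make the contrast concrete, here is a small side-by-side sketch of .next_sibling and .next_element on the same tag. The shortened HTML and the variable name last_a are assumptions for illustration.

```python
from bs4 import BeautifulSoup

html = ('<p><a id="link2">Lacie</a> and\n'
        '<a id="link3">Tillie</a>; and they lived at the bottom of a well.</p>')
soup = BeautifulSoup(html, "html.parser")
last_a = soup.find("a", id="link3")

# Sibling navigation stays on the same level of the tree.
sibling = last_a.next_sibling

# Element navigation follows parse order, so it first descends into the tag.
inner = last_a.next_element
after_inner = inner.next_element

print(repr(sibling))           # '; and they lived at the bottom of a well.'
print(repr(inner))             # 'Tillie'
print(after_inner is sibling)  # True

# Stepping back one element and forward again returns to the same tag.
print(repr(last_a.previous_element))                # ' and\n'
print(last_a.previous_element.next_element is last_a)  # True
```

In this input the two kinds of navigation meet again after one extra step, because the string inside the tag is the only thing parsed between the opening <a> and the text that follows the closing </a>.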

.next_elements and .previous_elements:

The .next_elements and .previous_elements iterators move forward or backward through the document in the order it was parsed, as if the document were being parsed again:

for element in last_a_tag.next_elements:
    print(repr(element))

# 'Tillie'
# '; and they lived at the bottom of a well.'
# '\n'
# <p class="story">...</p>
# '...'
# '\n'
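One practical use of .next_elements is collecting everything that comes after a given tag in document order. The sketch below separates the strings from the tags; the shortened HTML and the names strings_after and tags_after are assumptions for illustration.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

html = ('<html><body><p class="story"><a id="link3">Tillie</a>; and they lived.</p>\n'
        '<p class="story">...</p></body></html>')
soup = BeautifulSoup(html, "html.parser")
last_a = soup.find("a", id="link3")

# Strings encountered after the opening <a> tag, in parse order.
strings_after = [str(e) for e in last_a.next_elements
                 if isinstance(e, NavigableString)]

# Tags opened after the <a> tag, in parse order.
tags_after = [e.name for e in last_a.next_elements
              if not isinstance(e, NavigableString)]

print(strings_after)  # ['Tillie', '; and they lived.', '\n', '...']
print(tags_after)     # ['p']

# Going backwards, the first element is the <p> opened just before the <a>.
print(next(last_a.previous_elements).name)  # p
```

Note that the tag's own text ('Tillie') is included, because parse order descends into the tag before moving past it.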

The next post will cover BeautifulSoup's advanced methods for searching the document tree.
