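Reading the official BeautifulSoup documentation: modifying the HTML tree

Modifying an HTML tree comes down to changing its tags: a tag's name (that is, its type), its attributes, and its contents. BeautifulSoup provides convenient ways to do each of these. Start with the name and the attributes: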

from bs4 import BeautifulSoup

soup = BeautifulSoup('<b class="boldest">Extremely bold</b>')

tag = soup.b

tag.name = "blockquote"

tag['class'] = 'verybold'

tag['id'] = 1

tag

# <blockquote class="verybold" id="1">Extremely bold</blockquote>

del tag['class']

del tag['id']

tag

# <blockquote>Extremely bold</blockquote>
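Since attributes behave like dictionary entries, the usual dict operations work too. The following is a minimal sketch, not taken from the original post: it assumes the standard-library "html.parser" backend (so it does not depend on lxml), and the sample markup and variable names are made up for illustration.

```python
from bs4 import BeautifulSoup

# Assumption: "html.parser" is used so the example needs no third-party parser.
soup = BeautifulSoup('<b class="boldest" id="x">Extremely bold</b>', "html.parser")
tag = soup.b

tag.name = "blockquote"              # rename the tag
tag["class"] = ["very", "bold"]      # class is multi-valued: a list is joined with spaces
removed = tag.attrs.pop("id", None)  # delete an attribute without raising if it is missing
has_id = tag.has_attr("id")
rendered = str(tag)

print(rendered)
```

Using `tag.attrs.pop(name, None)` instead of `del tag[name]` is just one way to avoid a `KeyError` when the attribute may not exist.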

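Next comes changing the contents. Assigning to .string replaces everything inside the tag, child tags included: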

markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'

soup = BeautifulSoup(markup)

tag = soup.a

tag.string = "New link text."

tag

# <a href="http://example.com/">New link text.</a>
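To make it explicit that assigning to `.string` throws away nested tags as well as text, here is a small sketch using the same markup. The "html.parser" argument and the variable names are my own assumptions, not part of the original example.

```python
from bs4 import BeautifulSoup

markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
soup = BeautifulSoup(markup, "html.parser")  # assumption: stdlib parser
tag = soup.a

children_before = len(tag.contents)  # a text node and an <i> tag
tag.string = "New link text."
children_after = len(tag.contents)   # only the new string is left
i_tag_left = tag.find("i")           # None: the nested <i> was discarded

print(children_before, children_after, i_tag_left)
```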

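You can also use append(). What puzzled me at first is that the printed result looks as if the text had always been one string, yet .contents shows that append() really appends a new item to the list that .contents represents. You can also pass a NavigableString instead of a plain string; the effect is the same.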

soup = BeautifulSoup("<a>Foo</a>")

soup.a.append("Bar")

soup

# <html><head></head><body><a>FooBar</a></body></html>

soup.a.contents

# ['Foo', 'Bar']

soup = BeautifulSoup("<b></b>")

tag = soup.b

tag.append("Hello")

from bs4 import NavigableString

new_string = NavigableString(" there")

tag.append(new_string)

tag

# <b>Hello there</b>

tag.contents

# ['Hello', ' there']
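The difference between "looks the same" and "is the same" can be checked directly. This is only a sketch under my own assumptions (the "html.parser" backend and the variable names are not from the original): it compares a tag built with append() against one parsed from the already-joined text.

```python
from bs4 import BeautifulSoup

appended = BeautifulSoup("<a>Foo</a>", "html.parser")
appended.a.append("Bar")

parsed = BeautifulSoup("<a>FooBar</a>", "html.parser")

same_markup = str(appended.a) == str(parsed.a)  # both render as <a>FooBar</a>
appended_parts = list(appended.a.contents)      # two separate strings
parsed_parts = list(parsed.a.contents)          # a single string
single_string = appended.a.string               # None, because there are two children

print(same_markup, appended_parts, parsed_parts, single_string)
```

One practical consequence: after append(), `.string` on the tag returns None, so code that relied on `.string` may need `get_text()` instead.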

You can also use append() to add a comment inside a tag:

from bs4 import Comment

new_comment = Comment("Nice to see you.")

tag.append(new_comment)

tag

# <b>Hello there<!--Nice to see you.--></b>

tag.contents

# ['Hello', ' there', 'Nice to see you.']
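Because a comment shows up in .contents looking like an ordinary string, it can help to tell the two apart by type. A minimal sketch, assuming the "html.parser" backend; the variable names are illustrative only.

```python
from bs4 import BeautifulSoup, Comment

soup = BeautifulSoup("<b>Hello</b>", "html.parser")
tag = soup.b
tag.append(Comment("Nice to see you."))

rendered = str(tag)
# Comment is a NavigableString subclass, so isinstance() separates it from plain text.
comments = [node for node in tag.contents if isinstance(node, Comment)]
visible_text = tag.get_text()  # comments are not part of the extracted text

print(rendered, comments, visible_text)
```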

You can even create a brand-new tag. For new_tag(), only the first argument (the tag name) is required:

soup = BeautifulSoup("<b></b>")

original_tag = soup.b

new_tag = soup.new_tag("a", href="http://www.example.com")

original_tag.append(new_tag)

original_tag

# <b><a href="http://www.example.com"></a></b>

new_tag.string = "Link text."

original_tag

# <b><a href="http://www.example.com">Link text.</a></b>
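Keyword arguments only work for attribute names that are valid Python identifiers. For names such as `class` or `data-id`, new_tag() also accepts an `attrs` dictionary. The sketch below assumes the "html.parser" backend, and the tag and attribute values are invented for illustration.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<b></b>", "html.parser")

# Keyword arguments cover simple attribute names...
link = soup.new_tag("a", href="http://www.example.com")
# ...while attrs= takes a dict for names like "class" or "data-id".
badge = soup.new_tag("span", attrs={"class": "badge", "data-id": "7"})
badge.string = "new"

soup.b.append(link)
link.append(badge)
rendered = str(soup.b)

print(rendered)
```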

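Besides append() there is insert(). Under the hood it works the same way: the new element is inserted into the list that .contents returns, at the index you give.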

markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'

soup = BeautifulSoup(markup)

tag = soup.a

tag.insert(1, "but did not endorse ")

tag

# <a href="http://example.com/">I linked to but did not endorse <i>example.com</i></a>

tag.contents

# ['I linked to ', 'but did not endorse ', <i>example.com</i>]
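The index passed to insert() counts children (strings and tags alike), not characters. A short sketch of a few positions, assuming the "html.parser" backend and a made-up snippet of markup:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>b<i>d</i></p>", "html.parser")
tag = soup.p

tag.insert(0, "a")                  # index 0: becomes the first child
tag.insert(2, "c")                  # lands between "b" and <i>
tag.insert(len(tag.contents), "e")  # an index at the end behaves like append()

parts = [str(node) for node in tag.contents]
rendered = str(tag)

print(parts, rendered)
```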

Related to insert() are insert_before() and insert_after(), which place the new element immediately before or after the element they are called on.

soup = BeautifulSoup("<b>stop</b>")

tag = soup.new_tag("i")

tag.string = "Don't"

soup.b.string.insert_before(tag)

soup.b

# <b><i>Don't</i>stop</b>

soup.b.i.insert_after(soup.new_string(" ever "))

soup.b

# <b><i>Don't</i> ever stop</b>

soup.b.contents

# [<i>Don't</i>, ' ever ', 'stop']
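These two methods are handy when you have a reference to a sibling rather than to the parent. A minimal sketch with a hypothetical list, assuming the "html.parser" backend:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<ul><li>two</li></ul>", "html.parser")
item = soup.li

first = soup.new_tag("li")
first.string = "one"
third = soup.new_tag("li")
third.string = "three"

# Both calls position the new element relative to `item` inside its parent.
item.insert_before(first)
item.insert_after(third)

order = [li.string for li in soup.ul.find_all("li")]
print(order)
```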

clear() is simple: it removes all of the contents of the tag it is called on. The tag itself and its attributes stay.

markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'

soup = BeautifulSoup(markup)

tag = soup.a

tag.clear()

tag

# <a href="http://example.com/"></a>
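A quick sketch to confirm what survives a clear(): the tag and its attributes remain in the tree, only the children go. The markup and names here are hypothetical, and "html.parser" is my own choice of backend.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<div id="box"><p>one</p><p>two</p></div>', "html.parser")
box = soup.div

box.clear()  # drop every child, keep the tag and its attributes

rendered = str(box)
still_in_tree = soup.find(id="box") is box
remaining_children = len(box.contents)

print(rendered, still_in_tree, remaining_children)
```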

More interesting than clear() is extract(). It pulls the tag it is called on out of the HTML tree and returns it; the tag no longer exists in the original tree.

markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'

soup = BeautifulSoup(markup)

a_tag = soup.a

i_tag = soup.i.extract()

a_tag

# <a href="http://example.com/">I linked to </a>

i_tag

# <i>example.com</i>

print(i_tag.parent)

# None
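Because extract() hands back the detached element, it can be re-attached somewhere else. The following sketch moves a heading from one container to another; the markup is invented and "html.parser" is assumed.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<div><h1>Title</h1></div><footer></footer>", "html.parser")

heading = soup.h1.extract()        # detached from the tree, but still usable
detached = heading.parent is None

soup.footer.append(heading)        # re-attach it under a different parent
rendered = str(soup)

print(detached, rendered)
```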

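decompose() differs from extract() in that it destroys the tag it is called on completely and returns nothing. (Note that clear(), by contrast, only destroys the tag's contents.)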

markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'

soup = BeautifulSoup(markup)

a_tag = soup.a

soup.i.decompose()

a_tag

# <a href="http://example.com/">I linked to </a>
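A typical use of decompose() is stripping elements you never want to see again, such as scripts and styles. This is a sketch under my own assumptions (the markup, the tag list and the "html.parser" backend are not from the original):

```python
from bs4 import BeautifulSoup

html = "<div><script>track()</script><p>Keep me</p><style>p{}</style></div>"
soup = BeautifulSoup(html, "html.parser")

# find_all() returns a list, so removing elements while looping over it is safe.
for junk in soup.find_all(["script", "style"]):
    junk.decompose()

rendered = str(soup)
print(rendered)
```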

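replace_with() swaps the element it is called on for the element passed as the argument, and returns the element that was replaced. (It is roughly extract() followed by an insert().)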

markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'

soup = BeautifulSoup(markup)

a_tag = soup.a

new_tag = soup.new_tag("b")

new_tag.string = "example.net"

a_tag.i.replace_with(new_tag)

a_tag

# <a href="http://example.com/">I linked to <b>example.net</b></a>
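The return value is easy to overlook, so here is a sketch that keeps it. It reuses the markup from the example; the "html.parser" argument and the name `old_tag` are my own additions.

```python
from bs4 import BeautifulSoup

markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
soup = BeautifulSoup(markup, "html.parser")

new_tag = soup.new_tag("b")
new_tag.string = "example.net"

old_tag = soup.a.i.replace_with(new_tag)  # returns the element that was swapped out

print(old_tag, old_tag.parent, soup.a)
```

The returned `<i>` is detached from the tree, just as if extract() had been called on it.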

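Finally there are wrap() and unwrap(). wrap() wraps the element it is called on in the tag passed as the argument, while unwrap() replaces the tag it is called on with that tag's contents.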

soup = BeautifulSoup("<p>I wish I was bold.</p>")

soup.p.string.wrap(soup.new_tag("b"))

# <b>I wish I was bold.</b>

soup.p.wrap(soup.new_tag("div"))

# <div><p><b>I wish I was bold.</b></p></div>

markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'

soup = BeautifulSoup(markup)

a_tag = soup.a

a_tag.i.unwrap()

a_tag

# <a href="http://example.com/">I linked to example.com</a>
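unwrap() is convenient for flattening markup, for example removing every `<span>` while keeping its text. A minimal sketch with made-up markup, assuming the "html.parser" backend:

```python
from bs4 import BeautifulSoup

html = "<p>Some <span>plain</span> and <span>simple</span> text</p>"
soup = BeautifulSoup(html, "html.parser")

# Replace each <span> with its own contents.
for span in soup.find_all("span"):
    span.unwrap()
flattened = str(soup.p)

# wrap() returns the new wrapper element.
wrapper = soup.p.wrap(soup.new_tag("div"))
wrapped = str(wrapper)

print(flattened, wrapped)
```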
