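[Python] lxml

lxml is a powerful Python library for processing XML that makes it easy to parse and generate XML files. The content below is translated from part of the linked page.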

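1. Creating an empty XML node

from lxml import etree

root = etree.Element("root")
# tostring() returns bytes in Python 3; decode() turns them into printable text
print(etree.tostring(root, pretty_print=True).decode())

<root/>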

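2. Creating child nodes

from lxml import etree

root = etree.Element("root")
root.append(etree.Element("child1"))        # Method 1: create an Element, then append it
child2 = etree.SubElement(root, "child2")   # Method 2: create and attach in one step
child3 = etree.SubElement(root, "child3")
# pretty_print=True produces the indented output shown below
print(etree.tostring(root, pretty_print=True).decode())

<root>
  <child1/>
  <child2/>
  <child3/>
</root>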
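An element also behaves much like a list of its children, which gives a quick way to confirm that both methods attach children in the order shown. The following is a minimal sketch; the name child_tags is illustrative, and the try/except import is an assumption added so the sketch also runs where lxml is not installed (it uses only calls that lxml.etree shares with the standard library's xml.etree.ElementTree).

```python
try:
    from lxml import etree
except ImportError:  # assumption: fall back to the API-compatible stdlib module
    import xml.etree.ElementTree as etree

root = etree.Element("root")
root.append(etree.Element("child1"))   # Method 1
etree.SubElement(root, "child2")       # Method 2
etree.SubElement(root, "child3")

# len(), indexing and iteration all operate on the direct children
child_tags = [child.tag for child in root]
print(len(root))
print(root[0].tag)
print(child_tags)
```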

3. Creating a node with text content

from lxml import etree

root = etree.Element("root")
root.text = "Hello World"
print(etree.tostring(root, pretty_print=True).decode())

<root>Hello World</root>
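Because tostring() returns bytes by default, it can help to see the bytes and text forms side by side. The sketch below assumes that passing encoding="unicode" is acceptable for your use case; it asks tostring() for a str instead of bytes. The second element, note, is a hypothetical example showing that special characters in .text are escaped on output. The try/except import is an assumption so the sketch also runs with only the standard library's xml.etree.ElementTree.

```python
try:
    from lxml import etree
except ImportError:  # assumption: fall back to the API-compatible stdlib module
    import xml.etree.ElementTree as etree

root = etree.Element("root")
root.text = "Hello World"

as_bytes = etree.tostring(root)                      # bytes
as_text = etree.tostring(root, encoding="unicode")   # str
print(type(as_bytes).__name__, type(as_text).__name__)
print(as_text)

# Hypothetical second element: "&" in the text is escaped when serialized
note = etree.Element("note")
note.text = "Tom & Jerry"
escaped = etree.tostring(note, encoding="unicode")
print(escaped)
```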

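4. Attributes

lxml exposes an element's attributes through a dictionary-like interface.

Setting attributes

from lxml import etree

root = etree.Element("root", interesting="totally")  # Method 1: keyword arguments
root.set("hello", "huhu")                            # Method 2: set()
root.text = "Hello World"
print(etree.tostring(root).decode())

<root interesting="totally" hello="huhu">Hello World</root>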

Reading attributes

Method 1: get()

print(root.get("interesting"))
print(root.get("hello"))

totally
huhu

Method 2: the attrib mapping

attributes = root.attrib
print(attributes["interesting"])

Iterating over attributes

for name, value in sorted(root.items()):
    print('%s = %r' % (name, value))
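The attribute accessors above can be combined in one short sketch. It also shows what get() returns for an attribute that does not exist. The attribute name "missing" and the variable names are illustrative, and the try/except import is an assumption so the sketch also runs with only the standard library's xml.etree.ElementTree.

```python
try:
    from lxml import etree
except ImportError:  # assumption: fall back to the API-compatible stdlib module
    import xml.etree.ElementTree as etree

root = etree.Element("root", interesting="totally")
root.set("hello", "huhu")

absent = root.get("missing")             # None when the attribute is not set
fallback = root.get("missing", "n/a")    # second argument is the default value
as_dict = dict(root.attrib)              # copy the attributes into a plain dict
names = sorted(root.keys())

print(absent, fallback)
print(as_dict)
print(names)
```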

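5. Creating mixed content

In the XML below, the text inside <body> is split by a <br/> element. The part that follows <br/> has to be set through that element's .tail.

<html><body>Hello<br/>World</body></html>

html = etree.Element("html")
body = etree.SubElement(html, "body")
body.text = "Hello"    # text before <br/>
br = etree.SubElement(body, "br")
br.tail = "World"      # text after <br/>, stored as the tail of br
etree.tostring(html)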
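To make the split between .text and .tail concrete, the sketch below rebuilds the same tree and prints where each piece of text is stored. The try/except import is an assumption so the sketch also runs with only the standard library's xml.etree.ElementTree.

```python
try:
    from lxml import etree
except ImportError:  # assumption: fall back to the API-compatible stdlib module
    import xml.etree.ElementTree as etree

html = etree.Element("html")
body = etree.SubElement(html, "body")
body.text = "Hello"                 # text directly inside <body>, before <br/>
br = etree.SubElement(body, "br")
br.tail = "World"                   # text that follows the closing of <br/>

print(body.text)   # text owned by body
print(br.text)     # br itself contains no text
print(br.tail)     # trailing text is attached to br, not to body
serialized = etree.tostring(html, encoding="unicode")
print(serialized)
```

The design point is that "World" belongs to br rather than body: each element owns the text that immediately follows it, so removing br without first saving its tail would also drop "World".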

6. Iteration

Iterating over all nodes

for element in root.iter():
    print("%s - %s" % (element.tag, element.text))

To iterate over only the nodes with a given tag, pass the tag name to iter()

for element in root.iter("child"):
    print("%s - %s" % (element.tag, element.text))
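The two loops above can be tried on a small tree. The document used here is a made-up example, and the try/except import is an assumption so the sketch also runs with only the standard library's xml.etree.ElementTree. Note that iter() starts with the element it is called on and then walks its descendants in document order.

```python
try:
    from lxml import etree
except ImportError:  # assumption: fall back to the API-compatible stdlib module
    import xml.etree.ElementTree as etree

# Hypothetical sample document
root = etree.fromstring(
    "<root><child>one</child><other>x</other><child>two</child></root>"
)

all_tags = [element.tag for element in root.iter()]             # includes root itself
child_texts = [element.text for element in root.iter("child")]  # only <child> nodes

print(all_tags)
print(child_texts)
```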

7. Finding text content with XPath

build_text_list = etree.XPath("//text()") # lxml.etree only!

print(build_text_list(html))
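When only the text of a tree is needed, itertext() is a related option that does not require XPath. This is a swapped-in technique rather than the XPath approach above: the sketch rebuilds the html tree from section 5 and collects its text pieces in document order. The try/except import is an assumption so the sketch also runs with only the standard library's xml.etree.ElementTree.

```python
try:
    from lxml import etree
except ImportError:  # assumption: fall back to the API-compatible stdlib module
    import xml.etree.ElementTree as etree

html = etree.Element("html")
body = etree.SubElement(html, "body")
body.text = "Hello"
br = etree.SubElement(body, "br")
br.tail = "World"

texts = list(html.itertext())   # every .text and .tail string, in document order
joined = "".join(texts)

print(texts)
print(joined)
```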

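8. Finding nodes

iterfind(): iterates over all nodes that match the path expression

findall(): returns a list of all matching nodes

find(): returns the first matching node

findtext(): returns the .text content of the first matching node

Given the following XML:

root = etree.XML("<root><a x='123'>aText<b/><c/><b/></a></root>")

Finding a direct child (b is not a direct child of root, so find() returns None)

>>> print(root.find("b"))
None
>>> print(root.find("a").tag)
a

Finding a node anywhere in the tree

>>> print(root.find(".//b").tag)
b
>>> [ b.tag for b in root.iterfind(".//b") ]
['b', 'b']

Finding nodes that have a given attribute

>>> print(root.findall(".//a[@x]")[0].tag)
a
>>> print(root.findall(".//a[@y]"))
[]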
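The session above does not exercise findtext(), so the sketch below applies it to the same XML, along with a predicate on an attribute value. Variable names are illustrative, and the try/except import is an assumption so the sketch also runs with only the standard library's xml.etree.ElementTree. Results of find() are compared against None explicitly, because an element without children is falsy when tested directly.

```python
try:
    from lxml import etree
except ImportError:  # assumption: fall back to the API-compatible stdlib module
    import xml.etree.ElementTree as etree

root = etree.XML("<root><a x='123'>aText<b/><c/><b/></a></root>")

a_text = root.findtext("a")                          # .text of the first <a>
fallback = root.findtext("missing", default="n/a")   # default when nothing matches
b_count = len(root.findall(".//b"))                  # all <b> nodes at any depth
by_value = root.find(".//a[@x='123']")               # match on the attribute value

first_b = root.find(".//b")
if first_b is not None:   # do not rely on truthiness: <b/> has no children
    print("found", first_b.tag)

print(a_text, fallback, b_count, by_value.tag)
```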

9. Parsing a string into XML

>>> some_xml_data = "<root>data</root>"
>>> root = etree.fromstring(some_xml_data)
>>> print(root.tag)
root
>>> etree.tostring(root)
b'<root>data</root>'
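Parsing and serializing are often combined into a round trip: parse a string, change the tree, and write it back out. The sketch below does this with a made-up document; the element name item and the attribute updated are hypothetical. The try/except import is an assumption so the sketch also runs with only the standard library's xml.etree.ElementTree.

```python
try:
    from lxml import etree
except ImportError:  # assumption: fall back to the API-compatible stdlib module
    import xml.etree.ElementTree as etree

# Hypothetical input document
root = etree.fromstring("<root><item>1</item></root>")

item = root.find("item")
item.text = str(int(item.text) + 1)   # element text is always a string
item.set("updated", "yes")

result = etree.tostring(root, encoding="unicode")
print(result)
```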

10. Building XML and HTML quickly with the E-factory

>>> from lxml import etree
>>> from lxml.builder import E

>>> def CLASS(*args): # class is a reserved word in Python
...     return {"class": ' '.join(args)}

>>> html = page = (
...   E.html(       # create an Element called "html"
...     E.head(
...       E.title("This is a sample document")
...     ),
...     E.body(
...       E.h1("Hello!", CLASS("title")),
...       E.p("This is a paragraph with ", E.b("bold"), " text in it!"),
...       E.p("This is another paragraph, with a", "\n      ",
...         E.a("link", href="http://www.python.org"), "."),
...       E.p("Here are some reserved characters: <spam&egg>."),
...       etree.XML("<p>And finally an embedded XHTML fragment.</p>"),
...     )
...   )
... )

>>> print(etree.tostring(page, pretty_print=True).decode())
<html>
  <head>
    <title>This is a sample document</title>
  </head>
  <body>
    <h1 class="title">Hello!</h1>
    <p>This is a paragraph with <b>bold</b> text in it!</p>
    <p>This is another paragraph, with a
      <a href="http://www.python.org">link</a>.</p>
    <p>Here are some reserved characters: &lt;spam&amp;egg&gt;.</p>
    <p>And finally an embedded XHTML fragment.</p>
  </body>
</html>
