Using lxml in Python

lxml is among the most feature-rich and easy-to-use Python libraries for processing XML and HTML.

from lxml import etree
from lxml import html

h = '''
<html>
  <head>
    <meta name="content-type" content="text/html; charset=utf-8" />
    <title>友情链接查询 - 站长工具</title>
    <!-- uRj0Ak8VLEPhjWhg3m9z4EjXJwc -->
    <meta name="Keywords" content="友情链接查询" />
    <meta name="Description" content="友情链接查询" />
  </head>
  <body>
    <h1 class="heading">Top News</h1>
    <p style="font-size: 200%">World News only on this page</p>
    Ah, and here's some more text, by the way.
    <p>... and this is a parsed fragment ...</p>
    <a href="http://www.cydf.org.cn/" rel="nofollow" target="_blank">青少年发展基金会</a>
    <a href="http://www.4399.com/flash/32979.htm" target="_blank">洛克王国</a>
    <a href="http://www.4399.com/flash/35538.htm" target="_blank">奥拉星</a>
    <a href="http://game.3533.com/game/" target="_blank">手机游戏</a>
    <a href="http://game.3533.com/tupian/" target="_blank">手机壁纸</a>
    <a href="http://www.4399.com/" target="_blank">4399小游戏</a>
    <a href="http://www.91wan.com/" target="_blank">91wan游戏</a>
  </body>
</html>
'''

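# First approach: parse the string with etree.HTML
page = etree.HTML(h)
# hrefs = page.xpath('//a')
hrefs = page.cssselect('a')  # cssselect() needs the separate cssselect package installed
for href in hrefs:
    # attrib is a dict-like view of the element's attributes
    print(href.attrib)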
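The commented-out XPath line above does the same job without the cssselect dependency. Here is a minimal sketch of that variant, assuming a short stand-in document; the `sample` string and the variable names are illustrative and not part of the original code:

```python
from lxml import etree

# Hypothetical stand-in for the larger HTML string used above.
sample = '''
<html><body>
  <a href="http://www.4399.com/" target="_blank">4399</a>
  <a href="http://www.91wan.com/" target="_blank">91wan</a>
</body></html>
'''

page = etree.HTML(sample)

# XPath can return the matching elements themselves ...
links = page.xpath('//a')
# ... or go straight to attribute values and text nodes.
urls = page.xpath('//a/@href')
texts = page.xpath('//a/text()')

for link in links:
    # get() reads a single attribute; attrib holds all of them.
    print(link.get('href'), link.text, dict(link.attrib))
```

Selecting `@href` directly is convenient when only the URLs matter; iterating over the elements is the better fit when several attributes or the link text are needed together.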

# Second approach: parse the string with html.fromstring
def parse_from():
    tree = html.fromstring(h)
    for a in tree.cssselect('a'):
    # for a in tree.xpath('//a'):
        print(a.text)
        print(a.attrib)

parse_from()
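Elements returned by html.fromstring also carry a few HTML-specific helpers that plain etree elements lack. The sketch below shows two of them, `make_links_absolute()` and `text_content()`, on a small fragment. The fragment, the base URL, and the variable names are assumptions made for illustration only:

```python
from lxml import html

# Hypothetical fragment; the first link is relative on purpose.
sample = '''
<div>
  <p>World News only on <b>this</b> page</p>
  <a href="/flash/32979.htm" target="_blank">link one</a>
  <a href="http://game.3533.com/game/">link two</a>
</div>
'''

tree = html.fromstring(sample)

# Resolve relative URLs against a base URL (the base is an assumed value).
tree.make_links_absolute('http://www.4399.com/')

# text_content() joins the text of an element and all of its descendants.
pairs = [(a.get('href'), a.text_content()) for a in tree.xpath('.//a')]
paragraph = tree.xpath('.//p')[0].text_content()

for url, label in pairs:
    print(url, label)
print(paragraph)
```

`text_content()` is useful where `a.text` falls short: `.text` stops at the first child element, so a paragraph with inline markup such as `<b>` would be cut off.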
