perl HTML::TreeBuilder::XPath

HTML::TreeBuilder::XPath adds XPath support to HTML::TreeBuilder.

use HTML::TreeBuilder::XPath;

my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse_file("mypage.html");

my $nb = $tree->findvalue('/html/body//p[@class="section_title"]/span[@class="nb"]');
my $id = $tree->findvalue('/html/body//p[@class="section_title"]/@id');

my $p = $tree->findnodes('//p[@id="toto"]')->[0];
my $link_texts = $p->findvalue('./a'); # the texts of all a elements in $p

$tree->delete; # to avoid memory leaks, if you parse many HTML documents

Description:

This module adds the usual XPath methods to HTML::TreeBuilder, making it easier to query a document.

Methods:

The following extra methods are added to the tree object and to every element.

findnodes ($path)

Returns the list of nodes found by $path. In scalar context it returns a Tree::XPathEngine::NodeSet object.

findnodes_as_string ($path)

Returns the text values of the nodes, as a single string.

findnodes_as_strings ($path)

Returns a list of the values of the result nodes.

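findvalue ($path)

Returns either a Tree::XPathEngine::Literal, a Tree::XPathEngine::Boolean, or a Tree::XPathEngine::Number object.

If the path returns a node set, $nodeset->xpath_to_literal is called automatically (and thus a Tree::XPathEngine::Literal is returned).

Note that each of these objects overloads stringification, so you can simply print the value found, or use it the way you would an ordinary Perl value.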
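As a small illustration of that overloading, the sketch below parses an inline document and treats the result of findvalue as an ordinary string. The HTML snippet and the id "price" are invented for the example. parse_content is the HTML::TreeBuilder method for parsing a string in one call.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical inline document, used only for this illustration.
my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse_content('<html><body><p id="price">Total: 42 EUR</p></body></html>');

# findvalue returns an object, but it stringifies to the node's text.
my $value = $tree->findvalue('//p[@id="price"]');
print "Value: $value\n";

# Force stringification explicitly before applying a regular expression.
my ($amount) = "$value" =~ /(\d+)/;
print "Amount: $amount\n";

# An XPath function call yields a number object rather than a literal.
my $count = $tree->findvalue('count(//p)');
print "Paragraphs: $count\n";

$tree->delete;
```

Interpolating the object into a string (or quoting it as "$value") is the simplest way to get a plain Perl string when you need one.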



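findvalues ($path)

Returns the values of the matching nodes as a list. This is mostly the same as findnodes_as_strings, except that the elements of the list are objects rather than plain strings.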

exists ($path)

Returns true if the given path exists.

matches ($path)

Returns true if the element matches the path.
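The methods listed above can be tried side by side on a small document. The following is a minimal sketch. The HTML is made up, modelled on the class names in the synopsis, and the variable names are arbitrary.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical document modelled on the class names used in the synopsis.
my $html = <<'HTML';
<html><body>
<p class="section_title" id="s1"><span class="nb">1</span> Intro</p>
<p class="section_title" id="s2"><span class="nb">2</span> Usage</p>
</body></html>
HTML

my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse_content($html);

# List context: one element object per matching node.
my @titles = $tree->findnodes('//p[@class="section_title"]');

# One value per matching node (here, the id attributes).
my @ids = $tree->findvalues('//p[@class="section_title"]/@id');

# Plain strings, one per matching node.
my @numbers = $tree->findnodes_as_strings('//span[@class="nb"]');

# exists tests a path against the tree; matches tests a single element.
my $has_s2   = $tree->exists('//p[@id="s2"]');
my $is_title = $titles[0]->matches('//p[@class="section_title"]');

print scalar(@titles), " titles, ids: @ids, numbers: @numbers\n";

$tree->delete;
```

Use findvalues or findnodes_as_strings when you want one entry per match. findvalue applied to a node set gives a single value for the whole set.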



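use strict;
use warnings;
use LWP::UserAgent;
use HTML::TreeBuilder::XPath;

binmode STDOUT, ':encoding(UTF-8)';

my $ua = LWP::UserAgent->new;
$ua->timeout(10);
$ua->env_proxy;
$ua->agent("Mozilla/8.0");

my $response = $ua->get('https://licai.yingyinglicai.com/product/list.htm');
die $response->status_line unless $response->is_success;

my $content = $response->decoded_content;

# Keep a copy of the page. Use "or" rather than "||" here: "||" binds to the
# filename string, so the die could never fire. The file is overwritten
# rather than appended to, so it always holds exactly one page.
open my $fh, '>:encoding(UTF-8)', 'data.html'
    or die "open data file failed: $!";
print {$fh} $content;
close $fh or die "close data file failed: $!";

# Parse the content already in memory rather than re-reading the file.
my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse_content($content);

# Look in the body for cells such as:
# <td><div class="fresh"><p class="text-ellipsis-2"><i class="fresh-icon"></i><a href="/detail/11156-261-500-856-0544.htm">变现宝4275号</a></p></div></td>
# findvalue would return a single value for the whole node set;
# findvalues returns one value per matching node.
my @nb = $tree->findvalues('/html/body//div[@class="fresh"]');
print "Product is $_\n" for @nb;

$tree->delete;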
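To get each product's link as well as its name, one option is to select the a elements and read their href attribute. The sketch below runs offline on the sample cell quoted in the script's comment. Wrapping that cell in a table is an assumption made so the fragment parses in a valid context, and @products is a name chosen for illustration.

```perl
use strict;
use warnings;
use utf8;
use HTML::TreeBuilder::XPath;

binmode STDOUT, ':encoding(UTF-8)';

# The sample cell from the script's comment, wrapped in a table
# (the wrapper is an assumption, not part of the real page).
my $html = '<html><body><table><tr>'
         . '<td><div class="fresh"><p class="text-ellipsis-2"><i class="fresh-icon"></i>'
         . '<a href="/detail/11156-261-500-856-0544.htm">变现宝4275号</a></p></div></td>'
         . '</tr></table></body></html>';

my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse_content($html);

# Collect the link text and target of every product link.
my @products;
for my $link ($tree->findnodes('//div[@class="fresh"]//a')) {
    push @products, { name => $link->as_text, href => $link->attr('href') };
}

print "$_->{name} -> $_->{href}\n" for @products;

$tree->delete;
```

as_text and attr are ordinary HTML::Element methods, so they are available on every node returned by findnodes.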
