XPath: Getting the HTML Nodes Between Two Nodes

2015-06-01 16:42

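The following expression selects the p elements that sit between the h5 containing "Ingredients" and the h5 containing "Method" inside the div with id "Recipe":

//div[@id="Recipe"]//h5[contains(text(),"Ingredients")]/following-sibling::p[count(.|//div[@id="Recipe"]//h5[contains(text(),"Method")]/preceding-sibling::p) = count(//div[@id="Recipe"]//h5[contains(text(),"Method")]/preceding-sibling::p)]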

I. XPath 1.0 solution:

One way to do this in XPath 1.0 is to use the Kayessian method for node-set intersection:

$ns1[count(.|$ns2) = count($ns2)]

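The above expression selects exactly the nodes that belong to both the node-set $ns1 and the node-set $ns2. It works because a node-set contains no duplicates: adding the current node to $ns2 with the union operator leaves the count unchanged only if that node is already a member of $ns2.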
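The counting trick can be sketched outside XPath as well. The following Python snippet is only an illustration, not part of the original answer: the function name kayessian_intersection is hypothetical, and plain strings stand in for nodes.

```python
def kayessian_intersection(ns1, ns2):
    """Emulate $ns1[count(.|$ns2) = count($ns2)] on lists of hashable items."""
    ns2_set = set(ns2)  # a node-set never contains duplicates
    result = []
    for node in ns1:
        union = ns2_set | {node}        # the XPath union  .|$ns2
        if len(union) == len(ns2_set):  # count(.|$ns2) = count($ns2)
            result.append(node)         # node was already a member of ns2
    return result


# Strings stand in for nodes here (an assumption made for the illustration).
print(kayessian_intersection(["a", "b", "c", "d"], ["c", "d", "e"]))
# prints ['c', 'd']
```

Only "c" and "d" pass the test, because they are the only items whose union with the second set does not grow it.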

To apply this to the specific question, let's say we need to select all nodes between the 2nd and 3rd h3 elements in the following XML document:

<html>
  <h3>Title T31</h3>
    <a31/>
    <b31/>
  <h3>Title T32</h3>
    <a32/>
    <b32/>
  <h3>Title T33</h3>
    <a33/>
    <b33/>
  <h3>Title T34</h3>
    <a34/>
    <b34/>
  <h3>Title T35</h3>
</html>

We have to substitute $ns1 with:

/*/h3[2]/following-sibling::node()

and to substitute $ns2 with:

/*/h3[3]/preceding-sibling::node()

Thus, the complete XPath expression is:

/*/h3[2]/following-sibling::node()
             [count(.|/*/h3[3]/preceding-sibling::node())
             =
              count(/*/h3[3]/preceding-sibling::node())
             ]
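The same selection can be approximated in Python with the standard library. This is a rough sketch under stated assumptions: xml.etree.ElementTree does not support the following-sibling and preceding-sibling axes or count(), so the two node-sets are built by list slicing, and only element children are considered (text and comment nodes that node() would also match are ignored). The helper name elements_between is hypothetical.

```python
import xml.etree.ElementTree as ET

DOC = """<html>
  <h3>Title T31</h3>
    <a31/>
    <b31/>
  <h3>Title T32</h3>
    <a32/>
    <b32/>
  <h3>Title T33</h3>
    <a33/>
    <b33/>
  <h3>Title T34</h3>
    <a34/>
    <b34/>
  <h3>Title T35</h3>
</html>"""


def elements_between(root, tag, start_pos, end_pos):
    """Element children lying between the start_pos-th and end_pos-th <tag>."""
    children = list(root)
    headings = [i for i, child in enumerate(children) if child.tag == tag]
    start = headings[start_pos - 1]  # XPath positions are 1-based
    end = headings[end_pos - 1]
    ns1 = children[start + 1:]       # stands in for following-sibling of the start
    ns2 = children[:end]             # stands in for preceding-sibling of the end
    ns2_ids = {id(node) for node in ns2}
    # Kayessian test: keep a node of ns1 only if adding it to ns2 does not grow ns2.
    return [n for n in ns1 if len(ns2_ids | {id(n)}) == len(ns2_ids)]


root = ET.fromstring(DOC)
print([el.tag for el in elements_between(root, "h3", 2, 3)])
# prints ['a32', 'b32']
```

Object identity (id) is used as the membership key because two distinct elements can look identical while still being different nodes.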

We can verify that this is the correct XPath expression:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="/">
  <xsl:copy-of select=
   "/*/h3[2]/following-sibling::node()
             [count(.|/*/h3[3]/preceding-sibling::node())
             =
              count(/*/h3[3]/preceding-sibling::node())
             ]
   "/>
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied to the XML document presented above, it produces the wanted result:

<a32/>
<b32/>

II. XPath 2.0 solution:

Use the intersect operator:

   /*/h3[2]/following-sibling::node()
intersect
   /*/h3[3]/preceding-sibling::node()
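To see that the intersect form and the counting form select the same nodes, here is a small Python comparison. It is an illustrative sketch only: the function names are hypothetical, and the integers are 1-based positions of the element children of the sample document (h3[2] is child 4, h3[3] is child 7), used as stand-ins for the nodes themselves.

```python
def kayessian(ns1, ns2):
    """XPath 1.0 style: $ns1[count(.|$ns2) = count($ns2)]."""
    ns2_set = set(ns2)
    return [n for n in ns1 if len(ns2_set | {n}) == len(ns2_set)]


def intersect(ns1, ns2):
    """XPath 2.0 style: $ns1 intersect $ns2, kept in the order of ns1."""
    ns2_set = set(ns2)
    return [n for n in ns1 if n in ns2_set]


# Child positions in the sample document (assumed stand-ins for nodes).
following_h3_2 = list(range(5, 14))  # children after h3[2]: positions 5..13
preceding_h3_3 = list(range(1, 7))   # children before h3[3]: positions 1..6

print(intersect(following_h3_2, preceding_h3_3))
# prints [5, 6]
print(kayessian(following_h3_2, preceding_h3_3) == intersect(following_h3_2, preceding_h3_3))
# prints True
```

Positions 5 and 6 correspond to a32 and b32, the same two elements the XPath 1.0 expression returns.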
