Html Agility Pack - API
Parser
Selectors
Manipulation
Traversing
Writer
Utilities
Attributes

HTML Parser

The HTML parser lets you parse HTML and returns an HtmlDocument.

Html Parser
Name          Description
From File     Loads an HTML document from a file.
From String   Loads the HTML document from the specified string.
From Web      Gets an HTML document from an Internet resource.
From Browser  Gets an HTML document from a WebBrowser.
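
As a minimal sketch of the "From File" and "From Web" rows, the snippet below writes a small page to a temporary file and loads it with HtmlDocument.Load. The file name and the URL are placeholders chosen for this illustration, and the web call is left commented out because it needs network access.

```csharp
using System;
using System.IO;
using HtmlAgilityPack;

// Write a small page to a temporary file so the example is self-contained.
// The file name is an arbitrary choice for this sketch.
string path = Path.Combine(Path.GetTempPath(), "hap-sample.html");
File.WriteAllText(path, "<html><body><div>Hello from a file</div></body></html>");

// From File: HtmlDocument.Load reads and parses the file.
var fileDoc = new HtmlDocument();
fileDoc.Load(path);
string text = fileDoc.DocumentNode.SelectSingleNode("//div").InnerText;
Console.WriteLine(text);

// From Web: HtmlWeb.Load downloads and parses a page. It needs network access,
// so it is left commented out; the URL is only a placeholder.
// var web = new HtmlWeb();
// HtmlDocument webDoc = web.Load("https://example.com/");
```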

Load Html From String

The HtmlDocument.LoadHtml method loads the HTML document from the specified string.

Example

The following example loads HTML from a string and prints the body element.

using System;
using HtmlAgilityPack;

var html = @"<!DOCTYPE html>
<html>
<body>
<h1>This is <b>bold</b> heading</h1>
<p>This is <u>underlined</u> paragraph</p>
<h2>This is <i>italic</i> heading</h2>
</body>
</html> ";

var htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(html);

// Select the <body> element and print it together with its own tags.
var htmlBody = htmlDoc.DocumentNode.SelectSingleNode("//body");
Console.WriteLine(htmlBody.OuterHtml);

HTML Selectors

Selectors let you select HTML nodes from an HtmlDocument.

Methods
Name                      Description
SelectNodes()             Selects a list of nodes matching the XPath expression.
SelectSingleNode(String)  Selects the first HtmlNode that matches the XPath expression.
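
A short sketch of SelectNodes follows; the markup is made up for the illustration. With the library's default settings, SelectNodes returns null rather than an empty collection when nothing matches, so the result is checked before it is used.

```csharp
using System;
using HtmlAgilityPack;

var doc = new HtmlDocument();
doc.LoadHtml("<ul><li>one</li><li>two</li><li>three</li></ul>");

// SelectNodes returns every node that matches the XPath expression.
HtmlNodeCollection items = doc.DocumentNode.SelectNodes("//li");
int itemCount = items == null ? 0 : items.Count;
Console.WriteLine(itemCount);

if (items != null)
{
    foreach (HtmlNode item in items)
    {
        Console.WriteLine(item.InnerText);
    }
}

// With default settings a query without matches yields null, not an empty list.
HtmlNodeCollection missing = doc.DocumentNode.SelectNodes("//table");
Console.WriteLine(missing == null ? "no match" : missing.Count.ToString());
```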

HTML SelectSingleNode

SelectSingleNode Method

Selects the first HtmlNode that matches the XPath expression.

Parameters:

xpath: The XPath expression. May not be null.

Returns:

The first HtmlAgilityPack.HtmlNode that matches the XPath query or a null reference if no matching node was found.

Examples

The following example selects the first node matching the XPath expression using the SelectSingleNode method.

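// html is assumed to hold markup that contains an <input> inside a <td>.
var htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(html);

// Read the value attribute of the first matching <input>.
string name = htmlDoc.DocumentNode
    .SelectSingleNode("//td/input")
    .Attributes["value"].Value;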

// Note: a query such as child.SelectSingleNode("//*[@class=\"titlelnk\"]").InnerText always searches the whole document,
// because an XPath expression that starts with "//" is evaluated from the document root.
// To search relative to the current node, start the expression with ".//" instead.
Write(sw, String.Format("Recommended: {0}", hn.SelectSingleNode(".//*[@class=\"diggnum\"]").InnerText));
Write(sw, String.Format("Title: {0}", hn.SelectSingleNode(".//*[@class=\"titlelnk\"]").InnerText));
Write(sw, String.Format("Summary: {0}", hn.SelectSingleNode(".//*[@class=\"post_item_summary\"]").InnerText));
Write(sw, String.Format("Info: {0}", hn.SelectSingleNode(".//*[@class=\"post_item_foot\"]").InnerText));
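
To illustrate how the XPath prefix changes the search scope of SelectSingleNode, here is a small sketch with made-up markup that reuses the titlelnk class name. The same query is run from the second post, once with "//" and once with ".//".

```csharp
using System;
using HtmlAgilityPack;

var doc = new HtmlDocument();
doc.LoadHtml(
    "<div class='post'><a class='titlelnk'>First post</a></div>" +
    "<div class='post'><a class='titlelnk'>Second post</a></div>");

// Use the second post as the context node.
HtmlNode second = doc.DocumentNode.SelectNodes("//div[@class='post']")[1];

// "//" starts at the document root, so this finds the first title in the whole document.
string fromRoot = second.SelectSingleNode("//*[@class='titlelnk']").InnerText;

// ".//" starts at the context node, so this finds the title inside the second post.
string fromNode = second.SelectSingleNode(".//*[@class='titlelnk']").InnerText;

Console.WriteLine(fromRoot); // First post
Console.WriteLine(fromNode); // Second post
```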

HTML Manipulation

Manipulation lets you read and modify HTML nodes.

Properties
Name        Description
InnerHtml   Gets or sets the HTML between the start and end tags of the object.
InnerText   Gets the text between the start and end tags of the object.
OuterHtml   Gets the object and its content in HTML.
ParentNode  Gets the parent of this node (for nodes that can have parents).

Methods
Name                            Description
AppendChild()                   Adds the specified node to the end of the list of children of this node.
AppendChildren()                Adds the specified node list to the end of the list of children of this node.
Clone()                         Creates a duplicate of the node.
CloneNode(Boolean)              Creates a duplicate of the node.
CloneNode(String)               Creates a duplicate of the node and changes its name at the same time.
CloneNode(String, Boolean)      Creates a duplicate of the node and changes its name at the same time.
CopyFrom(HtmlNode)              Creates a duplicate of the node and the subtree under it.
CopyFrom(HtmlNode, Boolean)     Creates a duplicate of the node.
CreateNode()                    Creates an HTML node from a string representing literal HTML.
InsertAfter()                   Inserts the specified node immediately after the specified reference node.
InsertBefore()                  Inserts the specified node immediately before the specified reference node.
PrependChild()                  Adds the specified node to the beginning of the list of children of this node.
PrependChildren()               Adds the specified node list to the beginning of the list of children of this node.
Remove()                        Removes the node from its parent collection.
RemoveAll()                     Removes all the children and/or attributes of the current node.
RemoveAllChildren()             Removes all the children of the current node.
RemoveChild(HtmlNode)           Removes the specified child node.
RemoveChild(HtmlNode, Boolean)  Removes the specified child node.
ReplaceChild()                  Replaces the child node oldChild with the newChild node.
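
A few of the manipulation methods can be sketched together as follows. The markup and the id value are invented for the example; the snippet only shows one plausible way to combine CreateNode, AppendChild, PrependChild, ReplaceChild and Remove.

```csharp
using System;
using HtmlAgilityPack;

var doc = new HtmlDocument();
doc.LoadHtml("<div id='box'><span>middle</span></div>");
HtmlNode box = doc.DocumentNode.SelectSingleNode("//div[@id='box']");

// CreateNode builds a node from literal HTML; AppendChild and PrependChild attach it.
box.AppendChild(HtmlNode.CreateNode("<em>last</em>"));
box.PrependChild(HtmlNode.CreateNode("<strong>first</strong>"));
string afterInsert = box.InnerHtml;
Console.WriteLine(afterInsert);

// ReplaceChild(newChild, oldChild) swaps an existing child for a new node.
HtmlNode span = box.SelectSingleNode("./span");
box.ReplaceChild(HtmlNode.CreateNode("<b>middle</b>"), span);

// Remove detaches a node from its parent.
box.SelectSingleNode("./em").Remove();
string afterRemove = box.InnerHtml;
Console.WriteLine(afterRemove);
```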

HTML Traversing

Traversing lets you walk through the HTML nodes of a document.

Properties
Name         Description
ChildNodes   Gets all the children of the node.
FirstChild   Gets the first child of the node.
LastChild    Gets the last child of the node.
NextSibling  Gets the HTML node immediately following this element.
ParentNode   Gets the parent of this node (for nodes that can have parents).

Methods
Name                        Description
Ancestors()                 Gets all the ancestors of the node.
Ancestors(String)           Gets the ancestors with a matching name.
AncestorsAndSelf()          Gets all ancestor nodes and the current node.
AncestorsAndSelf(String)    Gets all ancestor nodes and the current node with a matching name.
DescendantNodes()           Gets all descendant nodes of this node and of each of its child nodes.
DescendantNodesAndSelf()    Returns a collection of this element and all its descendant nodes, in document order.
Descendants()               Gets all descendant nodes as an enumerated list.
Descendants(String)         Gets all descendant nodes with a matching name.
DescendantsAndSelf()        Returns a collection of this element and all its descendant nodes, in document order.
DescendantsAndSelf(String)  Gets all descendant nodes with a matching name, including this node.
Element()                   Gets the first direct child node with a matching name.
Elements()                  Gets the direct child nodes with a matching name.
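
The difference between the traversal helpers is easiest to see on a small tree. The sketch below uses made-up markup and assumes System.Linq is available for Count, First and Last.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

var doc = new HtmlDocument();
doc.LoadHtml("<div id='nav'><a href='/a'>A</a><span><a href='/b'>B</a></span></div>");
HtmlNode nav = doc.DocumentNode.SelectSingleNode("//div[@id='nav']");

// Descendants(name) walks the whole subtree; Elements(name) only looks at direct children.
int allLinks = nav.Descendants("a").Count();
int directLinks = nav.Elements("a").Count();
Console.WriteLine($"{allLinks} links in the subtree, {directLinks} direct child link");

// Ancestors(name) walks upward from a node toward the document root.
HtmlNode inner = nav.Descendants("a").Last();
HtmlNode owner = inner.Ancestors("div").First();
Console.WriteLine(owner.Id);

// FirstChild, NextSibling and ParentNode move one step at a time.
Console.WriteLine(nav.FirstChild.Name);
Console.WriteLine(nav.FirstChild.NextSibling.Name);
Console.WriteLine(inner.ParentNode.Name);
```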

HTML Writer

The writer methods save an HtmlDocument or write out a single HtmlNode.

HtmlDocument - Methods
Name                    Description
Save(Stream)            Saves the HTML document to the specified stream.
Save(StreamWriter)      Saves the HTML document to the specified StreamWriter.
Save(TextWriter)        Saves the HTML document to the specified TextWriter.
Save(String)            Saves the mixed document to the specified file.
Save(XmlWriter)         Saves the HTML document to the specified XmlWriter.
Save(Stream, Encoding)  Saves the HTML document to the specified stream.
Save(String, Encoding)  Saves the mixed document to the specified file.

HtmlNode - Methods
Name                        Description
WriteContentTo()            Saves all the children of the node to a string.
WriteContentTo(TextWriter)  Saves all the children of the node to the specified TextWriter.
WriteTo()                   Saves the current node to a string.
WriteTo(TextWriter)         Saves the current node to the specified TextWriter.
WriteTo(XmlWriter)          Saves the current node to the specified XmlWriter.
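
As a brief sketch of the writer methods, the snippet below compares WriteTo and WriteContentTo on one node and saves the whole document to an in-memory StringWriter through Save(TextWriter). The markup is invented for the example.

```csharp
using System;
using System.IO;
using HtmlAgilityPack;

var doc = new HtmlDocument();
doc.LoadHtml("<div><span>text</span></div>");
HtmlNode div = doc.DocumentNode.SelectSingleNode("//div");

// WriteTo() returns the node with its own tags; WriteContentTo() returns only its children.
string whole = div.WriteTo();
string content = div.WriteContentTo();
Console.WriteLine(whole);
Console.WriteLine(content);

// Save(TextWriter) writes the whole document; a StringWriter keeps the output in memory.
string saved;
using (var writer = new StringWriter())
{
    doc.Save(writer);
    saved = writer.ToString();
}
Console.WriteLine(saved);
```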

HTML Utilities

HtmlDocument Utilities

HtmlDocument Methods
Name                                    Description
DetectEncoding(Stream)                  Detects the encoding of an HTML stream.
DetectEncoding(TextReader)              Detects the encoding of an HTML text provided on a TextReader.
DetectEncoding(String)                  Detects the encoding of an HTML file.
DetectEncodingAndLoad(String)           Detects the encoding of an HTML document from a file first, and then loads the file.
DetectEncodingAndLoad(String, Boolean)  Detects the encoding of an HTML document from a file first, and then loads the file.
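
The file-based overloads might be used as in the sketch below. The temporary file name and the markup are assumptions made for the illustration, and the detected encoding is printed without assuming a particular result, since detection can return null when the document declares no encoding.

```csharp
using System;
using System.IO;
using System.Text;
using HtmlAgilityPack;

// Create a UTF-8 file that declares its charset; the file name is arbitrary.
string path = Path.Combine(Path.GetTempPath(), "hap-encoding.html");
File.WriteAllText(path,
    "<html><head><meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\"></head>" +
    "<body><div>caf\u00e9</div></body></html>",
    Encoding.UTF8);

var doc = new HtmlDocument();

// DetectEncoding(String) inspects the file; handle null in case nothing is detected.
Encoding detected = doc.DetectEncoding(path);
Console.WriteLine(detected == null ? "not detected" : detected.WebName);

// DetectEncodingAndLoad(String) detects the encoding first, then loads the file.
doc.DetectEncodingAndLoad(path);
string loadedText = doc.DocumentNode.SelectSingleNode("//div").InnerText;
Console.WriteLine(loadedText);
```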

HTML Attributes

Attributes let you read, add, and remove the attributes of an HTML node.

Methods
Name                   Description
Add(HtmlAttribute)     Adds the supplied item to the collection.
Add(String, String)    Adds a new attribute to the collection with the given values.
Append(String)         Creates and inserts a new attribute as the last attribute in the collection.
Append(HtmlAttribute)  Inserts the specified attribute as the last attribute in the collection.
Append(String, String) Creates and inserts a new attribute as the last attribute in the collection.
Remove()               Removes all attributes from the collection.
Remove(String)         Removes an attribute from the list, using its name. If more than one attribute has this name, all of them are removed.
Remove(HtmlAttribute)  Removes a given attribute from the list.
RemoveAll()            Removes all attributes in the list.
RemoveAt()             Removes the attribute at the specified index.
SetAttributeValue()    Helper method to set the value of an attribute of this node. If the attribute is not found, it is created automatically.
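
A small sketch of attribute handling follows, using made-up markup. It also calls HtmlNode.GetAttributeValue, which is not listed in the table above, to read values back with a fallback for missing attributes.

```csharp
using System;
using HtmlAgilityPack;

var doc = new HtmlDocument();
doc.LoadHtml("<a href='/old' title='hint'>link</a>");
HtmlNode link = doc.DocumentNode.SelectSingleNode("//a");

// SetAttributeValue updates an existing attribute or creates a missing one.
link.SetAttributeValue("href", "/new");
link.SetAttributeValue("target", "_blank");

// Add(String, String) appends a new attribute; Remove(String) deletes attributes by name.
link.Attributes.Add("rel", "noopener");
link.Attributes.Remove("title");

// GetAttributeValue returns the supplied default when the attribute is absent.
string href = link.GetAttributeValue("href", "");
string title = link.GetAttributeValue("title", "(none)");
int attributeCount = link.Attributes.Count;
Console.WriteLine(href);
Console.WriteLine(title);
Console.WriteLine(attributeCount);
```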
