Html Agility Pack - API

HTML Parser

The HTML parser lets you parse HTML and returns an HtmlDocument.

Html Parser
Name          Description
From File     Loads an HTML document from a file.
From String   Loads the HTML document from the specified string.
From Web      Gets an HTML document from an Internet resource.
From Browser  Gets an HTML document from a WebBrowser.
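
As a minimal sketch of the "From File" and "From Web" entries above: HtmlDocument.Load(path) parses a file and HtmlWeb.Load(url) downloads and parses a page. The temporary file name and the example URL are placeholders chosen for illustration, and the web call is left commented out because it needs network access.

```csharp
using System;
using System.IO;
using HtmlAgilityPack;

// Write a small page to a temporary file so the example is self-contained.
// The file name is a placeholder, not something the library requires.
string path = Path.Combine(Path.GetTempPath(), "hap-sample.html");
File.WriteAllText(path, "<html><body><h1>Hello</h1></body></html>");

// From File: HtmlDocument.Load reads and parses the file.
var fromFile = new HtmlDocument();
fromFile.Load(path);
string heading = fromFile.DocumentNode.SelectSingleNode("//h1").InnerText;
Console.WriteLine(heading);

// From Web: HtmlWeb.Load downloads and parses a page (needs network access).
// var web = new HtmlWeb();
// HtmlDocument fromWeb = web.Load("https://example.com/");
```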

Load Html From String

The HtmlDocument.LoadHtml method loads an HTML document from the specified string.

Example

The following example loads HTML from a string, selects the body element, and prints its outer HTML.

using System;
using HtmlAgilityPack;

var html = @"<!DOCTYPE html>
<html>
<body>
<h1>This is <b>bold</b> heading</h1>
<p>This is <u>underlined</u> paragraph</p>
<h2>This is <i>italic</i> heading</h2>
</body>
</html> ";

var htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(html);

var htmlBody = htmlDoc.DocumentNode.SelectSingleNode("//body");
Console.WriteLine(htmlBody.OuterHtml);

HTML Selectors

Selectors let you select HTML nodes from an HtmlDocument.

Methods
Name                      Description
SelectNodes(String)       Selects a list of nodes matching the XPath expression.
SelectSingleNode(String)  Selects the first HtmlNode that matches the XPath expression.
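
A short sketch of SelectNodes, using a made-up list as input. It assumes the library's default configuration, in which SelectNodes returns null rather than an empty collection when nothing matches, so the result is checked before it is enumerated.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

var doc = new HtmlDocument();
doc.LoadHtml("<ul><li>one</li><li>two</li></ul>");

// SelectNodes returns every match, or null when nothing matches
// (assuming the default configuration).
HtmlNodeCollection items = doc.DocumentNode.SelectNodes("//li");
var texts = new List<string>();
if (items != null)
{
    foreach (HtmlNode item in items)
    {
        texts.Add(item.InnerText);
    }
}
Console.WriteLine(string.Join(", ", texts));

// No <table> exists in the sample, so this lookup finds nothing.
HtmlNodeCollection missing = doc.DocumentNode.SelectNodes("//table");
Console.WriteLine(missing == null);
```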

HTML SelectSingleNode

SelectSingleNode Method

Selects the first HtmlNode that matches the XPath expression.

Parameters:

xpath: The XPath expression. May not be null.

Returns:

The first HtmlAgilityPack.HtmlNode that matches the XPath query or a null reference if no matching node was found.

Examples

The following example selects the first node matching the XPath expression using the SelectSingleNode method, then reads its value attribute.

// Assumes `html` holds markup that contains an <input> inside a <td>.
var htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(html);

string name = htmlDoc.DocumentNode
    .SelectSingleNode("//td/input")
    .Attributes["value"].Value;

// Note: an expression that starts with "//", such as
// child.SelectSingleNode("//*[@class=\"titlelnk\"]").InnerText, always searches the
// whole document, even when it is called on a child node. To search within the
// current node (hn here), start the expression with ".//" instead.
Write(sw, String.Format("Recommendations: {0}", hn.SelectSingleNode(".//*[@class=\"diggnum\"]").InnerText));
Write(sw, String.Format("Title: {0}", hn.SelectSingleNode(".//*[@class=\"titlelnk\"]").InnerText));
Write(sw, String.Format("Summary: {0}", hn.SelectSingleNode(".//*[@class=\"post_item_summary\"]").InnerText));
Write(sw, String.Format("Info: {0}", hn.SelectSingleNode(".//*[@class=\"post_item_foot\"]").InnerText));
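
The difference between document-wide and node-relative XPath can be seen in a small, self-contained sketch. The markup and the "post" class are invented for illustration; only the "titlelnk" class comes from the snippet above.

```csharp
using System;
using HtmlAgilityPack;

var doc = new HtmlDocument();
doc.LoadHtml(
    "<div class='post'><a class='titlelnk'>First</a></div>" +
    "<div class='post'><a class='titlelnk'>Second</a></div>");

// Pick the second post as the context node.
HtmlNode second = doc.DocumentNode.SelectNodes("//div[@class='post']")[1];

// "//" starts at the document root, so it finds the first link in the whole document.
string absolute = second.SelectSingleNode("//*[@class='titlelnk']").InnerText;

// ".//" starts at the context node, so it finds the link inside the second post.
string relative = second.SelectSingleNode(".//*[@class='titlelnk']").InnerText;

Console.WriteLine(absolute);  // First
Console.WriteLine(relative);  // Second
```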

HTML Manipulation

Manipulation lets you change the content and structure of HTML nodes.

Properties
Name        Description
InnerHtml   Gets or sets the HTML between the start and end tags of the object.
InnerText   Gets the text between the start and end tags of the object.
OuterHtml   Gets the object and its content in HTML.
ParentNode  Gets the parent of this node (for nodes that can have parents).

Methods
Name                            Description
AppendChild()                   Adds the specified node to the end of the list of children of this node.
AppendChildren()                Adds the specified node list to the end of the list of children of this node.
Clone()                         Creates a duplicate of the node.
CloneNode(Boolean)              Creates a duplicate of the node.
CloneNode(String)               Creates a duplicate of the node and changes its name at the same time.
CloneNode(String, Boolean)      Creates a duplicate of the node and changes its name at the same time.
CopyFrom(HtmlNode)              Creates a duplicate of the node and the subtree under it.
CopyFrom(HtmlNode, Boolean)     Creates a duplicate of the node.
CreateNode()                    Creates an HTML node from a string representing literal HTML.
InsertAfter()                   Inserts the specified node immediately after the specified reference node.
InsertBefore()                  Inserts the specified node immediately before the specified reference node.
PrependChild()                  Adds the specified node to the beginning of the list of children of this node.
PrependChildren()               Adds the specified node list to the beginning of the list of children of this node.
Remove()                        Removes the node from its parent collection.
RemoveAll()                     Removes all the children and/or attributes of the current node.
RemoveAllChildren()             Removes all the children of the current node.
RemoveChild(HtmlNode)           Removes the specified child node.
RemoveChild(HtmlNode, Boolean)  Removes the specified child node.
ReplaceChild()                  Replaces the child node oldChild with the newChild node.
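
A brief sketch of a few of these members working together, on invented sample markup: CreateNode builds nodes from literal HTML, AppendChild and PrependChild attach them, Remove detaches a node, and assigning to InnerHtml replaces a node's children.

```csharp
using System;
using HtmlAgilityPack;

var doc = new HtmlDocument();
doc.LoadHtml("<div id='box'><p>old</p><span>drop me</span></div>");
HtmlNode box = doc.DocumentNode.SelectSingleNode("//div[@id='box']");

// CreateNode builds a node from literal HTML; AppendChild and PrependChild attach it.
box.AppendChild(HtmlNode.CreateNode("<em>last</em>"));
box.PrependChild(HtmlNode.CreateNode("<strong>first</strong>"));

// Remove detaches a node from its parent.
box.SelectSingleNode("./span").Remove();

// InnerHtml is writable: assigning to it replaces the node's children.
box.SelectSingleNode("./p").InnerHtml = "new";

Console.WriteLine(box.OuterHtml);
```

After these calls the div holds three children in order: the strong element, the rewritten paragraph, and the em element.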

HTML Traversing

Traversing lets you move through the HTML node tree.

Properties
Name         Description
ChildNodes   Gets all the children of the node.
FirstChild   Gets the first child of the node.
LastChild    Gets the last child of the node.
NextSibling  Gets the HTML node immediately following this element.
ParentNode   Gets the parent of this node (for nodes that can have parents).

Methods
Name                        Description
Ancestors()                 Gets all the ancestors of the node.
Ancestors(String)           Gets ancestors with a matching name.
AncestorsAndSelf()          Gets all ancestor nodes and the current node.
AncestorsAndSelf(String)    Gets all ancestor nodes and the current node with a matching name.
DescendantNodes()           Gets all descendant nodes of this node and of each of its child nodes.
DescendantNodesAndSelf()    Returns this node and all of its descendant nodes, in document order.
Descendants()               Gets all descendant nodes as an enumerable list.
Descendants(String)         Gets all descendant nodes with a matching name.
DescendantsAndSelf()        Returns this node and all of its descendant nodes, in document order.
DescendantsAndSelf(String)  Gets all descendant nodes with a matching name, including this node.
Element()                   Gets the first direct child node with a matching name.
Elements()                  Gets the direct child nodes with a matching name.
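
The following sketch walks a small, invented list in both directions: down with Descendants, up with Ancestors, and across direct children with ChildNodes, FirstChild, LastChild, Element, and NextSibling. The markup contains no whitespace between tags, so no text nodes appear among the children.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

var doc = new HtmlDocument();
doc.LoadHtml("<div><ul><li><a href='/a'>A</a></li><li><a href='/b'>B</a></li></ul></div>");

// Descendants(name) walks the whole subtree.
var links = doc.DocumentNode.Descendants("a").ToList();
Console.WriteLine(links.Count);

// Ancestors(name) walks up from a node toward the root.
HtmlNode list = links[0].Ancestors("ul").First();
Console.WriteLine(list.Name);

// ChildNodes, FirstChild and LastChild look at direct children only.
Console.WriteLine(list.ChildNodes.Count);
Console.WriteLine(list.FirstChild.InnerText);
Console.WriteLine(list.LastChild.InnerText);

// Element(name) returns the first direct child with that name;
// NextSibling then moves to the node that follows it.
string nextText = list.Element("li").NextSibling.InnerText;
Console.WriteLine(nextText);
```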

HTML Writer

Save an HtmlDocument and write an HtmlNode.

HtmlDocument - Methods
Name                    Description
Save(Stream)            Saves the HTML document to the specified stream.
Save(StreamWriter)      Saves the HTML document to the specified StreamWriter.
Save(TextWriter)        Saves the HTML document to the specified TextWriter.
Save(String)            Saves the mixed document to the specified file.
Save(XmlWriter)         Saves the HTML document to the specified XmlWriter.
Save(Stream, Encoding)  Saves the HTML document to the specified stream.
Save(String, Encoding)  Saves the mixed document to the specified file.

HtmlNode - Methods
Name                        Description
WriteContentTo()            Saves all the children of the node to a string.
WriteContentTo(TextWriter)  Saves all the children of the node to the specified TextWriter.
WriteTo()                   Saves the current node to a string.
WriteTo(TextWriter)         Saves the current node to the specified TextWriter.
WriteTo(XmlWriter)          Saves the current node to the specified XmlWriter.
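
A small sketch contrasting the node-level writers with a document-level save, on invented markup. A StringWriter stands in for a file so the example stays in memory; that choice is an assumption made for illustration.

```csharp
using System;
using System.IO;
using HtmlAgilityPack;

var doc = new HtmlDocument();
doc.LoadHtml("<div><p>Hello <b>world</b></p></div>");
HtmlNode p = doc.DocumentNode.SelectSingleNode("//p");

// WriteTo() serializes the node itself; WriteContentTo() serializes only its children.
string outer = p.WriteTo();
string inner = p.WriteContentTo();
Console.WriteLine(outer);
Console.WriteLine(inner);

// Save(TextWriter) writes the whole document; a StringWriter keeps it in memory.
var writer = new StringWriter();
doc.Save(writer);
string saved = writer.ToString();
Console.WriteLine(saved);
```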

HTML Utilities

HtmlDocument Utilities

HtmlDocument Methods
Name                                    Description
DetectEncoding(Stream)                  Detects the encoding of an HTML stream.
DetectEncoding(TextReader)              Detects the encoding of an HTML text provided on a TextReader.
DetectEncoding(String)                  Detects the encoding of an HTML file.
DetectEncodingAndLoad(String)           Detects the encoding of an HTML document from a file first, and then loads the file.
DetectEncodingAndLoad(String, Boolean)  Detects the encoding of an HTML document from a file first, and then loads the file.
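
A hedged sketch of the file-based overloads. The temporary file, its name, and its contents are made up for illustration. The code assumes that DetectEncoding may return null when the file declares no encoding, so the result is read defensively and no particular output is promised.

```csharp
using System;
using System.IO;
using System.Text;
using HtmlAgilityPack;

// Create a sample file that declares its encoding in a meta tag.
// The file name is a placeholder.
string path = Path.Combine(Path.GetTempPath(), "hap-encoding.html");
File.WriteAllText(
    path,
    "<html><head><meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\">" +
    "</head><body>sample</body></html>",
    new UTF8Encoding(false));

var doc = new HtmlDocument();

// DetectEncoding(String) inspects the file at the given path.
// Assumption: the result can be null if nothing is detected.
Encoding detected = doc.DetectEncoding(path);
Console.WriteLine(detected?.WebName ?? "(not detected)");

// DetectEncodingAndLoad(String) detects the encoding, then loads the file.
doc.DetectEncodingAndLoad(path);
HtmlNode body = doc.DocumentNode.SelectSingleNode("//body");
Console.WriteLine(body.InnerText);
```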

HTML Attributes

Attribute methods let you add, change, and remove the attributes of an HTML node.

Methods
Name                   Description
Add(HtmlAttribute)     Adds the supplied item to the collection.
Add(String, String)    Adds a new attribute to the collection with the given values.
Append(String)         Creates and inserts a new attribute as the last attribute in the collection.
Append(HtmlAttribute)  Inserts the specified attribute as the last attribute in the collection.
Append(String, String) Creates and inserts a new attribute as the last attribute in the collection.
Remove()               Removes all attributes from the collection.
Remove(String)         Removes an attribute from the list, using its name. If more than one attribute has this name, they are all removed.
Remove(HtmlAttribute)  Removes a given attribute from the list.
RemoveAll()            Removes all attributes in the list.
RemoveAt()             Removes the attribute at the specified index.
SetAttributeValue()    Helper method to set the value of an attribute of this node. If the attribute is not found, it is created automatically.
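
A minimal sketch of the attribute methods on an invented link element. GetAttributeValue is an HtmlNode helper that is not listed in the table above; it is used here only to read values back with a fallback.

```csharp
using System;
using HtmlAgilityPack;

var doc = new HtmlDocument();
doc.LoadHtml("<a href='/old' title='tip'>link</a>");
HtmlNode link = doc.DocumentNode.SelectSingleNode("//a");

// Add(name, value) appends a new attribute to the node's collection.
link.Attributes.Add("target", "_blank");

// SetAttributeValue updates an existing attribute or creates a missing one.
link.SetAttributeValue("href", "/new");
link.SetAttributeValue("rel", "noopener");

// Remove(name) deletes every attribute with that name.
link.Attributes.Remove("title");

// GetAttributeValue returns the fallback when the attribute is absent.
Console.WriteLine(link.GetAttributeValue("href", ""));
Console.WriteLine(link.GetAttributeValue("title", "(none)"));
Console.WriteLine(link.Attributes.Count);
```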
