Html Agility Pack - API
Parser
Selectors
Manipulation
Traversing
Writer
Utilities
Attributes
HTML Parser
The HTML Parser lets you parse HTML into an HtmlDocument.
Name | Description
From File | Loads an HTML document from a file.
From String | Loads the HTML document from the specified string.
From Web | Gets an HTML document from an Internet resource.
From Browser | Gets an HTML document from a WebBrowser.
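The table lists the loading sources without showing the calls. The following is a minimal C# sketch of the "From File" row, assuming the HtmlAgilityPack NuGet package is referenced and the code runs as a program with top-level statements. The temporary file name is invented for the example, and the "From Web" call through HtmlWeb.Load is only shown as a comment because it needs network access.

```csharp
using System;
using System.IO;
using HtmlAgilityPack;

// Write a small HTML file to a temporary path so the sketch is self-contained.
// The file name is only an illustration.
var path = Path.Combine(Path.GetTempPath(), "hap-example.html");
File.WriteAllText(path, "<html><body><h1>Hello from a file</h1></body></html>");

// "From File": HtmlDocument.Load(string) reads the file and parses it.
var fileDoc = new HtmlDocument();
fileDoc.Load(path);

var heading = fileDoc.DocumentNode.SelectSingleNode("//h1");
Console.WriteLine(heading.InnerText); // Hello from a file

File.Delete(path);

// "From Web" goes through HtmlWeb instead. It needs network access, so it is left commented out:
// var webDoc = new HtmlWeb().Load("https://example.com/");
```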
Load HTML From String
The HtmlDocument.LoadHtml method loads an HTML document from the specified string.
Example
The following example loads HTML from a string and prints the outer HTML of the <body> element.
using System;
using HtmlAgilityPack;

var html = @"<!DOCTYPE html>
<html>
<body>
<h1>This is <b>bold</b> heading</h1>
<p>This is <u>underlined</u> paragraph</p>
<h2>This is <i>italic</i> heading</h2>
</body>
</html> ";

// Parse the string into a DOM.
var htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(html);

// Select the <body> element with XPath and print it, tags included.
var htmlBody = htmlDoc.DocumentNode.SelectSingleNode("//body");
Console.WriteLine(htmlBody.OuterHtml);
HTML Selectors
Selectors let you select HTML nodes from an HtmlDocument using XPath expressions.
Methods
Name | Description
SelectNodes(String) | Selects a list of nodes matching the XPath expression.
SelectSingleNode(String) | Selects the first HtmlNode that matches the XPath expression.
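To contrast the two selector methods, here is a small sketch on made-up list markup, assuming the HtmlAgilityPack package is available. It also shows a behaviour worth guarding against: with the default settings, SelectNodes returns null, not an empty collection, when nothing matches.

```csharp
using System;
using HtmlAgilityPack;

var doc = new HtmlDocument();
doc.LoadHtml("<ul><li>Apple</li><li>Banana</li><li>Cherry</li></ul>");

// SelectNodes returns every node that matches the XPath expression.
var items = doc.DocumentNode.SelectNodes("//li");
foreach (var item in items)
{
    Console.WriteLine(item.InnerText); // Apple, then Banana, then Cherry
}

// SelectSingleNode returns only the first match.
var first = doc.DocumentNode.SelectSingleNode("//li");
Console.WriteLine(first.InnerText); // Apple

// With the default options, SelectNodes returns null rather than an empty collection
// when nothing matches, so check for null before iterating.
var tables = doc.DocumentNode.SelectNodes("//table");
Console.WriteLine(tables == null ? "no <table> found" : $"{tables.Count} table(s) found");
```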
HTML SelectSingleNode
SelectSingleNode Method
Selects the first HtmlNode that matches the XPath expression.
Parameters:
xpath: The XPath expression. Must not be null.
Returns:
The first HtmlAgilityPack.HtmlNode that matches the XPath query, or null if no matching node was found.
Examples
The following example uses the SelectSingleNode method to select the first node matching the XPath expression and reads its value attribute.
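var htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(html);

// Throws a NullReferenceException if no <td>/<input> exists or the input has no value attribute,
// because SelectSingleNode and Attributes["value"] both return null in those cases.
string name = htmlDoc.DocumentNode
    .SelectSingleNode("//td/input")
    .Attributes["value"].Value;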
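// Note: a call such as child.SelectSingleNode("//*[@class=\"titlelnk\"]").InnerText always searches the whole document,
// because an XPath expression that starts with "//" is evaluated from the document root, not from the current child node.
// To search only inside the current node, start the expression with ".//", as in the lines below.
// (hn, sw, and the Write helper come from the surrounding program, which is not shown.)
Write(sw, String.Format("Recommended: {0}", hn.SelectSingleNode(".//*[@class=\"diggnum\"]").InnerText));
Write(sw, String.Format("Title: {0}", hn.SelectSingleNode(".//*[@class=\"titlelnk\"]").InnerText));
Write(sw, String.Format("Summary: {0}", hn.SelectSingleNode(".//*[@class=\"post_item_summary\"]").InnerText));
Write(sw, String.Format("Info: {0}", hn.SelectSingleNode(".//*[@class=\"post_item_foot\"]").InnerText));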
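XPath expressions that begin with "//" are evaluated from the document root even when they are called on a child node. The difference is easy to see on two sibling containers. In this sketch (illustrative markup, HtmlAgilityPack package assumed), the absolute query returns the first title for every post, while the relative query returns each post's own title.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

var doc = new HtmlDocument();
doc.LoadHtml(
    "<div class='post'><a class='title'>First post</a></div>" +
    "<div class='post'><a class='title'>Second post</a></div>");

var absoluteTitles = new List<string>();
var relativeTitles = new List<string>();

foreach (var post in doc.DocumentNode.SelectNodes("//div[@class='post']"))
{
    // "//" is evaluated from the document root, so this finds the first title in the whole document.
    absoluteTitles.Add(post.SelectSingleNode("//a[@class='title']").InnerText);

    // ".//" is evaluated from the current node, so this finds the title inside this post only.
    relativeTitles.Add(post.SelectSingleNode(".//a[@class='title']").InnerText);
}

Console.WriteLine(string.Join(", ", absoluteTitles)); // First post, First post
Console.WriteLine(string.Join(", ", relativeTitles)); // First post, Second post
```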
HTML Manipulation
Manipulation lets you read and change the content of HTML nodes and add, copy, replace, or remove nodes.
Properties
Name | Description
InnerHtml | Gets or sets the HTML between the start and end tags of the object.
InnerText | Gets the text between the start and end tags of the object.
OuterHtml | Gets the object and its content in HTML.
ParentNode | Gets the parent of this node (for nodes that can have parents).
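A short sketch of these properties, using invented markup and assuming the HtmlAgilityPack package: assigning InnerHtml re-parses the string and replaces the element's children, while InnerText, OuterHtml, and ParentNode read the result back.

```csharp
using System;
using HtmlAgilityPack;

var doc = new HtmlDocument();
doc.LoadHtml("<div id='box'><span>Old text</span></div>");
var box = doc.DocumentNode.SelectSingleNode("//div[@id='box']");

// Setting InnerHtml parses the new markup and replaces everything between <div> and </div>.
box.InnerHtml = "<span>New <b>bold</b> text</span>";

var span = box.FirstChild;
Console.WriteLine(span.InnerText);       // New bold text
Console.WriteLine(span.OuterHtml);       // the <span> element, tags included
Console.WriteLine(span.ParentNode.Name); // div
Console.WriteLine(box.InnerText);        // New bold text
```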
Methods
Name | Description
AppendChild(HtmlNode) | Adds the specified node to the end of the list of children of this node.
AppendChildren(HtmlNodeCollection) | Adds the specified node list to the end of the list of children of this node.
Clone() | Creates a duplicate of the node and the subtree under it.
CloneNode(Boolean) | Creates a duplicate of the node; if the Boolean is true, the subtree under it is copied too.
CloneNode(String) | Creates a duplicate of the node and changes its name at the same time.
CloneNode(String, Boolean) | Creates a duplicate of the node and changes its name at the same time; if the Boolean is true, the subtree is copied too.
CopyFrom(HtmlNode) | Copies the attributes and the child nodes of the specified node into this node.
CopyFrom(HtmlNode, Boolean) | Copies the attributes of the specified node into this node; if the Boolean is true, its child nodes are copied too.
CreateNode(String) | Creates an HTML node from a string representing literal HTML (a static method).
InsertAfter(HtmlNode, HtmlNode) | Inserts the specified node immediately after the specified reference node.
InsertBefore(HtmlNode, HtmlNode) | Inserts the specified node immediately before the specified reference node.
PrependChild(HtmlNode) | Adds the specified node to the beginning of the list of children of this node.
PrependChildren(HtmlNodeCollection) | Adds the specified node list to the beginning of the list of children of this node.
Remove() | Removes the node from its parent's collection of children.
RemoveAll() | Removes all the children and/or attributes of the current node.
RemoveAllChildren() | Removes all the children of the current node.
RemoveChild(HtmlNode) | Removes the specified child node.
RemoveChild(HtmlNode, Boolean) | Removes the specified child node; if the Boolean is true, the removed node's own children are kept.
ReplaceChild(HtmlNode, HtmlNode) | Replaces the child node oldChild with the newChild node (arguments in the order newChild, oldChild).
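The following sketch strings several of these methods together on a hypothetical list. It assumes the HtmlAgilityPack package; note that ReplaceChild and InsertBefore take the new node first and the reference node second.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

var doc = new HtmlDocument();
doc.LoadHtml("<ul id='list'><li>two</li></ul>");
var list = doc.DocumentNode.SelectSingleNode("//ul[@id='list']");

// CreateNode is static and builds a node from a string of literal HTML.
list.AppendChild(HtmlNode.CreateNode("<li>three</li>")); // adds at the end
list.PrependChild(HtmlNode.CreateNode("<li>one</li>"));  // adds at the beginning
// Children are now: one, two, three

// ReplaceChild takes the new node first and the node to replace second.
var two = list.ChildNodes[1];
list.ReplaceChild(HtmlNode.CreateNode("<li>TWO</li>"), two);

// InsertBefore(newChild, refChild) places a node in front of an existing child.
list.InsertBefore(HtmlNode.CreateNode("<li>zero</li>"), list.FirstChild);

// Remove detaches the last item ("three") from its parent.
list.LastChild.Remove();

var texts = string.Join(",", list.ChildNodes.Select(n => n.InnerText));
Console.WriteLine(texts); // zero,one,TWO
```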
HTML Traversing
Traversing lets you move through the nodes of an HTML document: from a node to its children, siblings, ancestors, and descendants.
Properties
Name | Description
ChildNodes | Gets all the children of the node.
FirstChild | Gets the first child of the node.
LastChild | Gets the last child of the node.
NextSibling | Gets the HTML node immediately following this element.
ParentNode | Gets the parent of this node (for nodes that can have parents).
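Walking a node's children with these properties might look like this sketch (HtmlAgilityPack package assumed, markup invented). The markup has no whitespace between tags on purpose; with indented markup, the whitespace between elements shows up as extra text nodes in ChildNodes and along the NextSibling chain.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

var doc = new HtmlDocument();
doc.LoadHtml("<ul><li>a</li><li>b</li><li>c</li></ul>");
var list = doc.DocumentNode.SelectSingleNode("//ul");

Console.WriteLine(list.ChildNodes.Count);     // 3
Console.WriteLine(list.FirstChild.InnerText); // a
Console.WriteLine(list.LastChild.InnerText);  // c

// Follow the NextSibling chain from the first child until it runs out (null).
var visited = new List<string>();
for (var node = list.FirstChild; node != null; node = node.NextSibling)
{
    visited.Add(node.InnerText);
}

Console.WriteLine(string.Join(",", visited));      // a,b,c
Console.WriteLine(list.FirstChild.ParentNode.Name); // ul
```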
Methods
Name | Description
Ancestors() | Gets all the ancestors of the node.
Ancestors(String) | Gets the ancestors with a matching name.
AncestorsAndSelf() | Gets all ancestor nodes and the current node.
AncestorsAndSelf(String) | Gets all ancestor nodes and the current node with a matching name.
DescendantNodes | Gets all descendant nodes of this node, at every depth.
DescendantNodesAndSelf | Returns a collection of this node and all its descendant nodes, in document order.
Descendants() | Gets all descendant nodes as an enumerable list.
Descendants(String) | Gets all descendant nodes with a matching name.
DescendantsAndSelf() | Returns a collection of this node and all its descendant nodes, in document order.
DescendantsAndSelf(String) | Gets all descendant nodes with a matching name, including this node.
Element(String) | Gets the first direct child node with a matching name.
Elements(String) | Gets the direct child nodes with a matching name.
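The difference between descendants and direct children is the part that most often trips people up. This sketch, on made-up markup and assuming the HtmlAgilityPack package, compares Descendants, Elements, Element, and Ancestors on the same tree.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

var doc = new HtmlDocument();
doc.LoadHtml("<div id='outer'><section><span>one</span><span>two</span></section><span>three</span></div>");
var outer = doc.DocumentNode.SelectSingleNode("//div[@id='outer']");

// Descendants(name): every <span> at any depth below the div, in document order.
var allSpans = outer.Descendants("span").Select(n => n.InnerText).ToList();

// Elements(name): only the <span> elements that are direct children of the div.
var directSpans = outer.Elements("span").Select(n => n.InnerText).ToList();

// Element(name): the first direct child with that name.
var section = outer.Element("section");

// Ancestors(): walks upward from a node; the nearest parent comes first.
var firstSpan = outer.Descendants("span").First();
var ancestorNames = firstSpan.Ancestors().Select(n => n.Name).ToList();

Console.WriteLine(string.Join(",", allSpans));      // one,two,three
Console.WriteLine(string.Join(",", directSpans));   // three
Console.WriteLine(section.Name);                    // section
Console.WriteLine(string.Join(",", ancestorNames)); // begins with section,div
```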
HTML Writer
Save an HtmlDocument and write an HtmlNode
HtmlDocument - Methods
Name | Description
Save(Stream) | Saves the HTML document to the specified stream.
Save(StreamWriter) | Saves the HTML document to the specified StreamWriter.
Save(TextWriter) | Saves the HTML document to the specified TextWriter.
Save(String) | Saves the HTML document to the specified file.
Save(XmlWriter) | Saves the HTML document to the specified XmlWriter.
Save(Stream, Encoding) | Saves the HTML document to the specified stream, using the specified encoding.
Save(String, Encoding) | Saves the HTML document to the specified file, using the specified encoding.
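As a rough illustration of two of these overloads, the sketch below saves a document to an in-memory StringWriter and to a temporary file with an explicit encoding. The HtmlAgilityPack package is assumed, and the file name is only an example.

```csharp
using System;
using System.IO;
using System.Text;
using HtmlAgilityPack;

var doc = new HtmlDocument();
doc.LoadHtml("<html><body><h1>Title</h1></body></html>");

// Save(TextWriter): write the whole document to any TextWriter, here an in-memory StringWriter.
var writer = new StringWriter();
doc.Save(writer);
var saved = writer.ToString();
Console.WriteLine(saved);

// Save(String, Encoding): write the document to a file with an explicit encoding.
// The temporary file name is only an illustration.
var path = Path.Combine(Path.GetTempPath(), "hap-saved.html");
doc.Save(path, Encoding.UTF8);
var fromDisk = File.ReadAllText(path);
File.Delete(path);
Console.WriteLine(fromDisk);
```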
HtmlNode - Methods
Name | Description
WriteContentTo() | Saves all the children of the node to a string.
WriteContentTo(TextWriter) | Saves all the children of the node to the specified TextWriter.
WriteTo() | Saves the current node to a string.
WriteTo(TextWriter) | Saves the current node to the specified TextWriter.
WriteTo(XmlWriter) | Saves the current node to the specified XmlWriter.
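WriteTo and WriteContentTo differ in the same way as OuterHtml and InnerHtml: the first includes the node's own tags, the second writes only its children. A minimal sketch with invented markup, assuming the HtmlAgilityPack package; the attribute quoting in the WriteTo output may depend on the library version, so the sketch only prints it.

```csharp
using System;
using System.IO;
using HtmlAgilityPack;

var doc = new HtmlDocument();
doc.LoadHtml("<div id='card'><b>Name</b><i>Role</i></div>");
var card = doc.DocumentNode.SelectSingleNode("//div[@id='card']");

// WriteTo() returns the node itself, including its own start and end tags.
string outer = card.WriteTo();

// WriteContentTo() returns only the markup of the node's children.
string inner = card.WriteContentTo();

// The TextWriter overloads write the same markup into an existing writer.
var writer = new StringWriter();
card.WriteContentTo(writer);

Console.WriteLine(outer);
Console.WriteLine(inner);             // <b>Name</b><i>Role</i>
Console.WriteLine(writer.ToString()); // <b>Name</b><i>Role</i>
```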
HTML Utilities
HtmlDocument Utilities
HtmlDocument Methods
Name | Description
DetectEncoding(Stream) | Detects the encoding of an HTML stream.
DetectEncoding(TextReader) | Detects the encoding of HTML text read from a TextReader.
DetectEncoding(String) | Detects the encoding of an HTML file.
DetectEncodingAndLoad(String) | Detects the encoding of an HTML document from a file first, and then loads the file.
DetectEncodingAndLoad(String, Boolean) | Detects the encoding of an HTML document from a file first, and then loads the file.
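A hedged sketch of DetectEncoding(TextReader) on a string that declares its charset in a meta tag, assuming the HtmlAgilityPack package. The sketch treats a null result as "no declaration found" rather than relying on the call always returning an encoding.

```csharp
using System;
using System.IO;
using System.Text;
using HtmlAgilityPack;

var html = "<html><head><meta http-equiv='Content-Type' content='text/html; charset=utf-8'></head>" +
           "<body>Hi</body></html>";

var doc = new HtmlDocument();

// DetectEncoding(TextReader) scans the markup for an encoding declaration.
// A null result is handled explicitly instead of being assumed away.
Encoding encoding = doc.DetectEncoding(new StringReader(html));
Console.WriteLine(encoding?.WebName ?? "no encoding declared");
```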
HTML Attributes
Attributes let you read, add, change, and remove the attributes of an HTML node.
Methods
Name | Description
Add(HtmlAttribute) | Adds the supplied attribute to the collection.
Add(String, String) | Adds a new attribute to the collection with the given name and value.
Append(String) | Creates and inserts a new attribute as the last attribute in the collection.
Append(HtmlAttribute) | Inserts the specified attribute as the last attribute in the collection.
Append(String, String) | Creates and inserts a new attribute as the last attribute in the collection.
Remove() | Removes all attributes from the collection.
Remove(String) | Removes an attribute from the list, using its name. If more than one attribute has this name, they are all removed.
Remove(HtmlAttribute) | Removes a given attribute from the list.
RemoveAll() | Removes all attributes in the list.
RemoveAt(Int32) | Removes the attribute at the specified index.
SetAttributeValue(String, String) | Helper method (on HtmlNode) to set the value of an attribute of this node. If the attribute is not found, it is created automatically.
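A small sketch of everyday attribute edits on an invented link, assuming the HtmlAgilityPack package. GetAttributeValue, used here to read values back with a fallback, is an HtmlNode method that does not appear in the table above.

```csharp
using System;
using HtmlAgilityPack;

var doc = new HtmlDocument();
doc.LoadHtml("<a href='/old' class='link' target='_blank'>Home</a>");
var link = doc.DocumentNode.SelectSingleNode("//a");

// SetAttributeValue updates an existing attribute or creates it if it is missing.
link.SetAttributeValue("href", "/new");
link.SetAttributeValue("title", "Go home");

// Add(String, String) adds a new attribute to the collection.
link.Attributes.Add("data-id", "42");

// Remove(String) deletes every attribute with that name.
link.Attributes.Remove("target");

// GetAttributeValue returns the value, or the supplied default when the attribute is absent.
Console.WriteLine(link.GetAttributeValue("href", ""));       // /new
Console.WriteLine(link.GetAttributeValue("target", "none")); // none
Console.WriteLine(link.Attributes.Count);                    // 4 (href, class, title, data-id)
```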