Although you can now parse HTML natively using DOMParser and XMLHttpRequest, these are new features that are not yet supported by all browsers in use in the wild. The code snippets on this page will let your site work until these new features are more widely available.
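Where native parsing is available it can be tried first, with the snippets on this page serving as the fallback. The following is a minimal sketch of that check, not a definitive implementation: the helper name parseHTMLNatively is hypothetical, and it assumes that engines without "text/html" support in DOMParser either throw or return null.

```javascript
// Hypothetical helper: returns a Document parsed from htmlString, or null
// when native "text/html" parsing is not available and a fallback is needed.
function parseHTMLNatively(htmlString) {
  if (typeof DOMParser === "undefined") {
    return null; // no DOMParser at all
  }
  try {
    var doc = new DOMParser().parseFromString(htmlString, "text/html");
    return doc || null; // assumption: some engines return null for unsupported types
  } catch (e) {
    return null; // assumption: other engines throw for unsupported types
  }
}
```

A caller would use one of the approaches described on this page whenever the helper returns null.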

Safely parsing simple HTML to DOM

When using XMLHttpRequest to get the HTML of a remote webpage, it is often advantageous to turn that HTML string into DOM for easier manipulation. However, there are potential dangers involved in injecting remote content in a privileged context in your extension, so it can be desirable to parse the HTML safely.

The function below will safely parse simple HTML and return a DOM object which can be manipulated like web page elements. This will remove tags like <script>, <style>, <head>, <body>, <title>, and <iframe>. It will also remove all JavaScript, including element attributes that contain JavaScript.

function HTMLParser(aHTMLString){
  var html = document.implementation.createDocument("http://www.w3.org/1999/xhtml", "html", null),
      body = document.createElementNS("http://www.w3.org/1999/xhtml", "body");
  html.documentElement.appendChild(body);

  // parse the string into a sanitized fragment and attach it to <body>
  body.appendChild(Components.classes["@mozilla.org/feed-unescapehtml;1"]
    .getService(Components.interfaces.nsIScriptableUnescapeHTML)
    .parseFragment(aHTMLString, false, null, body));

  return body;
}

It works by creating a content-level (this is safer than chrome-level) <body> element and placing it in a new, separate XHTML document, then parsing the HTML fragment and attaching that fragment to the <body>. The <body> is returned, and it is never actually appended to the current page. The returned <body> object is of type Element.

Here is a sample that counts the number of paragraphs in a string:

var DOMPars = HTMLParser('<p>foo</p><p>bar</p>');
alert(DOMPars.getElementsByTagName('p').length);

If HTMLParser() returns the variable html (instead of body), you get the whole document object with its complete set of methods, so you can retrieve the contents of a div element like this:

var DOMPars = HTMLParser("<div id='userInfo'>John was a mediocre programmer, but people liked him <strong>anyway</strong>.</div>");
alert(DOMPars.getElementById('userInfo').innerHTML);
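The variant just described could be sketched as follows. This is an illustration under assumptions rather than a separate API: the name HTMLParserDocument is hypothetical, and the body is the same as HTMLParser() above except for the return value. It relies on the same privileged Components object, so it only runs in extension (chrome) code.

```javascript
// Hypothetical variant of HTMLParser() that returns the containing document
// instead of its <body>, so document-level methods such as getElementById()
// are available on the result.
function HTMLParserDocument(aHTMLString) {
  var html = document.implementation.createDocument("http://www.w3.org/1999/xhtml", "html", null),
      body = document.createElementNS("http://www.w3.org/1999/xhtml", "body");
  html.documentElement.appendChild(body);

  // parse the string into a sanitized fragment and attach it to <body>
  body.appendChild(Components.classes["@mozilla.org/feed-unescapehtml;1"]
    .getService(Components.interfaces.nsIScriptableUnescapeHTML)
    .parseFragment(aHTMLString, false, null, body));

  return html; // the document, not the <body> element
}
```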

To parse a complete HTML page, load it into an iframe whose type is content (not chrome). See Using a hidden iframe element to parse HTML to a window's DOM below.

Parsing complete HTML to DOM

Loading an HTML document is much simpler if it is loaded using the XMLHttpRequest object. To that end, we load our HTML document first:

var request = new XMLHttpRequest();
request.open("GET", "http://example.org/file.html", false);
request.send(null);

Our next step is to create the DOM document into which we feed the newly gathered HTML:

var doc = document.implementation.createHTMLDocument("example");
doc.documentElement.innerHTML = request.responseText;

After this, any manipulation we might want to do is as simple as the following:

doc.body.textContent = "This is inside the body!";
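Passing false as the third argument to open() makes the request synchronous, which blocks the caller until the response arrives. The same steps can also be run asynchronously. The following is a minimal sketch under assumptions: the helper name loadHTMLDocument and its two callbacks are hypothetical, and error handling is reduced to the request's error event.

```javascript
// Hypothetical helper: fetches url asynchronously, builds an HTML document
// from the response text, and passes it to onDone(doc).
function loadHTMLDocument(url, onDone, onError) {
  var request = new XMLHttpRequest();
  request.open("GET", url, true); // true = asynchronous

  request.onload = function () {
    // same two steps as the synchronous version above
    var doc = document.implementation.createHTMLDocument("example");
    doc.documentElement.innerHTML = request.responseText;
    onDone(doc);
  };

  request.onerror = function () {
    onError(new Error("Request failed: " + url));
  };

  request.send(null);
}
```

It could be called as loadHTMLDocument("http://example.org/file.html", function (doc) { doc.body.textContent = "This is inside the body!"; }, function (err) { dump(err + "\n"); }).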

Using a hidden iframe element to parse HTML to a window's DOM

Sample code may need more work. Create your own function using unique name, ID, and so forth.

var frame = document.getElementById("sample-frame");
if (!frame) {
  // create frame
  frame = document.createElement("iframe"); // iframe (or browser on older Firefox)
  frame.setAttribute("id", "sample-frame");
  frame.setAttribute("name", "sample-frame");
  frame.setAttribute("type", "content");
  frame.setAttribute("collapsed", "true");
  document.getElementById("main-window").appendChild(frame);
  // or
  // document.documentElement.appendChild(frame);

  // set restrictions as needed
  frame.webNavigation.allowAuth = false;
  frame.webNavigation.allowImages = false;
  frame.webNavigation.allowJavascript = false;
  frame.webNavigation.allowMetaRedirects = true;
  frame.webNavigation.allowPlugins = false;
  frame.webNavigation.allowSubframes = false;

  // listen for load
  frame.addEventListener("load", function (event) {
    // the document of the HTML in the DOM
    var doc = event.originalTarget;
    // skip blank page or frame
    if (doc.location.href == "about:blank" || doc.defaultView.frameElement) return;

    // do something with the DOM of doc
    alert(doc.location.href);

    // when done remove frame or set location "about:blank"
    setTimeout(function (){
      var frame = document.getElementById("sample-frame");
      // remove frame
      // frame.destroy(); // if using browser element instead of iframe
      frame.parentNode.removeChild(frame);
      // or set location "about:blank"
      // frame.contentDocument.location.href = "about:blank";
    },10);
  }, true);
}

// load a page
frame.contentDocument.location.href = "http://www.mozilla.org/";
// or
// frame.webNavigation.loadURI("http://www.mozilla.org/", Components.interfaces.nsIWebNavigation.LOAD_FLAGS_NONE, null, null, null);

If you are starting with an HTML string, you can convert it to a data URI and use that to load in the browser element.
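One way to do that conversion is sketched below. The helper name htmlToDataURI is hypothetical, and the charset parameter assumes the string should be treated as UTF-8. encodeURIComponent() escapes characters such as # and % that would otherwise be misread inside a URI.

```javascript
// Hypothetical helper: wraps an HTML string in a data: URI that can be
// loaded like any other URL.
function htmlToDataURI(htmlString) {
  return "data:text/html;charset=utf-8," + encodeURIComponent(htmlString);
}
```

The result can then be used in place of the URL in the sample above, for example frame.contentDocument.location.href = htmlToDataURI("<p>foo</p>").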

Using a hidden XUL iframe (alternate example)

Sometimes, a browser element is overkill, or does not meet your needs, or you can't fulfill its requirements. While working on Donkeyfire, I discovered the iframe XUL element, and it is very easy to implement it.

As an example, I will show a browser overlay .xul file, and some JavaScript code to access it.

Here is some XUL code you can add to your browser overlay .xul file. Don't forget to modify the id and name!

<vbox hidden="false" height="0">
<iframe type="content" src="" name="donkey-browser" hidden="false" id="donkey-browser" height="0"/>
</vbox>

Then, in your extension's "load" event handler:

onLoad: function() {
  donkeybrowser = document.getElementById("donkey-browser");
  if (donkeybrowser) {
    donkeybrowser.style.height = "0px";
    donkeybrowser.webNavigation.allowAuth = true;
    donkeybrowser.webNavigation.allowImages = false;
    donkeybrowser.webNavigation.allowJavascript = false;
    donkeybrowser.webNavigation.allowMetaRedirects = true;
    donkeybrowser.webNavigation.allowPlugins = false;
    donkeybrowser.webNavigation.allowSubframes = false;
    donkeybrowser.addEventListener("DOMContentLoaded", function (e) { donkeyfire.donkeybrowser_onPageLoad(e); }, true);
  }
},

With that code, we obtain a reference to the iframe element we declared in the .xul file. The most interesting piece of code here is the DOMContentLoaded event listener we define for the element. Let's take a look at the donkeyfire.donkeybrowser_onPageLoad() handler:

donkeybrowser_onPageLoad: function(aEvent) {
  var doc = aEvent.originalTarget;
  var url = doc.location.href;
  if (aEvent.originalTarget.nodeName == "#document") { // ok, it's a real page, let's do our magic
    dump("[DF] URL = "+url+"\n");
    var text = doc.evaluate("/html/body/h1",doc,null,XPathResult.STRING_TYPE,null).stringValue;
    dump("[DF] TEXT in /html/body/h1 = "+text+"\n");
  }
},

As you can see, we obtain full access to the DOM of the page we loaded in background, and we can even evaluate XPath expressions. In the example, we dump() to the console the page's URL and the text contained in the first h1 tag of the page's <body>.

We still need to see how to call the loadURI() method on our iframe:

donkeybrowser.webNavigation.loadURI("http://developer.mozilla.org", Components.interfaces.nsIWebNavigation.LOAD_FLAGS_NONE, null, null, null);

Also, I recommend you take a look at the nsIWebNavigation interface.

Source: MDN, https://developer.mozilla.org/en-US/Add-ons/Code_snippets/HTML_to_DOM. It should be fairly complete.
