HtmlCleaner CleanerProperties 参数配置(转自macken博客,链接:http://macken.iteye.com/blog/1579809)
HtmlCleaner CleanerProperties 参数配置
|
Parameter |
Default |
Explanation |
| advancedXmlEscape | true | If this parameter is set to true, ampersand sign (&) that proceeds valid XML character sequences (&XXX;) will not be escaped with &XXX; |
| transResCharsToNCR | false | If this parameter is set to true, reserved XML sequences (&, ", ', <, >) are serialized to their Numeric Character Representations (#&38;, #&34;, #&39;, #&60;, #&62;). This parameter has effect only if advancedXmlEscape is set to true. |
| translateSpecialEntities | true | If true, special HTML entities (i.e. ?, ¡ë, ¡Á) are replaced with unicode characters they represent (?, ¡ë, ¡Á). This doesn't include &, <, >, ", '. |
| transSpecialEntitiesToNCR | false | If this parameter is set to true, special HTML entities (i.e. ¦¡) are serialized to their Numeric Character Representations (#&913;). This parameter has effect only if translateSpecialEntities is set to true. |
| recognizeUnicodeChars | true | If true, HTML characters represented by their codes in form &#XXXX; are replaced with real unicode characters (i.e. §Ø is replaced with §Ø) |
| useCdata | true | If true, HtmlCleaner will treat SCRIPT and STYLE tag contents as CDATA sections, or otherwise it will be regarded as ordinary text (special characters will be escaped). |
| omitUnknownTags | false | Tells whether to skip (ignore) unknown tags during cleanup. |
| treatUnknTagsAsContent | false | Tells whether to treat unknown tags as ordinary content, i.e. <something...> will be transformed to <something...>. This attribute is applicable only if omitUnknownTags is set to false. |
| omitDeprTags | false | Tells whether to skip (ignore) deprecated HTML tags during cleanup. |
| treatDeprTagsAsContent | false | Tells whether to treat deprecated tags as ordinary content, i.e. <font...> will be transformed to <font...>. This attribute is applicable only if omitDeprecatedTags is set to false. |
| omitComments | false | Tells whether to skip HTML comments. |
| omitXmlDeclaration | false | Tells whether or not to put XML declaration line at the beginning of the resulting XML. |
| omitDoctypeDeclaration | true | Tells whether to skip HTML declaration found in the source document. If HTML document being cleaned doesn't contain one it wouldn't be placed in the result anyway. |
| omitXmlnsAttributes | false | This flag is depricated since version 1.3 and namespacesAware should be used instead. |
| omitEnvelope | false | Tells whether to remove open and close tag being serialized. This parameter is introduced in HtmlCleaner 2.2 to replace omitHtmlEnvelope. If set to true, serialization skips open and close tags of the node, outputs only node's children. |
| useEmptyElementTags | true | Specifies how to serialize tags with empty body - if true, compact notation is used(<xxx/>), otherwise - <xxx></xxx> |
| allowMultiWordAttributes | true | Tells parser whether to allow attribute values consisting of multiple words or not. If true, attribute att="a b c" will stay like it is, and if false parser will split this into att="a" b="b" c="c" (this is default browsers' behaviour). |
| allowHtmlInsideAttributes | false | Tells parser whether to allow html tags inside attribute values. For example, when this flag is set att="here is <a href='xxxx'>link</a>" will stay like it is, and if not, parser will end attribute value after "here is". This flag makes sense only if allowMultiWordAttributes is set as well. |
| ignoreQuestAndExclam | true | Tells parser whether to completely ignore tags that have form <?TAGNAME....> or <!TAGNAME....>. This way some HTML/XML processing instructions may be omitted from the resulting xml. |
| namespacesAware | true | If true, namespace prefixes found during parsing will be preserved and all neccessery xml namespace declarations will be added in the root element. If false, all namespace prefixes and all xmlns namespace declarations will be stripped. |
| hyphenReplacement | = | XML doesn't allow double hyphen sequence (--) inside comments. This parameter tells which replacement to use for it when double hyphen is encountered during parsing. |
| pruneTags | empty string | Comma-separated list of tags that will be complitely removed (with all nested elements) from XML tree after parsing. For exampe if pruneTags is "script,style", resulting XML will not contain scripts and styles. |
| booleanAtts | self | Tells cleaner what value to give to boolean attributes, like checked, selected and similar. Allowed values are self - value of attribute is the same as attribute name (checked = "checked"), empty - attribute value is empty string (checked = "") and true - value of attribute is "true" (checked = "true"). |
| nodeByXpath | XPath expression used to select first node that is going to be serialized instead of whole HTML document. For example if this parameter us set to //table[1] only first table in document will be serialized. |
HtmlCleaner CleanerProperties 参数配置(转自macken博客,链接:http://macken.iteye.com/blog/1579809)的更多相关文章
- 文顶顶iOS开发博客链接整理及部分项目源代码下载
文顶顶iOS开发博客链接整理及部分项目源代码下载 网上的iOS开发的教程很多,但是像cnblogs博主文顶顶的博客这样内容图文并茂,代码齐全,示例经典,原理也有阐述,覆盖面宽广,自成系统的系列教程 ...
- 所有博客已经迁移到个人空间 blog.scjia.cc
所有博客已经迁移到个人空间 blog.scjia.cc
- 博客搬家到CSDN:http://blog.csdn.net/yeweiouyang
博客搬家到CSDN:http://blog.csdn.net/yeweiouyang
- Beta版本冲刺计划及安排(附七天冲刺的博客链接)
Beta版本冲刺计划及安排(附七天冲刺的博客链接) 新增组员 本次换人加入我们团队的新成员是原"爸爸说的都队"的队长念其锋同学,经过我们小组严格的两轮面试,他从几个同样前来面试的同 ...
- 《团队作业》五小福团队--UNO的博客链接汇总
<团队作业>五小福团队--UNO的博客链接汇总 <团队作业第一周>五小福团队作业--UNO <团队作业第二周>五小福团队作业--UNO <团队作业第三.第四周 ...
- 团队Alpha博客链接目录
Dipper团队Alpha博客链接目录 团队Alpha冲刺博客 第一次冲刺 第二次冲刺 第三次冲刺 第四次冲刺 第五次冲刺 第六次冲刺 第七次冲刺 第八次冲刺 第九次冲刺 第十次冲刺 第十一次冲刺 第 ...
- Alpha阶段博客链接
博客链接 团队项目启程篇章:http://www.cnblogs.com/liuliudashun/p/5968194.html 团队项目开发篇章1:http://www.cnblogs.com/li ...
- Golang拼接字符串的5种方法及其效率_Chrispink-CSDN博客_golang 字符串拼接效率 https://blog.csdn.net/m0_37422289/article/details/103362740
Different ways to concatenate two strings in Golang - GeeksforGeeks https://www.geeksforgeeks.org/di ...
- 配置WindowsLiveWriter,写cnblogs博客
转载:http://www.haogongju.net/art/2307587 引言 以前写博客一般都是联网在cnblogs上面写,不好的地方就是不联网就写不了,当然我们也可以先记录在word文件,等 ...
随机推荐
- 数组 list互转
数组 list互转 String str[] = list.toArray(new String[]{}); List list= java.util.Arrays.asList(String str ...
- 如何解决苹果Mac系统无法识别U盘
1.在Mac机上打开“磁盘工具”,将U盘重新分区, 2.格式选“exFAT”.该格式分区Win及Mac系统中都可以读和写,特别是可以支持大于4GB的大文件.但是一些高清播放机可能不支持. 3.以 ...
- 使用 sitemesh/decorator装饰器装饰jsp页面(原理及详细配置)
摘要:首先这个Decorator解释一下这个单词:“装饰器”,我觉得其实可以这样理解,他就像我们用到的Frame,他把每个页面共有的东西提炼了出来,也可能我们也会用各种各样的include标签,将我们 ...
- 底层码农的Stanford梦 --- 从SCPD开始 [转]
转载自知乎: https://zhuanlan.zhihu.com/p/25010074 一开始让我写这篇文章的时候,我是拒绝的.毕竟,我不是Stanford毕业的,出来写文章介绍Stanford,难 ...
- angular4.0 父子组建之间的相互通信
父组建---->子组建 传递信息 首先先通过angular脚手架生成两个基本组件,有一个好处是 会自动关联到跟模版,节约时间,而且还是偷懒 ng generate component compo ...
- 【PHP】最详细PHP从入门到精通(五)——PHP错误处理
PHP从入门到精通 之PHP中的字符串 在创建脚本和 web 应用程序时,错误处理是一个重要的部分.如果您的代码缺少错误检测编码,那么程序看上去很不专业,也为安全风险敞开了大门. 本教程介绍了 PH ...
- 快速学习springMVC框架原理
一.通过导图的方法快速去理解springmvc的原理 二.架构流程. 1. 用户发送请求至前端控制器DispatcherServlet 2. DispatcherServlet收到请求调用Handle ...
- Struts2请求参数合法性校验机制
在Action中通过代码执行数据校验 请求参数的输入校验途径一般分两种:客户端校验 :通过JavaScript 完成 (jquery validation插件),目的:过滤正常用户的误操作. 服务器校 ...
- Java版简易画图板的实现
Windows的画图板相信很多人都用过,这次我们就来讲讲Java版本的简易画板的实现. 基本的思路是这样的:画板实现大致分三部分:一是画板界面的实现,二是画板的监听以及画图的实现,三是画板的重绘.(文 ...
- Java之面向对象概述,类,构造方法,static,主方法,对象
一.面向对象概述 面向过程 "面向过程"(Procedure Oriented)是一种以过程为中心的编程思想.这些都是以什么正在发生为主要目标进行编程,不同于面向对象的是谁在受影响 ...