How to Validate XML using Java
Configure Java APIs (SAX, DOM, dom4j, XOM) using JAXP 1.3 to validate XML Documents with DTD and Schema(s).
Many Java XML APIs provide mechanisms to validate XML documents, the JAXP API can be used for most of these XML APIs but subtle configuration differences exists. This article shows five ways of how to configure different Java APIs (including DOM, SAX, dom4j and XOM) using JAXP 1.3 for checking and validating XML with DTD and Schema(s).
Contents
- Setup (Error Handler, Namespace Aware, Input Document, XML Schema, DTD)
- Checking Wellformed-ness (DOM, SAX, dom4j, XOM)
- Validate using internal DTD (DOM, SAX, dom4j, XOM)
- Validate using internal XSD (DOM, SAX, dom4j, XOM)
- Validate using external Schema(s) (DOM, SAX, dom4j, XOM)
- Validate using internal DTD and external Schema(s) (DOM, SAX, dom4j, XOM)
- Conclusion (Sample Code, Resources)
Setup
All underlying examples can be compiled and executed using Java 5.0 (JAXP 1.3) or higher and make use of the following components and settings.
Error Handler
To report errors, it is necessary to provide an ErrorHandler to the underlying implementation. The ErrorHandler used for the examples is a very simple one which reports the error to System.out and continues until the XML document has been fully parsed or until a fatal-error has been reported.
public class SimpleErrorHandler implements ErrorHandler {
public void warning(SAXParseException e) throws SAXException {
System.out.println(e.getMessage());
}
public void error(SAXParseException e) throws SAXException {
System.out.println(e.getMessage());
}
public void fatalError(SAXParseException e) throws SAXException {
System.out.println(e.getMessage());
}
}
Namespace Aware
Namespaces have been introduced to XML after the first specification of XML had received the official W3C Recommendation status. This is the reason why (most of the) XML parser implementations do not support XML Namespaces by default, to handle the validation of XML documents with namespaces correctly it is therefore necessary to configure the underlying parsers to provide support for XML Namespaces.
Input Document
The input document ("contacts.xml") that has been used for all the code examples is shown below.
<!DOCTYPE contacts SYSTEM "contacts.dtd"> <contacts xsi:noNamespaceSchemaLocation="contacts.xsd"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<contact title="citizen">
<firstname>Edwin</firstname>
<lastname>Dankert</lastname>
</contact>
</contacts>
XML Schema
The XML Schema ("contacts.xsd") as defined below has been used in the code examples to validate the input document. The input document contains an extra attribute which has not been defined in the XML Schema, this shows that the XML Schema has been used for the validation. When using this XML Schema to validate the input XML document, the following error gets reported:
cvc-complex-type.3.2.2: Attribute 'title' is not allowed to appear in element 'contact'.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="contacts">
<xs:complexType>
<xs:sequence>
<xs:element ref="contact"/>
</xs:sequence>
</xs:complexType>
</xs:element> <xs:element name="contact">
<xs:complexType>
<xs:sequence>
<xs:element name="firstname" type="xs:NCName"/>
<xs:element name="lastname" type="xs:NCName"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
DTD
The DTD ("contacts.dtd") as defined below has been used in the code examples to validate the input document. To highlight that the DTD has been used for the validation, the title attribute in the input document has a value which is not allowed according to this DTD. When using this DTD to validate the input XML document, the following error gets reported:
Attribute "title" with value "citizen" must have a value from the list "MR MS MRS ".
<!ELEMENT contacts (contact*)>
<!ATTLIST contacts xsi:noNamespaceSchemaLocation CDATA #IMPLIED>
<!ATTLIST contacts xmlns:xsi CDATA #IMPLIED> <!ELEMENT contact (firstname,lastname)>
<!ATTLIST contact title (MR|MS|MRS) "MS"> <!ELEMENT firstname (#PCDATA)>
<!ELEMENT lastname (#PCDATA)>
Checking Wellformed-ness
Before a document can be called XML and not csv, simple text or any other format, it needs to support the basic rules as defined by the XML Recommendation, when it adheres to these rules it is said to be Wellformed XML.
Code Fragments: DOM, SAX, dom4j, XOM
DOM
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setValidating(false);
factory.setNamespaceAware(true); DocumentBuilder builder = factory.newDocumentBuilder(); builder.setErrorHandler(new SimpleErrorHandler()); Document document = builder.parse(new InputSource("document.xml"));
SAX
SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setValidating(false);
factory.setNamespaceAware(true); SAXParser parser = factory.newSAXParser(); XMLReader reader = parser.getXMLReader();
reader.setErrorHandler(new SimpleErrorHandler());
reader.parse(new InputSource("document.xml"));
dom4j
SAXReader reader = new SAXReader();
reader.setValidation(false);
reader.setErrorHandler(new SimpleErrorHandler());
reader.read("contacts.xml");
XOM
SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setValidating(false);
factory.setNamespaceAware(true); SAXParser parser = factory.newSAXParser(); XMLReader reader = parser.getXMLReader();
reader.setErrorHandler(new SimpleErrorHandler()); Builder builder = new Builder(reader);
builder.build("contacts.xml");
Validate using internal DTD
Parse the input document using only the DTD (contacts.dtd), as defined by the DOCTYPE in the input document, for validation.
Code Fragments: DOM, SAX, dom4j, XOM
DOM
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setValidating(true);
factory.setNamespaceAware(true); DocumentBuilder builder = factory.newDocumentBuilder(); builder.setErrorHandler(new SimpleErrorHandler()); Document document = builder.parse(new InputSource("document.xml"));
SAX
SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setValidating(true);
factory.setNamespaceAware(true); SAXParser parser = factory.newSAXParser(); XMLReader reader = parser.getXMLReader();
reader.setErrorHandler(new SimpleErrorHandler());
reader.parse(new InputSource("document.xml"));
dom4j
SAXReader reader = new SAXReader();
reader.setValidation(true);
reader.setErrorHandler(new SimpleErrorHandler());
reader.read("contacts.xml");
XOM
SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setValidating(true);
factory.setNamespaceAware(true); SAXParser parser = factory.newSAXParser(); XMLReader reader = parser.getXMLReader();
reader.setErrorHandler(new SimpleErrorHandler()); Builder builder = new Builder(reader);
builder.build("contacts.xml");
Validate using internal XSD
Parse the input document using only the XML Schema (contacts.xsd), as defined by the noNamespaceSchemaLocation attribute in the input document, for validation.
Code Fragments: DOM, SAX, dom4j, XOM
DOM
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setValidating(true);
factory.setNamespaceAware(true);
factory.setAttribute("http://java.sun.com/xml/jaxp/properties/schemaLanguage",
"http://www.w3.org/2001/XMLSchema"); DocumentBuilder builder = factory.newDocumentBuilder(); builder.setErrorHandler(new SimpleErrorHandler()); Document document = builder.parse(new InputSource("document.xml"));
SAX
SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setValidating(true);
factory.setNamespaceAware(true); SAXParser parser = factory.newSAXParser();
parser.setProperty("http://java.sun.com/xml/jaxp/properties/schemaLanguage",
"http://www.w3.org/2001/XMLSchema"); XMLReader reader = parser.getXMLReader();
reader.setErrorHandler(new SimpleErrorHandler());
reader.parse(new InputSource("document.xml"));
dom4j
SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setValidating(true); SAXParser parser = factory.newSAXParser();
parser.setProperty("http://java.sun.com/xml/jaxp/properties/schemaLanguage",
"http://www.w3.org/2001/XMLSchema"); SAXReader reader = new SAXReader(parser.getXMLReader());
reader.setValidation(true);
reader.setErrorHandler(new SimpleErrorHandler());
reader.read("contacts.xml");
XOM
SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setValidating(true);
factory.setNamespaceAware(true); SAXParser parser = factory.newSAXParser();
parser.setProperty("http://java.sun.com/xml/jaxp/properties/schemaLanguage",
"http://www.w3.org/2001/XMLSchema"); XMLReader reader = parser.getXMLReader();
reader.setErrorHandler(new SimpleErrorHandler()); Builder builder = new Builder(reader);
builder.build("contacts.xml");
Validate using external Schema
Parse the input document using the schema (contacts.xsd), as defined externally by the source-code.
Code Fragments: DOM, SAX, dom4j, XOM
DOM
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setValidating(false);
factory.setNamespaceAware(true); SchemaFactory schemaFactory =
SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema"); factory.setSchema(schemaFactory.newSchema(
new Source[] {new StreamSource("contacts.xsd")})); DocumentBuilder builder = factory.newDocumentBuilder(); builder.setErrorHandler(new SimpleErrorHandler()); Document document = builder.parse(new InputSource("document.xml"));
SAX
SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setValidating(false);
factory.setNamespaceAware(true); SchemaFactory schemaFactory =
SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema"); factory.setSchema(schemaFactory.newSchema(
new Source[] {new StreamSource("contacts.xsd")})); SAXParser parser = factory.newSAXParser(); XMLReader reader = parser.getXMLReader();
reader.setErrorHandler(new SimpleErrorHandler());
reader.parse(new InputSource("document.xml"));
dom4j
SAXParserFactory factory = SAXParserFactory.newInstance(); SchemaFactory schemaFactory =
SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema"); factory.setSchema(schemaFactory.newSchema(
new Source[] {new StreamSource("contacts.xsd")})); SAXParser parser = factory.newSAXParser(); SAXReader reader = new SAXReader(parser.getXMLReader());
reader.setValidation(false);
reader.setErrorHandler(new SimpleErrorHandler());
reader.read("contacts.xml");
XOM
SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setValidating(false);
factory.setNamespaceAware(true); SchemaFactory schemaFactory =
SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");
factory.setSchema(schemaFactory.newSchema(
new Source[] {new StreamSource("contacts.xsd")})); SAXParser parser = factory.newSAXParser();
XMLReader reader = parser.getXMLReader();
reader.setErrorHandler(new SimpleErrorHandler()); Builder builder = new Builder(reader);
builder.build("contacts.xml");
Validate using internal DTD and external Schema
Parse the input document using the schema (contacts.xsd), as defined externally by the source-code and the DTD (contacts.dtd), as defined by the DOCTYPE in the input document, for validation.
Code Fragments: DOM, SAX, dom4j, XOM
DOM
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setValidating(true);
factory.setNamespaceAware(true); SchemaFactory schemaFactory =
SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema"); factory.setSchema(schemaFactory.newSchema(
new Source[] {new StreamSource("contacts.xsd")})); DocumentBuilder builder = factory.newDocumentBuilder(); builder.setErrorHandler(new SimpleErrorHandler()); Document document = builder.parse(new InputSource("document.xml"));
SAX
SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setValidating(true);
factory.setNamespaceAware(true); SchemaFactory schemaFactory =
SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema"); factory.setSchema(schemaFactory.newSchema(
new Source[] {new StreamSource("contacts.xsd")})); SAXParser parser = factory.newSAXParser(); XMLReader reader = parser.getXMLReader();
reader.setErrorHandler(new SimpleErrorHandler());
reader.parse(new InputSource("document.xml"));
dom4j
SAXParserFactory factory = SAXParserFactory.newInstance(); SchemaFactory schemaFactory =
SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema"); factory.setSchema(schemaFactory.newSchema(
new Source[] {new StreamSource("contacts.xsd")})); SAXParser parser = factory.newSAXParser(); SAXReader reader = new SAXReader(parser.getXMLReader());
reader.setValidation(true);
reader.setErrorHandler(new SimpleErrorHandler());
reader.read("contacts.xml");
XOM
SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setValidating(true);
factory.setNamespaceAware(true); SchemaFactory schemaFactory =
SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");
factory.setSchema(schemaFactory.newSchema(
new Source[] {new StreamSource("contacts.xsd")})); SAXParser parser = factory.newSAXParser();
XMLReader reader = parser.getXMLReader();
reader.setErrorHandler(new SimpleErrorHandler()); Builder builder = new Builder(reader);
builder.build("contacts.xml");
Conclusion
Several mechanisms and XML APIs can be used to parse and validate XML, by using JAXP 1.3 the mechanism can mostly stay the same for these different APIs.
Sample Code
Download any of the archives to get the full source-code for the examples above.
The archives consist of the ./contacts.xml input XML file, ./contacts.xsd the XML Schema document, ./contacts.dtd the DTD document and the source-code for the fragments above, located in the ./src directory.
The archive also contains a number XML Hammer validation projects included in the ./xmlhammer-projects directory. To be able to execute these XML Hammer projects, you will need to have the XML Hammer application installed. This can be downloaded from:
Resources
- Extensible Markup Language (XML) 1.0
- Namespaces in XML
- Java 5.0
- Java 6.0
- JAXP
- dom4j
- XOM
- JDOM
How to Validate XML using Java的更多相关文章
- xml与java代码相互装换的工具类
这是一个java操作xml文件的工具类,最大的亮点在于能够通过工具类直接生成xml同样层次结构的java代码,也就是说,只要你定义好了xml的模板,就能一键生成java代码.省下了自己再使用工具类写代 ...
- nested exception is java.lang.RuntimeException: Error parsing Mapper XML. Cause: java.lang.IllegalArgumentException: Result Maps collection already contains value for
org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'daoSupport': ...
- xml 解析 java 基础复习
document 解析 sax 解析 dom4j 解析(摘自csdn redarmychen) dom4j是一个Java的XML API,类似于jdom,用来读写XML文件的.dom4j是一个非常 ...
- 使用xml及java代码混合的方式来设置图形界面
参考<疯狂android讲义>第2版2.1节 设置android的图形界面有三种方法: 1.使用纯xml文件 2.使用纯java,代码臃肿复杂,不建议使用 3.使用xml与java混合,前 ...
- spring mybatis 整合问题Error parsing Mapper XML. Cause: java.lang.NullPointerException
14:30:40,872 DEBUG SqlSessionFactoryBean:431 - Parsed configuration file: 'class path resource [myba ...
- Spring 与 mybatis整合 Error parsing Mapper XML. Cause: java.lang.NullPointerException
mapper配置文件中的namespace没有填:而且namespase的值应该填为:mapper的权限定名:否则还是会抛出异常 org.springframework.beans.factory.B ...
- POPTEST老李分享DOM解析XML之java
POPTEST老李分享DOM解析XML之java Java提供了两种XML解析器:树型解释器DOM(Document Object Model,文档对象模型),和流机制解析器SAX(Simple ...
- 使用XStream是实现XML与Java对象的转换(3)--注解
六.使用注解(Annotation) 总是使用XStream对象的别名方法和注册转换器,会让人感到非常的乏味,又会产生很多重复性代码,于是我们可以使用注解的方式来配置要序列化的POJO对象. 1,最基 ...
- XML 和 java对象相互转换
XML 和 java对象相互转换 博客分类: XML 和 JSON 下面使用的是JDK自带的类,没有引用任何第三方jar包. Unmarshaller 类使客户端应用程序能够将 XML 数据转换为 ...
随机推荐
- C#基础(二)——C#中的构造函数
构造函数主要是用来创建对象时为对象赋初值来初始化对象.总与new运算符一起使用在创建对象的语句中 .A a=new A(); 构造函数具有和类一样的名称:但它是一个函数具有函数的所有特性,同一个类里面 ...
- c#面向对象小结
特点: 1:将复杂的事情简单化. 2:面向对象将以前的过程中的执行者,变成了指挥者. 3:面向对象这种思想是符合现在人们思考习惯的一种思想. 过程和对象在我们的程序中是如何体现的呢?过程其实就是函数: ...
- ng表单验证,提交以后才显示错误
只在提交表单后显示错误信息 有时候不想在用户正在输入的时候显示错误信息. 当前错误信息会在用户输入表单时立即显示. 由于Angular很棒的数据绑定特性,这是可以发生的. 因为所有的事务都可以在一瞬间 ...
- TDirectory.GetLastAccessTime获取指定目录最后访问时间
使用函数: System.IOUtils.TDirectory.GetLastAccessTime 函数定义: class function GetLastAccessTime(const Path: ...
- TDirectory.GetAttributes、TDirectory.SetAttributes获取和设置文件夹属性
使用函数: System.IOUtils.TDirectory.GetAttributes//获取属性 System.IOUtils.TDirectory.SetAttributes//设置属性 注: ...
- Djang DJANGO_SETTINGS_MODULE
在 site-packages\django 新建一个文件 ’settings.py‘ 内容如下: DEBUG = TrueDEFAULT_FROM_EMAIL = 'alangwansui@qq.c ...
- Hadoop MapReduce Next Generation - Setting up a Single Node Cluster
Hadoop MapReduce Next Generation - Setting up a Single Node Cluster. Purpose This document describes ...
- hibernate配置文件详细解释
<!--标准的XML文件的起始行,version='1.0'表明XML的版本,encoding='gb2312'表明XML文件的编码方式--> <?xml version='1.0' ...
- ubuntu 安装mysql及修改编码
643 netstat -tap | grep mysql 645 apt-get install mysql-server mysql-client 646 netstat -tap | ...
- [BZOJ 3530] [Sdoi2014] 数数 【AC自动机+DP】
题目链接:BZOJ - 3530 题目分析 明显是 AC自动机+DP,外加数位统计. WZY 神犇出的良心省选题,然而去年我太弱..比现在还要弱得多.. 其实现在做这道题,我自己也没想出完整解法.. ...