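As described in this related question, there does not appear to be a standard way to validate XML files against an XSD and then transform them using an XSL template whose file path is determined by a catalog resolver.
The XSL template can be either XSLT 1.0 or XSLT 2.0, with the latter requiring Saxon9HE.
The given answer works, but it has a few undesirable problems, including:
- It creates two resolvers: XMLCatalogResolver and CatalogResolver.
- It manually creates a SchemaFactory to perform the validation.
It seems that these aspects of the code should be handled by the existing API, especially the contortions required to extract the XSD URI from the DOM.
There is a repository containing the entire example, including catalog files, schema definitions, and XML tests. The main source file, which has the problems described above, follows: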
package src;
import java.io.*;
import java.net.URI;
import java.util.*;
import java.util.regex.Pattern;
import java.util.regex.Matcher;
import javax.xml.parsers.*;
import javax.xml.xpath.*;
import javax.xml.XMLConstants;
import org.w3c.dom.*;
import org.xml.sax.*;
import org.apache.xml.resolver.tools.CatalogResolver;
import org.apache.xerces.util.XMLCatalogResolver;
import static org.apache.xerces.jaxp.JAXPConstants.JAXP_SCHEMA_LANGUAGE;
import static org.apache.xerces.jaxp.JAXPConstants.W3C_XML_SCHEMA;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Schema;
import javax.xml.validation.Validator;
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/**
 * Download http://xerces.apache.org/xml-commons/components/resolver/CatalogManager.properties
 */
public class TestXSD {
  private final static String ENTITY_RESOLVER =
    "http://apache.org/xml/properties/internal/entity-resolver";

  /**
   * This program reads an XML file, performs validation, reads an XSL
   * file, transforms the input XML, and then writes the transformed document
   * to standard output.
   *
   * args[0] - The XSL file used to transform the XML file
   * args[1] - The XML file to transform using the XSL file
   */
  public static void main( String args[] ) throws Exception {
    // For validation error messages.
    ErrorHandler errorHandler = new DocumentErrorHandler();

    // Read the CatalogManager.properties file.
    CatalogResolver resolver = new CatalogResolver();
    XMLCatalogResolver xmlResolver = createXMLCatalogResolver( resolver );

    logDebug( "READ XML INPUT SOURCE" );
    // Load an XML document in preparation to transform it.
    InputSource xmlInput = new InputSource( new InputStreamReader(
      new FileInputStream( args[1] ) ) );

    DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
    dbFactory.setAttribute( JAXP_SCHEMA_LANGUAGE, W3C_XML_SCHEMA );
    dbFactory.setNamespaceAware( true );

    DocumentBuilder builder = dbFactory.newDocumentBuilder();
    builder.setEntityResolver( xmlResolver );
    builder.setErrorHandler( errorHandler );

    logDebug( "PARSE XML INTO DOCUMENT MODEL" );
    Document xmlDocument = builder.parse( xmlInput );

    logDebug( "CONVERT XML DOCUMENT MODEL INTO DOMSOURCE" );
    DOMSource xml = new DOMSource( xmlDocument );

    logDebug( "GET XML SCHEMA DEFINITION" );
    String schemaURI = getSchemaURI( xmlDocument );
    logDebug( "SCHEMA URI: " + schemaURI );

    if( schemaURI != null ) {
      logDebug( "CREATE SCHEMA FACTORY" );
      // Create a Schema factory to obtain a Schema for XML validation...
      SchemaFactory sFactory = SchemaFactory.newInstance( W3C_XML_SCHEMA );
      sFactory.setResourceResolver( xmlResolver );

      logDebug( "CREATE XSD INPUT SOURCE" );
      String xsdFileURI = xmlResolver.resolveURI( schemaURI );

      logDebug( "CREATE INPUT SOURCE XSD FROM: " + xsdFileURI );
      InputSource xsd = new InputSource(
        new FileInputStream( new File( new URI( xsdFileURI ) ) ) );

      logDebug( "CREATE SCHEMA OBJECT FOR XSD" );
      Schema schema = sFactory.newSchema( new SAXSource( xsd ) );

      logDebug( "CREATE VALIDATOR FOR SCHEMA" );
      Validator validator = schema.newValidator();

      logDebug( "VALIDATE XML AGAINST XSD" );
      validator.validate( xml );
    }

    logDebug( "READ XSL INPUT SOURCE" );
    // Load an XSL template for transforming XML documents.
    InputSource xslInput = new InputSource( new InputStreamReader(
      new FileInputStream( args[0] ) ) );

    logDebug( "PARSE XSL INTO DOCUMENT MODEL" );
    Document xslDocument = builder.parse( xslInput );

    transform( xmlDocument, xslDocument, resolver );
    System.out.println();
  }

  private static void transform(
    Document xml, Document xsl, CatalogResolver resolver ) throws Exception
  {
    if( versionAtLeast( xsl, 2 ) ) {
      useXSLT2Transformer();
    }

    logDebug( "CREATE TRANSFORMER FACTORY" );
    // Create the transformer used for the document.
    TransformerFactory tFactory = TransformerFactory.newInstance();
    tFactory.setURIResolver( resolver );

    logDebug( "CREATE TRANSFORMER FROM XSL" );
    Transformer transformer = tFactory.newTransformer( new DOMSource( xsl ) );

    logDebug( "CREATE RESULT OUTPUT STREAM" );
    // This enables writing the results to standard output.
    Result out = new StreamResult( new OutputStreamWriter( System.out ) );

    logDebug( "TRANSFORM THE XML AND WRITE TO STDOUT" );
    // Transform the document using a given stylesheet.
    transformer.transform( new DOMSource( xml ), out );
  }

  /**
   * Answers whether the given XSL document version is greater than or
   * equal to the given required version number.
   *
   * @param xsl The XSL document to check for version compatibility.
   * @param version The version number to compare against.
   *
   * @return true iff the XSL document version is greater than or equal
   * to the version parameter.
   */
  private static boolean versionAtLeast( Document xsl, float version ) {
    Element root = xsl.getDocumentElement();
    float docVersion = Float.parseFloat( root.getAttribute( "version" ) );

    return docVersion >= version;
  }

  /**
   * Enables Saxon9's XSLT2 transformer for XSLT2 files.
   */
  private static void useXSLT2Transformer() {
    System.setProperty("javax.xml.transform.TransformerFactory",
      "net.sf.saxon.TransformerFactoryImpl");
  }

  /**
   * Creates an XMLCatalogResolver based on the file names found in
   * the given CatalogResolver. The resulting XMLCatalogResolver will
   * contain the absolute path to all the files known to the given
   * CatalogResolver.
   *
   * @param resolver The CatalogResolver to examine for catalog file names.
   * @return An XMLCatalogResolver instance with the same number of catalog
   * files as found in the given CatalogResolver.
   */
  private static XMLCatalogResolver createXMLCatalogResolver(
    CatalogResolver resolver ) {
    int index = 0;
    List files = resolver.getCatalog().getCatalogManager().getCatalogFiles();
    String catalogs[] = new String[ files.size() ];
    XMLCatalogResolver xmlResolver = new XMLCatalogResolver();

    for( Object file : files ) {
      catalogs[ index ] = (new File( file.toString() )).getAbsolutePath();
      index++;
    }

    xmlResolver.setCatalogList( catalogs );

    return xmlResolver;
  }

  private static String[] parseNameValue( String nv ) {
    Pattern p = Pattern.compile( "\\s*(\\w+)=\"([^\"]*)\"\\s*" );
    Matcher m = p.matcher( nv );
    String result[] = new String[2];

    if( m.find() ) {
      result[0] = m.group(1);
      result[1] = m.group(2);
    }

    return result;
  }

  /**
   * Retrieves the XML schema definition using an XSD.
   *
   * @param node The document (or child node) to traverse seeking processing
   * instruction nodes.
   * @return null if no XSD is present in the XML document.
   * @throws IOException Never thrown (uses StringReader).
   */
  private static String getSchemaURI( Node node ) throws IOException {
    String result = null;

    if( node.getNodeType() == Node.PROCESSING_INSTRUCTION_NODE ) {
      ProcessingInstruction pi = (ProcessingInstruction)node;

      logDebug( "NODE IS PROCESSING INSTRUCTION" );

      if( "xml-model".equals( pi.getNodeName() ) ) {
        logDebug( "PI IS XML MODEL" );

        // Hack to get the attributes.
        String data = pi.getData();

        if( data != null ) {
          final String attributes[] = pi.getData().trim().split( "\\s+" );

          String type = parseNameValue( attributes[0] )[1];
          String href = parseNameValue( attributes[1] )[1];

          // TODO: Schema should = http://www.w3.org/2001/XMLSchema
          //String schema = attributes.getNamedItem( "schematypens" );

          if( "application/xml".equalsIgnoreCase( type ) && href != null ) {
            result = href;
          }
        }
      }
    }
    else {
      // Try to get the schema type information.
      NamedNodeMap attrs = node.getAttributes();

      if( attrs != null ) {
        // TypeInfo.toString() returns values of the form:
        // schemaLocation="uri schemaURI"
        // The following loop extracts the schema URI.
        for( int i = 0; i < attrs.getLength(); i++ ) {
          Attr attribute = (Attr)attrs.item( i );
          TypeInfo typeInfo = attribute.getSchemaTypeInfo();
          String attr[] = parseNameValue( typeInfo.toString() );

          if( "schemaLocation".equalsIgnoreCase( attr[0] ) ) {
            result = attr[1].split( "\\s" )[1];
            break;
          }
        }
      }

      // Look deeper for the schema URI.
      if( result == null ) {
        NodeList list = node.getChildNodes();

        for( int i = 0; i < list.getLength(); i++ ) {
          result = getSchemaURI( list.item( i ) );

          if( result != null ) {
            break;
          }
        }
      }
    }

    return result;
  }
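
  /**
   * Writes a message to standard output.
   */
  private static void logDebug( String s ) {
    System.out.println( s );
  }
}

The most problematic parts of the code are:
- the getSchemaURI method; and
- the if( schemaURI != null ) { ... } block.
I find them redundant and brittle, but I do not know what mechanisms exist to avoid manually locating the XSD and validating against it when the XSD's file path is looked up through an XML catalog.
Without directly involving SAX, how can a catalog resolver be used to validate an XML file against an XSD and to transform the document (held in a DOM) with an XSL file whose path is specified in a catalog?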

Answered 2014-10-30 13:37:53

/**
 * Retrieves the XML schema definition using an XSD.
 *
 * @param node The document (or child node) to traverse seeking processing
 * instruction nodes.
 * @return null if no XSD is present in the XML document.
 * @throws IOException Never thrown (uses StringReader).
 */
private static String getSchemaURI( Node node ) throws IOException {
  String result = null;

  if( node.getNodeType() == Node.PROCESSING_INSTRUCTION_NODE ) {
    ProcessingInstruction pi = (ProcessingInstruction)node;

    logDebug( "NODE IS PROCESSING INSTRUCTION" );

    if( "xml-model".equals( pi.getNodeName() ) ) {
      logDebug( "PI IS XML MODEL" );

      // Hack to get the attributes.
      String data = pi.getData();

      if( data != null ) {
        final String attributes[] = pi.getData().trim().split( "\\s+" );

        String type = parseNameValue( attributes[0] )[1];
        String href = parseNameValue( attributes[1] )[1];

        // TODO: Schema should = http://www.w3.org/2001/XMLSchema
        //String schema = attributes.getNamedItem( "schematypens" );

        if( "application/xml".equalsIgnoreCase( type ) && href != null ) {
          result = href;
        }
      }
    }
  }
  else {
    // Try to get the schema type information.
    NamedNodeMap attrs = node.getAttributes();

    if( attrs != null ) {
      // TypeInfo.toString() returns values of the form:
      // schemaLocation="uri schemaURI"
      // The following loop extracts the schema URI.
      for( int i = 0; i < attrs.getLength(); i++ ) {
        Attr attribute = (Attr)attrs.item( i );
        TypeInfo typeInfo = attribute.getSchemaTypeInfo();
        String attr[] = parseNameValue( typeInfo.toString() );

        if( "schemaLocation".equalsIgnoreCase( attr[0] ) ) {
          result = attr[1].split( "\\s" )[1];
          break;
        }
      }
    }

    // Look deeper for the schema URI.
    if( result == null ) {
      NodeList list = node.getChildNodes();

      for( int i = 0; i < list.getLength(); i++ ) {
        result = getSchemaURI( list.item( i ) );

        if( result != null ) {
          break;
        }
      }
    }
  }

  return result;
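}

First of all: the combination of two-space indentation and the new line before the else of each if-else statement makes the code hard for me to read.
That said, I do not have a solution to your main problem. I think you will have to ask that elsewhere; I cannot help you refactor large parts of your program like this. All I can do is review the code as it stands, based on what I know about Java.
I believe this method suffers because you try to validate everything before deciding whether you are going to use it.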

// Hack to get the attributes.
String data = pi.getData();

if( data != null ) {
  final String attributes[] = pi.getData().trim().split( "\\s+" );

Apart from the null check, data is not used for anything else. So why not write

  final String attributes[] = data.trim().split( "\\s+" );

instead?

final String attributes[] = pi.getData().trim().split( "\\s+" );

String type = parseNameValue( attributes[0] )[1];
String href = parseNameValue( attributes[1] )[1];

// TODO: Schema should = http://www.w3.org/2001/XMLSchema
//String schema = attributes.getNamedItem( "schematypens" );

if( "application/xml".equalsIgnoreCase( type ) && href != null ) {
  result = href;
}

After this snippet you can simply return result. There is an else block further down, but it is never executed once this snippet has been reached.
Also, type and href have no other use in this function, and result is initially null.
So all that really matters is this:

final String attributes[] = pi.getData().trim().split( "\\s+" );

String type = parseNameValue( attributes[0] )[1];

// TODO: Schema should = http://www.w3.org/2001/XMLSchema
//String schema = attributes.getNamedItem( "schematypens" );

if( "application/xml".equalsIgnoreCase( type )) {
  result = parseNameValue( attributes[1] )[1]; //href
}

There is no need to check whether href is null: if it is, you are only assigning null to a variable that is already null.
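
To try the simplified branch outside the DOM, here is a minimal sketch. It assumes, as the question's code does, that the xml-model data lists type first and href second. The class name XmlModelHref and the method hrefOf are hypothetical, the length guard is an addition that is not in the original, and parseNameValue is copied from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class XmlModelHref {
  private static final Pattern NAME_VALUE =
    Pattern.compile( "\\s*(\\w+)=\"([^\"]*)\"\\s*" );

  // Same behaviour as parseNameValue in the question: {name, value},
  // or {null, null} when the text does not look like name="value".
  static String[] parseNameValue( String nv ) {
    Matcher m = NAME_VALUE.matcher( nv );
    String[] result = new String[2];

    if( m.find() ) {
      result[0] = m.group( 1 );
      result[1] = m.group( 2 );
    }

    return result;
  }

  // Hypothetical helper: returns the href of an xml-model processing
  // instruction, or null. Assumes type comes first and href second.
  static String hrefOf( String data ) {
    if( data == null ) {
      return null;
    }

    String[] attributes = data.trim().split( "\\s+" );

    // Added guard (not in the original): avoids an index error when
    // fewer than two pseudo-attributes are present.
    if( attributes.length < 2 ) {
      return null;
    }

    String type = parseNameValue( attributes[0] )[1];

    // No null check on href: a missing href simply yields null.
    return "application/xml".equalsIgnoreCase( type )
      ? parseNameValue( attributes[1] )[1]
      : null;
  }

  public static void main( String[] args ) {
    System.out.println( hrefOf(
      "type=\"application/xml\" href=\"http://example.com/schema.xsd\"" ) );
    System.out.println( hrefOf( "type=\"text/css\" href=\"style.css\"" ) );
  }
}
```

Note that splitting on whitespace still breaks if a value contains a space, so this only mirrors the original behaviour; it does not make it more robust.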
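I also feel that this function should be split into three parts:
- a function that handles the ProcessingInstruction node;
- a function that determines the schema URI from node.getAttributes(); and
- a function that determines the schema URI from node.getChildNodes().
This would get rid of the deeply nested statements you have here and make the code easier to understand.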
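
A rough sketch of that three-way split follows, under stated assumptions. The class and method names (SchemaUriSplit, fromProcessingInstruction, fromAttributes, fromChildren, pseudoAttribute, parse) are hypothetical. Two techniques are swapped in and are not the question's own: the attribute lookup reads xsi:schemaLocation through Element.getAttributeNS on a namespace-aware DOM rather than parsing TypeInfo.toString(), and the processing instruction is read with a small regular expression per pseudo-attribute.

```java
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.ProcessingInstruction;
import org.xml.sax.InputSource;

final class SchemaUriSplit {
  // Entry point: dispatch on the node type, then fall back to the children.
  static String getSchemaURI( Node node ) {
    if( node.getNodeType() == Node.PROCESSING_INSTRUCTION_NODE ) {
      return fromProcessingInstruction( (ProcessingInstruction)node );
    }

    String result = fromAttributes( node );
    return result != null ? result : fromChildren( node );
  }

  // Part 1: the href of an xml-model processing instruction, or null.
  static String fromProcessingInstruction( ProcessingInstruction pi ) {
    if( !"xml-model".equals( pi.getTarget() ) || pi.getData() == null ) {
      return null;
    }

    String type = pseudoAttribute( pi.getData(), "type" );
    return "application/xml".equalsIgnoreCase( type )
      ? pseudoAttribute( pi.getData(), "href" )
      : null;
  }

  // Part 2: the second token of xsi:schemaLocation ("namespace location").
  static String fromAttributes( Node node ) {
    if( node.getNodeType() != Node.ELEMENT_NODE ) {
      return null;
    }

    // getAttributeNS returns "" when the attribute is absent.
    String location = ((Element)node).getAttributeNS(
      XMLConstants.W3C_XML_SCHEMA_INSTANCE_NS_URI, "schemaLocation" );
    String[] pair = location.trim().split( "\\s+" );
    return pair.length >= 2 ? pair[1] : null;
  }

  // Part 3: the first schema URI found in any child node, or null.
  static String fromChildren( Node node ) {
    NodeList list = node.getChildNodes();

    for( int i = 0; i < list.getLength(); i++ ) {
      String result = getSchemaURI( list.item( i ) );
      if( result != null ) {
        return result;
      }
    }

    return null;
  }

  // Extracts name="value" from processing instruction data, or null.
  static String pseudoAttribute( String data, String name ) {
    Matcher m = Pattern.compile(
      "\\b" + Pattern.quote( name ) + "=\"([^\"]*)\"" ).matcher( data );
    return m.find() ? m.group( 1 ) : null;
  }

  // Parses a string into a namespace-aware DOM (demo helper only).
  static Document parse( String xml ) {
    try {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      factory.setNamespaceAware( true );
      return factory.newDocumentBuilder().parse(
        new InputSource( new StringReader( xml ) ) );
    }
    catch( Exception e ) {
      throw new IllegalArgumentException( e );
    }
  }

  public static void main( String[] args ) {
    String xml = "<?xml-model type=\"application/xml\" "
      + "href=\"http://example.com/note.xsd\"?><note/>";
    System.out.println( getSchemaURI( parse( xml ) ) );
  }
}
```

Each part now returns as soon as it has an answer, so the nesting stays at most two levels deep. Whether the getAttributeNS lookup is an acceptable replacement depends on how the documents in the question actually declare their schema.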
https://codereview.stackexchange.com/questions/62678