Offline XML validation with Java


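I need to figure out how to validate XML files when the schemas are offline. After looking around for a couple of days, what I found is that I basically need an internal reference to the schemas: find them, download them, and change the references to local system paths. What I could not find is exactly how to do that. Where and how do I change the references so they point to an internal location rather than an external one? What is the best way to download the schemas?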

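There are three approaches. What they have in common is that you need a local copy of the schema documents. I'm assuming that the instance documents currently use xsi:schemaLocation and/or xsi:noNamespaceSchemaLocation to point to the location on the web where the schema documents are held.

(a) Modify the instance documents to refer to the local copy of the schema documents. This is usually inconvenient.

(b) Redirect the references so that a request for a remote file is served from a local file. How you set this up depends on which schema validator you are using and how you invoke it.

(c) Tell the schema processor to ignore the values of xsi:schemaLocation and xsi:noNamespaceSchemaLocation, and to validate instead against a schema that you supply through the processor's invocation API. Again, the details depend on which schema processor you are using.

My preferred approach is (c), if only because when you validate a source document you, by definition, do not fully trust it. So why should you trust it to contain a correct xsi:schemaLocation attribute?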
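As a rough illustration of approach (c), here is a minimal sketch using the standard javax.xml.validation API. The class name, the inline schema, and the example.invalid URL are all made up for the example. The assumption is that a Schema compiled from an explicitly supplied source is the one the validator uses, so the location hint in the instance is not what decides which schema is applied.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class LocalSchemaValidation {
    // Tiny illustrative schema; in practice this would be your local .xsd file.
    static final String XSD =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
          + "<xs:element name='note' type='xs:string'/>"
          + "</xs:schema>";

    // The instance carries a (made-up) remote schema location hint.
    static final String XML =
            "<note xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'"
          + " xsi:noNamespaceSchemaLocation='http://example.invalid/note.xsd'>hello</note>";

    // Compile the schema from the copy we supply ourselves.
    static Schema loadSchema() {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return factory.newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema does not compile", e);
        }
    }

    // Returns true if the document validates against the supplied schema.
    static boolean isValid(String xml) {
        try {
            loadSchema().newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // With no ErrorHandler set, validation errors surface as exceptions.
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(XML));
        System.out.println(isValid("<other/>"));
    }
}
```

The first document should be reported as valid and the second as invalid. For real files, new StreamSource(new File(...)) can replace the StringReader sources.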

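XmlValidate is a simple but powerful command-line tool that can perform offline validation of single or multiple XML files against target schemas. It can scan local XML files by file name, directory, or URL.

XmlValidate automatically adds the schemaLocation based on the schema namespace and a config file that maps namespaces to local files. The tool validates against any XML schema referenced in the config file.

Here is an example of the namespace-to-target-schema mappings in the config file: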

http://www.opengis.net/kml/2.2=${XV_HOME}/schemas/kml22.xsd
http://appengine.google.com/ns/1.0=C:/xml/appengine-web.xsd
urn:oasis:names:tc:ciq:xsdschema:xAL:2.0=C:/xml/xAL.xsd
Note that the ${XV_HOME} token above is just an alias for the top-level directory that XmlValidate runs from. The location can also be a full file path.
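To make the mapping format concrete, here is a small sketch of how such a file could be read. This is not XmlValidate's actual code; the class and method names are hypothetical. Note that java.util.Properties is not a good fit here, because it treats the colon in a namespace URI as a key separator, so the sketch splits each line at the first '=' instead.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SchemaMapSketch {
    // Parses "namespace=location" lines and expands the ${XV_HOME} token.
    static Map<String, String> parse(List<String> lines, String home) {
        Map<String, String> map = new LinkedHashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            int eq = trimmed.indexOf('=');
            if (trimmed.isEmpty() || eq < 0) {
                continue; // skip blank or malformed lines
            }
            String namespace = trimmed.substring(0, eq);
            String location = trimmed.substring(eq + 1).replace("${XV_HOME}", home);
            map.put(namespace, location);
        }
        return map;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "http://www.opengis.net/kml/2.2=${XV_HOME}/schemas/kml22.xsd",
                "urn:oasis:names:tc:ciq:xsdschema:xAL:2.0=C:/xml/xAL.xsd");
        // "/opt/xv" is an arbitrary example value for the home directory.
        parse(lines, "/opt/xv").forEach((ns, loc) -> System.out.println(ns + " -> " + loc));
    }
}
```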

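XmlValidate is an open-source project (source code is available). A bundled application (Java JAR, examples, etc.) can be downloaded.

When run in batch mode against multiple XML files, XmlValidate prints a summary of the validation results: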

Errors: 17  Warnings: 0  Files: 11  Time: 1506 ms
Valid files 8/11 (73%)

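You can set your own implementations of LSResourceResolver and LSInput on the SchemaFactory, so that the LSInput handed back to the factory supplies the schema from a local path.

I have written an additional class that performs offline validation. You can call it like this: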

new XmlSchemaValidator().validate(xmlStream, schemaStream, "https://schema.datacite.org/meta/kernel-4.1/",
                        "schemas/datacite/kernel-4.1/");
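Two input streams are passed in: one for the XML and one for the schema. A baseUrl and a localPath (relative to the classpath) are passed as the third and fourth arguments. The validator uses these last two arguments to look for additional schemas, either under localPath or relative to the supplied baseUrl.

I have tested it with a set of schemas and examples (the DataCite kernel-4.1 files used below).

Complete example: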

 @Test
 public void validate4() throws Exception {
        InputStream xmlStream = Thread.currentThread().getContextClassLoader().getResourceAsStream(
                        "schemas/datacite/kernel-4.1/example/datacite-example-complicated-v4.1.xml");
        InputStream schemaStream = Thread.currentThread().getContextClassLoader()
                        .getResourceAsStream("schemas/datacite/kernel-4.1/metadata.xsd");
        new XmlSchemaValidator().validate(xmlStream, schemaStream, "https://schema.datacite.org/meta/kernel-4.1/",
                        "schemas/datacite/kernel-4.1/");
 }
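XmlSchemaValidator validates the XML against the schema and searches locally for included schemas. It uses a resource resolver to override the default behaviour and look in the local path first.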

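import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.SocketTimeoutException;
import java.net.URI;
import java.net.URL;

import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;

// Assumption: the original post does not show its imports. DOMInputImpl is
// taken here from Apache Xerces (xercesImpl on the classpath).
import org.apache.xerces.dom.DOMInputImpl;
import org.w3c.dom.ls.LSInput;
import org.xml.sax.SAXException;

public class XmlSchemaValidator {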
    /**
     * @param xmlStream
     *            xml data as a stream
     * @param schemaStream
     *            schema as a stream
     * @param baseUri
     *            base URI used to resolve relative schema paths on the web
     * @param localPath
     *            classpath-relative directory to search for schemas
     * @throws SAXException
     *             if validation fails
     * @throws IOException
     *             not further specified
     */
    public void validate(InputStream xmlStream, InputStream schemaStream, String baseUri, String localPath)
                    throws SAXException, IOException {
        Source xmlFile = new StreamSource(xmlStream);
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
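        factory.setResourceResolver((type, namespaceURI, publicId, systemId, baseURI) -> {
            InputStream in = getSchemaAsStream(systemId, baseUri, localPath);
            if (in == null) {
                // Nothing found locally or on the web. Returning null asks the
                // parser to open the resource the normal way, and avoids a
                // NullPointerException from wrapping a null stream.
                return null;
            }
            LSInput input = new DOMInputImpl();
            input.setPublicId(publicId);
            input.setSystemId(systemId);
            input.setBaseURI(baseUri);
            input.setCharacterStream(new InputStreamReader(in));
            return input;
        });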
        Schema schema = factory.newSchema(new StreamSource(schemaStream));
        javax.xml.validation.Validator validator = schema.newValidator();
        validator.validate(xmlFile);
    }

    private InputStream getSchemaAsStream(String systemId, String baseUri, String localPath) {
        InputStream in = getSchemaFromClasspath(systemId, localPath);
        // You could just return in; , if you are sure that everything is on
        // your machine. Here I call getSchemaFromWeb as last resort.
        return in == null ? getSchemaFromWeb(baseUri, systemId) : in;
    }

    private InputStream getSchemaFromClasspath(String systemId, String localPath) {
        System.out.println("Try to get stuff from localdir: " + localPath + systemId);
        return Thread.currentThread().getContextClassLoader().getResourceAsStream(localPath + systemId);
    }

    /*
     * You can leave out the webstuff if you are sure that everything is
     * available on your machine
     */
    private InputStream getSchemaFromWeb(String baseUri, String systemId) {
        try {
            URI uri = new URI(systemId);
            if (uri.isAbsolute()) {
                System.out.println("Get stuff from web: " + systemId);
                return urlToInputStream(uri.toURL(), "text/xml");
            }
            System.out.println("Get stuff from web: Host: " + baseUri + " Path: " + systemId);
            return getSchemaRelativeToBaseUri(baseUri, systemId);
        } catch (Exception e) {
            // maybe the systemId is not a valid URI or
            // the web has nothing to offer under this address
        }
        return null;
    }

    private InputStream urlToInputStream(URL url, String accept) {
        HttpURLConnection con = null;
        InputStream inputStream = null;
        try {
            con = (HttpURLConnection) url.openConnection();
            con.setConnectTimeout(15000);
            con.setRequestProperty("User-Agent", "Name of my application.");
            con.setReadTimeout(15000);
            con.setRequestProperty("Accept", accept);
            con.connect();
            int responseCode = con.getResponseCode();
            if (responseCode == HttpURLConnection.HTTP_MOVED_PERM
                            || responseCode == HttpURLConnection.HTTP_MOVED_TEMP || responseCode == 307
                            || responseCode == 303) {
                String redirectUrl = con.getHeaderField("Location");
                try {
                    URL newUrl = new URL(redirectUrl);
                    return urlToInputStream(newUrl, accept);
                } catch (MalformedURLException e) {
                    URL newUrl = new URL(url.getProtocol() + "://" + url.getHost() + redirectUrl);
                    return urlToInputStream(newUrl, accept);
                }
            }
            inputStream = con.getInputStream();
            return inputStream;
        } catch (SocketTimeoutException e) {
            throw new RuntimeException(e);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }

    }

    private InputStream getSchemaRelativeToBaseUri(String baseUri, String systemId) {
        try {
            URL url = new URL(baseUri + systemId);
            return urlToInputStream(url, "text/xml");
        } catch (Exception e) {
            e.printStackTrace();
            throw new RuntimeException(e);
        }
    }
}
This prints:

Try to get stuff from localdir: schemas/datacite/kernel-4.1/http://www.w3.org/2009/01/xml.xsd
Get stuff from web: http://www.w3.org/2009/01/xml.xsd
Try to get stuff from localdir: schemas/datacite/kernel-4.1/include/datacite-titleType-v4.xsd
Try to get stuff from localdir: schemas/datacite/kernel-4.1/include/datacite-contributorType-v4.xsd
Try to get stuff from localdir: schemas/datacite/kernel-4.1/include/datacite-dateType-v4.1.xsd
Try to get stuff from localdir: schemas/datacite/kernel-4.1/include/datacite-resourceType-v4.1.xsd
Try to get stuff from localdir: schemas/datacite/kernel-4.1/include/datacite-relationType-v4.1.xsd
Try to get stuff from localdir: schemas/datacite/kernel-4.1/include/datacite-relatedIdentifierType-v4.xsd
Try to get stuff from localdir: schemas/datacite/kernel-4.1/include/datacite-funderIdentifierType-v4.xsd
Try to get stuff from localdir: schemas/datacite/kernel-4.1/include/datacite-descriptionType-v4.xsd
Try to get stuff from localdir: schemas/datacite/kernel-4.1/include/datacite-nameType-v4.1.xsd
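The output shows that the validator was able to validate against a set of local schemas. Only http://www.w3.org/2009/01/xml.xsd was not available locally and was therefore fetched from the internet.

If you also want that schema to come from a local path, you can use its URL to query a lookup structure that associates URLs with local paths.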
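One possible shape for such a lookup structure is sketched below. The class name, the method name, and the local path schemas/w3c/xml.xsd are assumptions made for illustration; the idea is that getSchemaFromClasspath would ask this helper for the resource name instead of always concatenating localPath and systemId.

```java
import java.util.HashMap;
import java.util.Map;

public class LocalSchemaLookup {
    // Maps absolute schema URLs to classpath resources (hypothetical local path).
    private static final Map<String, String> KNOWN = new HashMap<>();
    static {
        KNOWN.put("http://www.w3.org/2009/01/xml.xsd", "schemas/w3c/xml.xsd");
    }

    // Returns the classpath name to try for a given systemId: the mapped
    // location if the URL is known, otherwise localPath + systemId as before.
    static String toClasspathName(String systemId, String localPath) {
        String mapped = KNOWN.get(systemId);
        return mapped != null ? mapped : localPath + systemId;
    }

    public static void main(String[] args) {
        String localPath = "schemas/datacite/kernel-4.1/";
        System.out.println(toClasspathName("http://www.w3.org/2009/01/xml.xsd", localPath));
        System.out.println(toClasspathName("include/datacite-titleType-v4.xsd", localPath));
    }
}
```

With every referenced URL in the map, the web fallback in getSchemaAsStream could be dropped so that validation never touches the network.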