Java: split XML into smaller chunks by a grandchild's id

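I have an XML document that should be split into smaller chunks by unique BookId nodes. Basically, I need to filter each book into a separate XML that has the same structure as the initial XML.

The purpose: each smaller XML must be validated against an XSD to determine which Book/PendingBook is invalid.

Note that the Books node can contain both Book and PendingBook nodes.

Initial XML: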

<Main xmlns="http://some/url/name">
  <Books>

    <Book>
      <IdentifyingInformation>
        <ID>
          <Year>2021</Year>
          <BookId>001</BookId>
          <BookDateTime>2021-05-10T12:35:00</BookDateTime>
        </ID>
      </IdentifyingInformation>
    </Book>

    <Book>
      <IdentifyingInformation>
        <ID>
          <Year>2020</Year>
          <BookId>002</BookId>
          <BookDateTime>2021-05-10T12:35:00</BookDateTime>
        </ID>
      </IdentifyingInformation>
    </Book>

    <PendingBook>
      <IdentifyingInformation>
        <ID>
          <Year>2020</Year>
          <BookId>003</BookId>
          <BookDateTime>2021-05-10T12:35:00</BookDateTime>
        </ID>
      </IdentifyingInformation>
    </PendingBook>

    <OtherInfo>...</OtherInfo>

  </Books>
</Main>

The result should look like the following XMLs:

Book_001.xml (BookId=001):

<Main xmlns="http://some/url/name">
  <Books>

    <Book>
      <IdentifyingInformation>
        <ID>
          <Year>2021</Year>
          <BookId>001</BookId>
          <BookDateTime>2021-05-10T12:35:00</BookDateTime>
        </ID>
      </IdentifyingInformation>
    </Book>

    <OtherInfo>...</OtherInfo>

  </Books>
</Main>

Book_002.xml (BookId=002):

<Main xmlns="http://some/url/name">
  <Books>

    <Book>
      <IdentifyingInformation>
        <ID>
          <Year>2020</Year>
          <BookId>002</BookId>
          <BookDateTime>2021-05-10T12:35:00</BookDateTime>
        </ID>
      </IdentifyingInformation>
    </Book>

    <OtherInfo>...</OtherInfo>

  </Books>
</Main>

PendingBook_003.xml (BookId=003):

<Main xmlns="http://some/url/name">
  <Books>

    <PendingBook>
      <IdentifyingInformation>
        <ID>
          <Year>2020</Year>
          <BookId>003</BookId>
          <BookDateTime>2021-05-10T12:35:00</BookDateTime>
        </ID>
      </IdentifyingInformation>
    </PendingBook>

    <OtherInfo>...</OtherInfo>

  </Books>
</Main>

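So far I have only extracted each ID node into a smaller XML, creating the root element manually.

Ideally, I would like to copy all elements from the initial XML and put a single Book/PendingBook node into the Books node.

My Java example: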

package com.main;

import java.io.File;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class ExtractXmls {
    /**
     * @param args
     */
    public static void main(String[] args) throws Exception
    {
        String inputFile = "C:/pathToXML/Main.xml";

        File xmlFile = new File(inputFile);
        DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
        Document doc = dBuilder.parse(xmlFile);

        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // never forget this!

        XPathFactory xfactory = XPathFactory.newInstance();
        XPath xpath = xfactory.newXPath();
        XPathExpression allBookIdsExpression = xpath.compile("//Books/*/IdentifyingInformation/ID/BookId/text()");
        NodeList bookIdNodes = (NodeList) allBookIdsExpression.evaluate(doc, XPathConstants.NODESET);

        //Save all the products
        List<String> bookIds = new ArrayList<>();
        for (int i = 0; i < bookIdNodes.getLength(); ++i) {
            Node bookId = bookIdNodes.item(i);

            System.out.println(bookId.getTextContent());
            bookIds.add(bookId.getTextContent());
        }

        //Now we create and save split XMLs
        for (String bookId : bookIds)
        {
            //With such query I can find node based on bookId
            String xpathQuery = "//ID[BookId='" + bookId + "']";
            xpath = xfactory.newXPath();
            XPathExpression query = xpath.compile(xpathQuery);
            NodeList bookIdNodesFiltered = (NodeList) query.evaluate(doc, XPathConstants.NODESET);
            System.out.println("Found " + bookIdNodesFiltered.getLength() + " bookId(s) for bookId " + bookId);

            //We store the new XML file in bookId.xml e.g. 001.xml
            Document aamcIdXml = dBuilder.newDocument();
            Element root = aamcIdXml.createElement("Main"); //Here I'm recreating root element (don't know if I can avoid it and copy somehow structure of initial xml)
            aamcIdXml.appendChild(root);
            for (int i = 0; i < bookIdNodesFiltered.getLength(); i++) {
                Node node = bookIdNodesFiltered.item(i);
                Node copyNode = aamcIdXml.importNode(node, true);
                root.appendChild(copyNode);
            }


            //At the end, we save the file XML on disk
            TransformerFactory transformerFactory = TransformerFactory.newInstance();
            Transformer transformer = transformerFactory.newTransformer();
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            DOMSource source = new DOMSource(aamcIdXml);

            StreamResult result =  new StreamResult(new File("C:/pathToXML/" + bookId.trim() + ".xml"));
            transformer.transform(source, result);

            System.out.println("Done for " + bookId);
        }
    }

}
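
You almost have it working. In the loop that iterates over the book IDs, change the XPath so that it selects the Book or PendingBook element, and then use that element. In addition to Main, you also need to create a Books element and append the Book/PendingBook to the newly created Books element.

The XPath is:
//ancestor::*[IdentifyingInformation/ID/BookId=bookId]

It selects the ancestor of the element whose BookId matches the ID in the current iteration, i.e. the Book or PendingBook element.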
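
To look at the selection step in isolation, here is a minimal sketch. The class `BookLookup` and the method `findBookElementName` are hypothetical names introduced only for illustration. It assumes the document is parsed without namespace awareness, as in the question's code, so that unprefixed names in the XPath match. Instead of concatenating the ID into the expression, it binds it to an XPath variable `$id`; that is a swapped-in technique, chosen because an unquoted `BookId=001` is compared as a number in XPath 1.0 and would also match `1` or `01`.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration; not part of the original answer.
final class BookLookup {

    /** Returns the element name ("Book" or "PendingBook") of the entry with the given BookId, or null if none matches. */
    static String findBookElementName(String xml, String bookId) throws Exception {
        // The default factory is NOT namespace-aware (same as the question's code),
        // so unprefixed XPath names match despite the xmlns declaration on Main.
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        // Bind $id to the book ID so it is always compared as a string.
        xpath.setXPathVariableResolver(
                name -> "id".equals(name.getLocalPart()) ? bookId : null);

        // Select the direct child of Books whose nested BookId equals $id.
        Node match = (Node) xpath.evaluate(
                "//Books/*[IdentifyingInformation/ID/BookId=$id]",
                doc, XPathConstants.NODE);
        return match == null ? null : match.getNodeName();
    }
}
```

Calling `BookLookup.findBookElementName(xml, "003")` on the initial XML from the question should select the PendingBook entry. The returned name is what the loop uses as the file-name prefix.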

        //Now we create and save split XMLs
        for (String bookId : bookIds)
        {
            //Select the Book/PendingBook element whose BookId matches; the ID is quoted so it is compared as a string
            String xpathQuery = "//ancestor::*[IdentifyingInformation/ID/BookId='" + bookId + "']";
            xpath = xfactory.newXPath();
            XPathExpression query = xpath.compile(xpathQuery);
            NodeList bookIdNodesFiltered = (NodeList) query.evaluate(doc, XPathConstants.NODESET);
            System.out.println("Found " + bookIdNodesFiltered.getLength() + " bookId(s) for bookId " + bookId);

            //We store the new XML file as <elementName>_<bookId>.xml, e.g. Book_001.xml
            Document aamcIdXml = dBuilder.newDocument();
            //Recreate the Main root and its Books child by hand
            Element root = aamcIdXml.createElement("Main");
            Element booksNode = aamcIdXml.createElement("Books");
            root.appendChild(booksNode);
            aamcIdXml.appendChild(root);
            String bookName = "";
            for (int i = 0; i < bookIdNodesFiltered.getLength(); i++) {
                Node node = bookIdNodesFiltered.item(i);
                Node copyNode = aamcIdXml.importNode(node, true);
                bookName = copyNode.getNodeName();
                booksNode.appendChild(copyNode);
            }


            //At the end, we save the file XML on disk
            TransformerFactory transformerFactory = TransformerFactory.newInstance();
            Transformer transformer = transformerFactory.newTransformer();
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            DOMSource source = new DOMSource(aamcIdXml);

            StreamResult result =  new StreamResult(new File(bookName + "_" + bookId.trim() + ".xml"));
            transformer.transform(source, result);

            System.out.println("Done for " + bookId);
        }
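
The loop above rebuilds Main and Books by hand, so the namespace declaration on Main and sibling elements such as OtherInfo are not carried over into the split files. One possible alternative, sketched here under assumptions, is to copy the whole tree for each book and remove every other Book/PendingBook from the copy. The class `BookSplitter` and its methods are hypothetical names. The sketch assumes a single Books element, unique BookId values, and a namespace-aware parse, and it returns the chunks as strings instead of writing files.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration; not part of the original answer.
final class BookSplitter {

    /** Book and PendingBook children of the first Books element, in document order. */
    private static List<Element> bookElements(Document doc) {
        Node books = doc.getElementsByTagNameNS("*", "Books").item(0);
        List<Element> result = new ArrayList<>();
        for (Node n = books.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE
                    && ("Book".equals(n.getLocalName()) || "PendingBook".equals(n.getLocalName()))) {
                result.add((Element) n);
            }
        }
        return result;
    }

    /** Maps a file name such as "Book_001.xml" to the XML that keeps only that book. */
    static Map<String, String> split(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // keep namespace information on every node
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document original = builder.parse(new InputSource(new StringReader(xml)));

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        Map<String, String> chunks = new LinkedHashMap<>();
        int total = bookElements(original).size();
        for (int keep = 0; keep < total; keep++) {
            // Deep-copy the whole tree, including Main, Books and OtherInfo.
            Document copy = builder.newDocument();
            copy.appendChild(copy.importNode(original.getDocumentElement(), true));

            // Drop every Book/PendingBook except the one at position 'keep'.
            List<Element> books = bookElements(copy);
            Element kept = books.get(keep);
            for (Element other : books) {
                if (other != kept) {
                    other.getParentNode().removeChild(other);
                }
            }

            String id = kept.getElementsByTagNameNS("*", "BookId")
                    .item(0).getTextContent().trim();
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(copy), new StreamResult(out));
            chunks.put(kept.getLocalName() + "_" + id + ".xml", out.toString());
        }
        return chunks;
    }
}
```

Each entry of the returned map could then be written to disk under its key, or handed to a `javax.xml.validation.Validator` built from the XSD to find out which Book/PendingBook is invalid. Copying and pruning costs one full copy of the tree per book, which is probably acceptable for small documents but may be slow for very large ones.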