How do I read a large XML file in Java and split it into small XML files based on a tag?


I'm new to Java programming, and I need a Java program that reads a large XML file containing <row>..</row> tags. A sample input is shown below.

Input.xml

<row>
<Name>Filename1</Name>
</row>
<row>
<Name>Filename2</Name>
</row>
<row>
<Name>Filename3</Name>
</row>
<row>
<Name>Filename4</Name>
</row>
<row>
<Name>Filename5</Name>
</row>
<row>
<Name>Filename6</Name>
</row>
 .
 .
I need the first output to be an .xml file named filename1.xml, the second filename2.xml, and so on.


Can anyone tell me a simple way to do this in Java? Some sample code would be very helpful.

The best way is to use a JAXB unmarshaller and marshaller to read and create the XML files.


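I suggest using SAXParser and extending the DefaultHandler class. You can use a few boolean flags to keep track of which tag you are in.

DefaultHandler tells you through the startElement method when you enter a particular tag. The characters method then gives you the content of the tag, and finally the endElement method notifies you that the tag has ended.

Once you are notified that the tag has ended, you can take the content you just saved and create a file from it.

Looking at your example, you only need two booleans, boolean inRow and boolean inName, so this shouldn't be a difficult task.

I'm leaving out the actual code for your case; you'll have to write that yourself. It's fairly trivial. Here is a generic SAX example: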

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class ReadXMLFile {

   public static void main(String argv[]) {

    try {

    SAXParserFactory factory = SAXParserFactory.newInstance();
    SAXParser saxParser = factory.newSAXParser();

    DefaultHandler handler = new DefaultHandler() {

    boolean bfname = false;
    boolean blname = false;
    boolean bnname = false;
    boolean bsalary = false;

    public void startElement(String uri, String localName,String qName, 
                Attributes attributes) throws SAXException {

        System.out.println("Start Element :" + qName);

        if (qName.equalsIgnoreCase("FIRSTNAME")) {
            bfname = true;
        }

        if (qName.equalsIgnoreCase("LASTNAME")) {
            blname = true;
        }

        if (qName.equalsIgnoreCase("NICKNAME")) {
            bnname = true;
        }

        if (qName.equalsIgnoreCase("SALARY")) {
            bsalary = true;
        }

    }

    public void endElement(String uri, String localName,
        String qName) throws SAXException {

        System.out.println("End Element :" + qName);

    }

    public void characters(char ch[], int start, int length) throws SAXException {

        if (bfname) {
            System.out.println("First Name : " + new String(ch, start, length));
            bfname = false;
        }

        if (blname) {
            System.out.println("Last Name : " + new String(ch, start, length));
            blname = false;
        }

        if (bnname) {
            System.out.println("Nick Name : " + new String(ch, start, length));
            bnname = false;
        }

        if (bsalary) {
            System.out.println("Salary : " + new String(ch, start, length));
            bsalary = false;
        }

    }

     };

       saxParser.parse("c:\\file.xml", handler);

     } catch (Exception e) {
       e.printStackTrace();
     }

   }

}
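
As a rough illustration of the inRow/inName idea described above, here is a minimal sketch. The class name SaxRowSplitter and its split method are hypothetical, and it assumes the rows are wrapped in a single root element (SAX needs a well-formed document) and that each row contains only a Name element.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: writes one file per <row>, named after the text of its <Name>.
class SaxRowSplitter extends DefaultHandler {
    private final Path outputDir;
    private boolean inRow = false;
    private boolean inName = false;
    private final StringBuilder name = new StringBuilder();

    SaxRowSplitter(Path outputDir) {
        this.outputDir = outputDir;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if (qName.equals("row")) {
            inRow = true;
            name.setLength(0); // start collecting a fresh name for this row
        } else if (inRow && qName.equals("Name")) {
            inName = true;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // The parser may deliver one text node in several chunks, so append.
        if (inName) {
            name.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (qName.equals("Name")) {
            inName = false;
        } else if (qName.equals("row")) {
            inRow = false;
            String fileName = name.toString().trim();
            // Assumption: a row holds nothing but its Name, so it can be rebuilt by hand.
            String xml = "<row><Name>" + escape(fileName) + "</Name></row>";
            try {
                Files.write(outputDir.resolve(fileName + ".xml"), xml.getBytes(StandardCharsets.UTF_8));
            } catch (IOException e) {
                throw new SAXException(e);
            }
        }
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    static void split(InputStream in, Path outputDir) throws Exception {
        SAXParserFactory.newInstance().newSAXParser().parse(in, new SaxRowSplitter(outputDir));
    }
}
```

Calling SaxRowSplitter.split(inputStream, outputDir) would then produce Filename1.xml, Filename2.xml and so on in an existing output directory. Rows with richer content would need the handler to record more than the name.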

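Since you said your XML is large, you can do this with StAX.

Code for your use case

The following code uses the StAX API to break up the document outlined in your question. Note that, as written, it names each output file after the id attribute of the split element and keeps only fragments that contain a nickname element, so both would need adapting to the row/Name structure: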

    import java.io.*;
    import java.util.*;

    import javax.xml.namespace.QName;
    import javax.xml.stream.*;
    import javax.xml.stream.events.*;

    public class Demo {

        public static void main(String[] args) throws Exception  {
            Demo demo = new Demo();
            demo.split("src/forum7408938/input.xml", "nickname");
            //demo.split("src/forum7408938/input.xml", null);
        }

        private void split(String xmlResource, String condition) throws Exception {
            XMLEventFactory xef = XMLEventFactory.newFactory();
            XMLInputFactory xif = XMLInputFactory.newInstance();
            XMLEventReader xer = xif.createXMLEventReader(new FileReader(xmlResource));
            StartElement rootStartElement = xer.nextTag().asStartElement(); // Advance to statements element
            StartDocument startDocument = xef.createStartDocument();
            EndDocument endDocument = xef.createEndDocument();

            XMLOutputFactory xof = XMLOutputFactory.newFactory();
            while(xer.hasNext() && !xer.peek().isEndDocument()) {
                boolean metCondition;
                XMLEvent xmlEvent = xer.nextTag();
                if(!xmlEvent.isStartElement()) {
                    break;
                }
            // Be able to split XML file into n parts with x split elements (from
            // the dummy XML example staff is the split element).
            StartElement breakStartElement = xmlEvent.asStartElement();
            List<XMLEvent> cachedXMLEvents = new ArrayList<XMLEvent>();

            // BOUNTY CRITERIA
            // I'd like to be able to specify condition that must be in the 
            // split element i.e. I want only staff which have nickname, I want 
            // to discard those without nicknames. But be able to also split 
            // without conditions while running split without conditions.
            if(null == condition) {
                cachedXMLEvents.add(breakStartElement);
                metCondition = true;
            } else {
                cachedXMLEvents.add(breakStartElement);
                xmlEvent = xer.nextEvent();
                metCondition = false;
                while(!(xmlEvent.isEndElement() && xmlEvent.asEndElement().getName().equals(breakStartElement.getName()))) {
                    cachedXMLEvents.add(xmlEvent);
                    if(xmlEvent.isStartElement() && xmlEvent.asStartElement().getName().getLocalPart().equals(condition)) {
                        metCondition = true;
                        break;
                    }
                    xmlEvent = xer.nextEvent();
                }
            }

            if(metCondition) {
                // Create a file for the fragment, the name is derived from the value of the id attribute
                FileWriter fileWriter = null;
                fileWriter = new FileWriter("src/forum7408938/" + breakStartElement.getAttributeByName(new QName("id")).getValue() + ".xml");

                // A StAX XMLEventWriter will be used to write the XML fragment
                XMLEventWriter xew = xof.createXMLEventWriter(fileWriter);
                xew.add(startDocument);

                // BOUNTY CRITERIA
                // The content of the split files should be wrapped in the 
                // root element from the original file(like in the dummy example
                // company)
                xew.add(rootStartElement);

                // Write the XMLEvents that were cached while we were
                // checking the fragment to see if it matched our criteria.
                for(XMLEvent cachedEvent : cachedXMLEvents) {
                    xew.add(cachedEvent);
                }

                // Write the XMLEvents that we still need to parse from this
                // fragment
                xmlEvent = xer.nextEvent();
                while(xer.hasNext() && !(xmlEvent.isEndElement() && xmlEvent.asEndElement().getName().equals(breakStartElement.getName()))) {
                    xew.add(xmlEvent);
                    xmlEvent = xer.nextEvent();
                }
                xew.add(xmlEvent);

                // Close everything we opened
                xew.add(xef.createEndElement(rootStartElement.getName(), null));
                xew.add(endDocument);
                xew.close(); // flush buffered events before closing the underlying writer
                fileWriter.close();
            }
        }
    }

}
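
For the row/Name input in the question, the same StAX event approach might look like the following sketch. The class name StaxRowSplitter is hypothetical, and the code assumes that row elements are not nested and that each row has a Name child whose text is a usable file name.

```java
import java.io.InputStream;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper: buffers the events of one <row> at a time, then writes
// them to a file named after the text of the row's <Name> element.
class StaxRowSplitter {

    static int split(InputStream in, Path outputDir) throws Exception {
        XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(in);
        XMLOutputFactory outputFactory = XMLOutputFactory.newInstance();
        int written = 0;

        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (!isStart(event, "row")) {
                continue; // skip the root element, whitespace and anything else outside a row
            }

            // Only one row is held in memory at a time.
            List<XMLEvent> rowEvents = new ArrayList<>();
            StringBuilder name = new StringBuilder();
            boolean inName = false;
            rowEvents.add(event);
            while (!isEnd(event, "row")) {
                event = reader.nextEvent();
                rowEvents.add(event);
                if (isStart(event, "Name")) {
                    inName = true;
                } else if (isEnd(event, "Name")) {
                    inName = false;
                } else if (inName && event.isCharacters()) {
                    name.append(event.asCharacters().getData());
                }
            }

            Path target = outputDir.resolve(name.toString().trim() + ".xml");
            try (Writer out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
                XMLEventWriter writer = outputFactory.createXMLEventWriter(out);
                for (XMLEvent rowEvent : rowEvents) {
                    writer.add(rowEvent);
                }
                writer.flush();
                writer.close();
            }
            written++;
        }
        reader.close();
        return written;
    }

    private static boolean isStart(XMLEvent e, String local) {
        return e.isStartElement() && e.asStartElement().getName().getLocalPart().equals(local);
    }

    private static boolean isEnd(XMLEvent e, String local) {
        return e.isEndElement() && e.asEndElement().getName().getLocalPart().equals(local);
    }
}
```

The method returns the number of files written. Unlike the longer example, it does not wrap each fragment in the original root element; that could be added by writing extra start and end events around the buffered row.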
Try this:

import java.io.*;
import javax.xml.parsers.*;
import org.w3c.dom.*;
import org.xml.sax.*;
import javax.xml.transform.*; 
import javax.xml.transform.dom.DOMSource; 
import javax.xml.transform.stream.StreamResult;

public class Test{
 static public void main(String[] arg) throws Exception{

 DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
 DocumentBuilder builder = factory.newDocumentBuilder();
 Document doc = builder.parse("foo.xml");

 TransformerFactory tranFactory = TransformerFactory.newInstance(); 
 Transformer aTransformer = tranFactory.newTransformer(); 


 // getDocumentElement() is the root element; getFirstChild() could be a comment or doctype
 NodeList list = doc.getDocumentElement().getChildNodes();

 for (int i=0; i<list.getLength(); i++){
    Node element = list.item(i).cloneNode(true);

 if(element.hasChildNodes()){
   Source src = new DOMSource(element); 
   FileOutputStream fs=new FileOutputStream("k" + i + ".xml");
   Result dest = new StreamResult(fs);
   aTransformer.transform(src, dest);
   fs.close();
   }
   }

  }
}


Assuming your file has a root element that contains these rows:

<root>
    <row><Name>Filename1</Name></row>
    <row><Name>Filename2</Name></row>
    <row><Name>Filename3</Name></row>
    <row><Name>Filename4</Name></row>
    <row><Name>Filename5</Name></row>
    <row><Name>Filename6</Name></row>
</root>
this code will do the job:

package com.example;

import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileReader;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class Main {

    public static String readXmlFromFile(String fileName) throws Exception {
        StringBuilder stringBuilder = new StringBuilder();
        String lineSeparator = System.getProperty("line.separator");

        // try-with-resources closes the reader even if reading fails
        try (BufferedReader reader = new BufferedReader(new FileReader(fileName))) {
            String line;
            while ((line = reader.readLine()) != null) {
                stringBuilder.append(line);
                stringBuilder.append(lineSeparator);
            }
        }

        return stringBuilder.toString();
    }

    public static List<String> divideXmlByTag(String xml, String tag) throws Exception {
        List<String> list = new ArrayList<String>();
        Document document = loadXmlDocument(xml);
        TransformerFactory transformerFactory = TransformerFactory.newInstance();
        Transformer transformer = transformerFactory.newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        NodeList rowList = document.getElementsByTagName(tag);
        for(int i=0; i<rowList.getLength(); i++) {
            Node rowNode = rowList.item(i);
            if (rowNode.getNodeType() == Node.ELEMENT_NODE) {
                DOMSource source = new DOMSource(rowNode);
                ByteArrayOutputStream baos = new ByteArrayOutputStream();
                StreamResult streamResult = new StreamResult(baos);
                transformer.transform(source, streamResult);
                list.add(baos.toString());
            }
        }
        return list;
    }

    private static Document loadXmlDocument(String xml) throws SAXException, IOException, ParserConfigurationException {
        return loadXmlDocument(new ByteArrayInputStream(xml.getBytes()));
    }

    private static Document loadXmlDocument(InputStream inputStream) throws SAXException, IOException, ParserConfigurationException {
        DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
        documentBuilderFactory.setNamespaceAware(true);
        DocumentBuilder documentBuilder = null;
        documentBuilder = documentBuilderFactory.newDocumentBuilder();
        Document document = documentBuilder.parse(inputStream);
        inputStream.close();
        return document;
    }

    public static void main(String[] args) throws Exception {
        String xmlString = readXmlFromFile("d:/test.xml");
        System.out.println("original xml:\n" + xmlString + "\n");
        System.out.println("divided xml:\n");
        List<String> dividedXmls = divideXmlByTag(xmlString, "row");
        for (String xmlPart : dividedXmls) {
            System.out.println(xmlPart + "\n");
        }

    }
}

You then only need to write each of these XML parts to a separate file.
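
That last step could be sketched as follows. PartWriter and writeParts are hypothetical names, and pulling the file name out of each part with a regular expression is a shortcut that assumes the simple <Name>...</Name> shape shown above; parts without a Name fall back to a numbered file name.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: writes each XML part produced by divideXmlByTag to its own file.
class PartWriter {
    // Assumption: the file name is the text of the first <Name> element in the part.
    private static final Pattern NAME = Pattern.compile("<Name>\\s*(.*?)\\s*</Name>", Pattern.DOTALL);

    static void writeParts(List<String> parts, Path outputDir) throws IOException {
        Files.createDirectories(outputDir);
        int index = 0;
        for (String part : parts) {
            index++;
            Matcher matcher = NAME.matcher(part);
            // Fall back to part1, part2, ... when no <Name> is found.
            String baseName = matcher.find() ? matcher.group(1) : "part" + index;
            Files.write(outputDir.resolve(baseName + ".xml"), part.getBytes(StandardCharsets.UTF_8));
        }
    }
}
```

In the main method above, this would be called as PartWriter.writeParts(dividedXmls, outputDir) after divideXmlByTag returns.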

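Posting another solution, since the user asked for a different approach.

Use a StAX parser in this case. It avoids reading the whole document into memory at once.

1. Advance the XMLStreamReader to the local root element of the sub-fragment.
2. Use the javax.xml.transform API to produce a new document from this XML fragment. This advances the XMLStreamReader to the end of that fragment.
3. Repeat step 1 for the next fragment.

Code example

For the following XML, write each statement section to a file named after the value of its account attribute: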

<statements>
   <statement account="123">
      ...stuff...
   </statement>
   <statement account="456">
      ...stuff...
   </statement>
</statements>
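
The steps described for this approach could be sketched roughly as below. StatementSplitter is a hypothetical name, and the sketch assumes every direct child of the root carries an account attribute. Because implementations may leave the reader either on the fragment's end tag or just past it after the transform, the loop checks the current event type instead of calling nextTag() unconditionally.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamResult;

// Hypothetical helper: copies each child of the root into its own file,
// named after the child's account attribute.
class StatementSplitter {

    static int split(InputStream in, Path outputDir) throws Exception {
        XMLStreamReader xsr = XMLInputFactory.newInstance().createXMLStreamReader(in);
        // A Transformer created without a stylesheet performs an identity copy.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        int written = 0;

        xsr.nextTag(); // advance to the root <statements> element
        xsr.next();    // step inside the root
        while (xsr.hasNext()) {
            if (xsr.getEventType() == XMLStreamConstants.START_ELEMENT) {
                // Step 1: the reader sits on the local root of a fragment.
                String account = xsr.getAttributeValue(null, "account");
                try (OutputStream out = Files.newOutputStream(outputDir.resolve(account + ".xml"))) {
                    // Step 2: copying the fragment also consumes it from the reader.
                    transformer.transform(new StAXSource(xsr), new StreamResult(out));
                }
                written++;
                // No next() here: the reader may already be on the next fragment.
            } else {
                xsr.next(); // Step 3: skip whitespace and end tags until the next fragment
            }
        }
        xsr.close();
        return written;
    }
}
```

For the sample above this would write 123.xml and 456.xml into an existing output directory, holding only one statement in memory at a time.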

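If you're new to Java, the people recommending SAX and StAX parsing are throwing you in at the deep end! This is quite low-level stuff: very efficient, but not designed for beginners. You said the file is large, and they have all assumed that means very large, but in my experience an unquantified "large" can mean anything from 1 MB to 20 GB, so designing a solution on the basis of that description is somewhat premature.

This is much easier to do in XSLT 2.0 than in Java. All it takes is a stylesheet like this: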

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
<xsl:template match="row">
  <xsl:result-document href="{Name}.xml">
    <xsl:copy-of select="."/>
  </xsl:result-document>
</xsl:template>
</xsl:stylesheet>

If it has to be part of a Java application, you can easily invoke the transformation from Java through an API.
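
One way to invoke a stylesheet from Java is the standard javax.xml.transform (JAXP) API, sketched below. XsltRunner is a hypothetical name. Note that the XSLT processor bundled with the JDK implements XSLT 1.0 only, so a stylesheet that uses xsl:result-document needs an XSLT 2.0 processor such as Saxon on the classpath; the calling code stays the same.

```java
import java.io.Reader;
import java.io.Writer;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: compiles a stylesheet and applies it to an input document.
class XsltRunner {

    static void run(Reader stylesheet, Reader input, Writer mainOutput) throws TransformerException {
        // newInstance() picks up whichever XSLT processor is configured or on the classpath.
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer = factory.newTransformer(new StreamSource(stylesheet));
        // The principal result goes to mainOutput; documents created with
        // xsl:result-document are written separately by the processor.
        transformer.transform(new StreamSource(input), new StreamResult(mainOutput));
    }
}
```

With a splitting stylesheet it is probably more convenient to pass a StreamResult built from a File rather than a Writer, so that the processor has a base location against which to resolve the href values.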

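This link may help.

Kamlesh is right... use JDom..

@KamleshArya SAXParser = SAX, and SAX and StAX are the best choices here, because you have a large file and don't want to load it into memory with a Java DOM parser. StAX is easier and more straightforward to use than SAX; it has the advantages of SAX while letting you work in a way closer to a DOM parser.

In general, a DOM parser is the simplest option but the least optimized in terms of memory consumption. You didn't mention whether the file fits in memory. If it doesn't, you can't use DOM. Once that is settled, you only need to decide between StAX and SAX. One is a little more complex than the other, and one has a slightly smaller footprint than the other. PandyAnool posted a DOM solution, LittleChild used SAX, and constantlearner posted a StAX solution, so pick one.

Doesn't this look too complicated for such a simple task? If someone can offer a simpler solution, this one is too complicated.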