How can I read a large XML file in Java and split it into small XML files based on a tag?
I'm new to Java programming, and I need a Java program that reads a large XML file containing <row>...</row> tags. Sample input (Input.xml):
<row>
<Name>Filename1</Name>
</row>
<row>
<Name>Filename2</Name>
</row>
<row>
<Name>Filename3</Name>
</row>
<row>
<Name>Filename4</Name>
</row>
<row>
<Name>Filename5</Name>
</row>
<row>
<Name>Filename6</Name>
</row>
.
.
The first output should be an .xml file named filename1.xml, the second filename2.xml, and so on.
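Can anyone tell me how to do this simply in Java? Some sample code would be very helpful.

The best way is to use a JAXB marshaller and unmarshaller to read and create the XML files.

Here is what I can suggest: use SAXParser and extend the DefaultHandler class. You can use a few booleans to keep track of which tag you are in. DefaultHandler tells you through its startElement method when you have entered a particular tag. The characters method then gives you the tag's text (possibly in several chunks), and finally the endElement method notifies you that the tag has ended. Once you are notified of the end of the tag, you can take the content you just saved and create a file from it. Looking at your example, you only need two booleans, boolean inRow and boolean inName, so this shouldn't be a difficult task. I've left out the code for your exact case; you'll have to write that yourself. It's fairly trivial, and a generic example of the pattern follows: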
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
public class ReadXMLFile {
public static void main(String argv[]) {
try {
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser saxParser = factory.newSAXParser();
DefaultHandler handler = new DefaultHandler() {
// Flags set in startElement() so that characters() knows which element the text belongs to
boolean bfname = false;
boolean blname = false;
boolean bnname = false;
boolean bsalary = false;
public void startElement(String uri, String localName,String qName,
Attributes attributes) throws SAXException {
System.out.println("Start Element :" + qName);
if (qName.equalsIgnoreCase("FIRSTNAME")) {
bfname = true;
}
if (qName.equalsIgnoreCase("LASTNAME")) {
blname = true;
}
if (qName.equalsIgnoreCase("NICKNAME")) {
bnname = true;
}
if (qName.equalsIgnoreCase("SALARY")) {
bsalary = true;
}
}
public void endElement(String uri, String localName,
String qName) throws SAXException {
System.out.println("End Element :" + qName);
}
public void characters(char ch[], int start, int length) throws SAXException {
if (bfname) {
System.out.println("First Name : " + new String(ch, start, length));
bfname = false;
}
if (blname) {
System.out.println("Last Name : " + new String(ch, start, length));
blname = false;
}
if (bnname) {
System.out.println("Nick Name : " + new String(ch, start, length));
bnname = false;
}
if (bsalary) {
System.out.println("Salary : " + new String(ch, start, length));
bsalary = false;
}
}
};
saxParser.parse("c:\\file.xml", handler);
} catch (Exception e) {
e.printStackTrace();
}
}
}
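The answer above leaves the row/Name version as an exercise. Below is a minimal sketch of it, assuming the rows are wrapped in a single root element (a file with several top-level <row> elements is not well-formed XML, so no parser will accept it) and that each <row> contains only a <Name> child. The row is rebuilt with an XMLStreamWriter so special characters get escaped. The class name SaxRowSplitter and the output-directory argument are hypothetical.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: writes every <row> to <outputDir>/<Name>.xml
public class SaxRowSplitter extends DefaultHandler {
    private final File outputDir;
    private boolean inRow = false;   // between <row> and </row>
    private boolean inName = false;  // between <Name> and </Name>
    // characters() may deliver one text node in several chunks, so accumulate it
    private final StringBuilder name = new StringBuilder();

    public SaxRowSplitter(File outputDir) {
        this.outputDir = outputDir;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if (qName.equals("row")) {
            inRow = true;
            name.setLength(0);
        } else if (inRow && qName.equals("Name")) {
            inName = true;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inName) {
            name.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (qName.equals("Name")) {
            inName = false;
        } else if (qName.equals("row")) {
            inRow = false;
            try {
                writeRow(name.toString().trim());
            } catch (Exception e) {
                throw new SAXException(e);
            }
        }
    }

    // Rebuilds <row><Name>...</Name></row>; assumes Name is the row's only child
    private void writeRow(String rowName) throws Exception {
        File out = new File(outputDir, rowName + ".xml");
        try (OutputStream os = new FileOutputStream(out)) {
            XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(os, "UTF-8");
            w.writeStartDocument("UTF-8", "1.0");
            w.writeStartElement("row");
            w.writeStartElement("Name");
            w.writeCharacters(rowName); // escapes &, < and >
            w.writeEndElement();
            w.writeEndElement();
            w.writeEndDocument();
            w.flush();
            w.close(); // frees the writer; the stream is closed by try-with-resources
        }
    }

    public static void main(String[] args) throws Exception {
        // args[0]: input file, args[1]: existing output directory
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new File(args[0]), new SaxRowSplitter(new File(args[1])));
    }
}
```

Only the current row's name is held in memory, so the input size does not matter.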
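Since you said your XML is large, you can do the following with StAX. Code for your use case: the code below uses the StAX API to break up the document outlined in your question: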
import java.io.*;
import java.util.*;
import javax.xml.namespace.QName;
import javax.xml.stream.*;
import javax.xml.stream.events.*;
public class Demo {
public static void main(String[] args) throws Exception {
Demo demo = new Demo();
demo.split("src/forum7408938/input.xml", "nickname");
//demo.split("src/forum7408938/input.xml", null);
}
private void split(String xmlResource, String condition) throws Exception {
XMLEventFactory xef = XMLEventFactory.newFactory();
XMLInputFactory xif = XMLInputFactory.newInstance();
XMLEventReader xer = xif.createXMLEventReader(new FileReader(xmlResource));
StartElement rootStartElement = xer.nextTag().asStartElement(); // Advance to the root element
StartDocument startDocument = xef.createStartDocument();
EndDocument endDocument = xef.createEndDocument();
XMLOutputFactory xof = XMLOutputFactory.newFactory();
while(xer.hasNext() && !xer.peek().isEndDocument()) {
boolean metCondition;
XMLEvent xmlEvent = xer.nextTag();
if(!xmlEvent.isStartElement()) {
break;
}
// Be able to split XML file into n parts with x split elements(from
// the dummy XML example staff is the split element).
StartElement breakStartElement = xmlEvent.asStartElement();
List<XMLEvent> cachedXMLEvents = new ArrayList<XMLEvent>();
// BOUNTY CRITERIA
// I'd like to be able to specify condition that must be in the
// split element i.e. I want only staff which have nickname, I want
// to discard those without nicknames. But be able to also split
// without conditions while running split without conditions.
if(null == condition) {
cachedXMLEvents.add(breakStartElement);
metCondition = true;
} else {
cachedXMLEvents.add(breakStartElement);
xmlEvent = xer.nextEvent();
metCondition = false;
while(!(xmlEvent.isEndElement() && xmlEvent.asEndElement().getName().equals(breakStartElement.getName()))) {
cachedXMLEvents.add(xmlEvent);
if(xmlEvent.isStartElement() && xmlEvent.asStartElement().getName().getLocalPart().equals(condition)) {
metCondition = true;
break;
}
xmlEvent = xer.nextEvent();
}
}
if(metCondition) {
// Create a file for the fragment; the name is derived from the value of the id attribute.
// Note: the question's <row> elements have no id attribute, so this lookup would return null there.
FileWriter fileWriter = new FileWriter("src/forum7408938/" + breakStartElement.getAttributeByName(new QName("id")).getValue() + ".xml");
// A StAX XMLEventWriter will be used to write the XML fragment
XMLEventWriter xew = xof.createXMLEventWriter(fileWriter);
xew.add(startDocument);
// BOUNTY CRITERIA
// The content of the split files should be wrapped in the
// root element from the original file (like in the dummy example
// company)
xew.add(rootStartElement);
// Write the XMLEvents that were cached while we were
// checking the fragment to see if it matched our criteria.
for(XMLEvent cachedEvent : cachedXMLEvents) {
xew.add(cachedEvent);
}
// Write the XMLEvents that we still need to parse from this
// fragment
xmlEvent = xer.nextEvent();
while(xer.hasNext() && !(xmlEvent.isEndElement() && xmlEvent.asEndElement().getName().equals(breakStartElement.getName()))) {
xew.add(xmlEvent);
xmlEvent = xer.nextEvent();
}
xew.add(xmlEvent);
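// Close everything we opened
xew.add(xef.createEndElement(rootStartElement.getName(), null));
xew.add(endDocument);
xew.flush(); // the event writer buffers output; flush it before closing the file
xew.close(); // does not close the underlying FileWriter
fileWriter.close();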
}
}
}
}
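The StAX code above was written for a different document: it names each file after an id attribute and can filter on a nickname element, while the question's <row> elements have no id attribute. Below is a minimal sketch adapted to the question, assuming a root element wraps the rows. It buffers the events of one <row> at a time (only one small row is held in memory), remembers the text of its <Name> child, and writes the buffered events to <Name>.xml. StaxRowSplitter and its parameters are hypothetical names.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper: streams the input and writes each <row> to <Name>.xml
public class StaxRowSplitter {
    public static void split(File input, File outputDir) throws Exception {
        XMLInputFactory xif = XMLInputFactory.newInstance();
        XMLOutputFactory xof = XMLOutputFactory.newInstance();
        XMLEventFactory xef = XMLEventFactory.newInstance();
        try (InputStream in = new FileInputStream(input)) {
            XMLEventReader reader = xif.createXMLEventReader(in);
            List<XMLEvent> row = null;   // events of the current row; null outside a row
            StringBuilder name = new StringBuilder();
            boolean inName = false;
            int depth = 0;               // element nesting inside the current row
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (row == null) {
                    boolean rowStart = event.isStartElement()
                            && event.asStartElement().getName().getLocalPart().equals("row");
                    if (!rowStart) {
                        continue;        // skip everything outside <row> elements
                    }
                    row = new ArrayList<>();
                    name.setLength(0);
                }
                row.add(event);
                if (event.isStartElement()) {
                    depth++;
                    inName = event.asStartElement().getName().getLocalPart().equals("Name");
                } else if (event.isCharacters() && inName) {
                    name.append(event.asCharacters().getData());
                } else if (event.isEndElement()) {
                    depth--;
                    inName = false;
                    if (depth == 0) {    // the matching </row> has been reached
                        write(row, new File(outputDir, name.toString().trim() + ".xml"), xof, xef);
                        row = null;
                    }
                }
            }
            reader.close();
        }
    }

    private static void write(List<XMLEvent> events, File file,
                              XMLOutputFactory xof, XMLEventFactory xef) throws Exception {
        try (OutputStream out = new FileOutputStream(file)) {
            XMLEventWriter writer = xof.createXMLEventWriter(out, "UTF-8");
            writer.add(xef.createStartDocument("UTF-8", "1.0"));
            for (XMLEvent e : events) {
                writer.add(e);
            }
            writer.add(xef.createEndDocument());
            writer.flush();              // push buffered output to the stream
            writer.close();              // does not close the stream itself
        }
    }

    public static void main(String[] args) throws Exception {
        split(new File(args[0]), new File(args[1]));
    }
}
```

Unlike the SAX sketch, this copies every child of <row> unchanged, so rows with more elements than just <Name> are preserved.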
Try this:
import java.io.*;
import javax.xml.parsers.*;
import org.w3c.dom.*;
import org.xml.sax.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
public class Test{
static public void main(String[] arg) throws Exception{
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse("foo.xml");
TransformerFactory tranFactory = TransformerFactory.newInstance();
Transformer aTransformer = tranFactory.newTransformer();
NodeList list = doc.getDocumentElement().getChildNodes(); // children of the root element (getFirstChild() could return a comment or DOCTYPE)
for (int i=0; i<list.getLength(); i++){
Node element = list.item(i).cloneNode(true);
if(element.hasChildNodes()){
Source src = new DOMSource(element);
FileOutputStream fs=new FileOutputStream("k" + i + ".xml");
Result dest = new StreamResult(fs);
aTransformer.transform(src, dest);
fs.close();
}
}
}
}
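The DOM answer above names its output files k<i>.xml after the child-node index, while the question asks for files named after each row's <Name>. Below is a minimal DOM sketch of that naming, assuming the whole file fits in memory and that the <row> elements sit under a single root. The class DomRowSplitter and the row<i> fallback name for rows without a <Name> are hypothetical choices.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: loads the whole file, then writes each <row> to <Name>.xml
public class DomRowSplitter {
    public static void split(File input, File outputDir) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(input);
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        NodeList rows = doc.getElementsByTagName("row"); // element nodes only, so the cast is safe
        for (int i = 0; i < rows.getLength(); i++) {
            Element row = (Element) rows.item(i);
            NodeList names = row.getElementsByTagName("Name");
            // Hypothetical fallback for a row without <Name>: use its position
            String fileName = names.getLength() > 0
                    ? names.item(0).getTextContent().trim()
                    : "row" + i;
            try (OutputStream out = new FileOutputStream(new File(outputDir, fileName + ".xml"))) {
                transformer.transform(new DOMSource(row), new StreamResult(out));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        split(new File(args[0]), new File(args[1]));
    }
}
```

Using getElementsByTagName("row") instead of walking the root's child nodes skips the whitespace text nodes, so no hasChildNodes() check is needed.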
Assuming your file wraps these rows in a root element:
<root>
<row><Name>Filename1</Name></row>
<row><Name>Filename2</Name></row>
<row><Name>Filename3</Name></row>
<row><Name>Filename4</Name></row>
<row><Name>Filename5</Name></row>
<row><Name>Filename6</Name></row>
</root>
This code will do it:
package com.example;
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileReader;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;
public class Main {
public static String readXmlFromFile(String fileName) throws Exception {
StringBuilder stringBuilder = new StringBuilder();
String lineSeparator = System.getProperty("line.separator");
// try-with-resources closes the reader even if reading fails
try (BufferedReader reader = new BufferedReader(new FileReader(fileName))) {
String line;
while ((line = reader.readLine()) != null) {
stringBuilder.append(line);
stringBuilder.append(lineSeparator);
}
}
return stringBuilder.toString();
}
public static List<String> divideXmlByTag(String xml, String tag) throws Exception {
List<String> list = new ArrayList<String>();
Document document = loadXmlDocument(xml);
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
NodeList rowList = document.getElementsByTagName(tag);
for(int i=0; i<rowList.getLength(); i++) {
Node rowNode = rowList.item(i);
if (rowNode.getNodeType() == Node.ELEMENT_NODE) {
DOMSource source = new DOMSource(rowNode);
ByteArrayOutputStream baos = new ByteArrayOutputStream();
StreamResult streamResult = new StreamResult(baos);
transformer.transform(source, streamResult);
list.add(baos.toString());
}
}
return list;
}
private static Document loadXmlDocument(String xml) throws SAXException, IOException, ParserConfigurationException {
return loadXmlDocument(new ByteArrayInputStream(xml.getBytes()));
}
private static Document loadXmlDocument(InputStream inputStream) throws SAXException, IOException, ParserConfigurationException {
DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
documentBuilderFactory.setNamespaceAware(true);
DocumentBuilder documentBuilder = null;
documentBuilder = documentBuilderFactory.newDocumentBuilder();
Document document = documentBuilder.parse(inputStream);
inputStream.close();
return document;
}
public static void main(String[] args) throws Exception {
String xmlString = readXmlFromFile("d:/test.xml");
System.out.println("original xml:\n" + xmlString + "\n");
System.out.println("divided xml:\n");
List<String> dividedXmls = divideXmlByTag(xmlString, "row");
for (String xmlPart : dividedXmls) {
System.out.println(xmlPart + "\n");
}
}
}
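Then you just need to write each of these XML parts to a separate file.

Since the asker requested it, here is another solution that takes a different approach: use a StAX parser in this case. It avoids reading the whole document into memory at once.

1. Advance the XMLStreamReader to the local root element of the sub-fragment.
2. Use the javax.xml.transform APIs to produce a new document from this XML fragment. This advances the XMLStreamReader to the end of that fragment.
3. Repeat step 1 for the next fragment.

Code example

For the following XML, output each statement section to a file named after the value of its account attribute: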
<statements>
<statement account="123">
...stuff...
</statement>
<statement account="456">
...stuff...
</statement>
</statements>
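The three steps described above can be sketched with the JDK's built-in StAX and JAXP classes. This is a minimal sketch, assuming the file looks like the sample and that the <statement> elements are separated by whitespace: as far as I can tell, the JDK's identity transformer leaves the reader one event past the fragment's closing tag, so fragments written back to back with no whitespace between them would be mishandled. StatementSplitter and outputDir are hypothetical names.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamResult;

// Hypothetical helper: writes each <statement> to <outputDir>/<account>.xml
public class StatementSplitter {
    public static void split(File input, File outputDir) throws Exception {
        XMLInputFactory xif = XMLInputFactory.newInstance();
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        try (InputStream in = new FileInputStream(input)) {
            XMLStreamReader xsr = xif.createXMLStreamReader(in);
            xsr.nextTag(); // advance to the <statements> root element
            // Step 1: nextTag() skips whitespace; it returns END_ELEMENT at </statements>
            while (xsr.nextTag() == XMLStreamConstants.START_ELEMENT) {
                File out = new File(outputDir, xsr.getAttributeValue(null, "account") + ".xml");
                try (OutputStream os = new FileOutputStream(out)) {
                    // Step 2: copies only the current <statement> subtree
                    transformer.transform(new StAXSource(xsr), new StreamResult(os));
                }
                // Step 3: the loop repeats for the next fragment
            }
            xsr.close();
        }
    }

    public static void main(String[] args) throws Exception {
        split(new File(args[0]), new File(args[1]));
    }
}
```

Only one statement is ever being copied at a time, which is what keeps memory use flat for large inputs.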
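If you're new to Java, the people recommending SAX and StAX parsing are throwing you in at the deep end! This is fairly low-level stuff: very efficient, but not designed for beginners. You said the file is large, and they all assumed that means very large, but in my experience an unquantified "large file" can mean anything from 1 MB to 20 GB, so designing a solution based on that description is somewhat premature.

This is much easier to do with XSLT 2.0 than with Java (XSLT 2.0 needs a processor such as Saxon; the XSLT engine bundled with the JDK only supports XSLT 1.0). All it takes is a stylesheet like this: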
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
<xsl:template match="row">
<xsl:result-document href="{Name}.xml">
<xsl:copy-of select="."/>
</xsl:result-document>
</xsl:template>
</xsl:stylesheet>
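If it has to run inside a Java application, you can easily invoke the transformation from Java using the API. That link may help.

Kamlesh is right... use JDOM.

@KamleshArya SAXParser.

SAX and StAX are the best options because you have a large file and don't want to load it into memory with a Java DOM parser. StAX is easier to use and more direct than SAX: it keeps the advantages of SAX and lets you work somewhat as you would with a DOM parser. In general, a DOM parser is the simplest, but the least optimized in terms of memory consumption.

You didn't mention whether the file fits in memory. If it doesn't, you can't use DOM. Once that is decided, you only need to choose between StAX and SAX: one is a little more complex than the other, and one has a slightly smaller footprint than the other.

PandyAnool posted a DOM solution, LittleChild used SAX, and constantlearner posted a StAX solution; pick one.

Doesn't this look too complicated for such a simple task? It would be nice if someone could give a simpler solution.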