Java: How to improve the performance of splitting an XML file

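I've seen quite a lot of posts, blogs and articles about splitting an XML file into smaller chunks, and decided to write my own because I have some custom requirements. Here is what I mean; consider the following XML: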

<?xml version="1.0" encoding="UTF-8" standalone="no" ?> 
<company>
 <staff id="1">
    <firstname>yong</firstname>
    <lastname>mook kim</lastname>
    <nickname>mkyong</nickname>
    <salary>100000</salary>
   </staff>
 <staff id="2">
    <firstname>yong</firstname>
    <lastname>mook kim</lastname>
    <nickname>mkyong</nickname>
    <salary>100000</salary>
   </staff>
 <staff id="3">
    <firstname>yong</firstname>
    <lastname>mook kim</lastname>
    <nickname>mkyong</nickname>
    <salary>100000</salary>
   </staff>
 <staff id="4">
    <firstname>yong</firstname>
    <lastname>mook kim</lastname>
    <nickname>mkyong</nickname>
    <salary>100000</salary>
   </staff>
 <staff id="5">
    <firstname>yong</firstname>
    <lastname>mook kim</lastname>
    <salary>100000</salary>
   </staff>
</company>
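I know that opening and closing the output stream for every staff element written hurts performance. But what if I write only once per file (which may contain n staff elements)? Naturally, the root element and the split element are configurable.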

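Any ideas on how to improve the performance or the logic? I'd prefer some code, but good advice can sometimes be better.

Edit:

This XML example is actually a dummy example. The real XML I'm trying to split has about 300-500 different elements under the split element, all appearing in random order and in varying numbers. StAX may not be the best solution after all.

Bounty update:

I'm looking for a solution (code) that will:

  • Be able to split an XML file into n parts with x split elements each (in the dummy XML example, staff is the split element).

  • Wrap the content of the split files in the root element of the original file (company in the dummy example).

  • Let me specify a condition that must hold inside the split element, i.e. I want only staff that have a nickname and want to discard those without one. It should also be possible to split without any condition.

  • The code doesn't necessarily have to improve on my solution (which lacks good logic and performance, but it works).

And I'm not happy with "but it works". I couldn't find enough StAX examples for this kind of operation, and the user community isn't great either. It doesn't have to be a StAX solution.

I'm probably asking too much, but I'm here to learn, and I'm offering a good bounty for the solution I think is best.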

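First piece of advice: don't try to write your own XML handling code. Use an XML parser: it will be much more reliable and quite possibly faster.


If you use an XML pull parser, you should be able to read the elements one at a time and write them to disk, rather than reading the whole document in one go.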
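
To illustrate the pull-parsing idea, here is a minimal sketch using the StAX cursor API (`XMLStreamReader`) that ships with the JDK. It visits one event at a time and only records the `id` attribute of each split element, so the document is never held in memory as a whole. The class name `PullParserSketch`, the method `collectIds`, and the inline sample XML are assumptions made for this illustration; they are not part of the question's code.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class PullParserSketch {

    // Hypothetical helper: returns the id attribute of every element named
    // splitElement, pulling one parse event at a time from the reader.
    public static List<String> collectIds(Reader xml, String splitElement)
            throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(xml);
        List<String> ids = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                // next() advances the cursor; only the current event is available
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals(splitElement)) {
                    // null namespace means "do not check the namespace";
                    // the value is null if the element has no id attribute
                    ids.add(reader.getAttributeValue(null, "id"));
                }
            }
        } finally {
            reader.close();
        }
        return ids;
    }

    public static void main(String[] args) throws XMLStreamException {
        // Assumed sample input, shaped like the question's dummy XML
        String xml = "<company>"
                + "<staff id=\"1\"><nickname>mkyong</nickname></staff>"
                + "<staff id=\"2\"><salary>100000</salary></staff>"
                + "</company>";
        System.out.println(collectIds(new StringReader(xml), "staff"));
    }
}
```

In a real splitter the body of the `if` would copy the element to an output writer instead of collecting an attribute, but the reading loop keeps the same shape.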

My suggestion is as follows. It requires a streaming XSLT 3.0 processor, which in practice means it needs Saxon-EE 9.3.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0">

<xsl:mode streamable="yes"/>

<xsl:template match="/">
  <xsl:apply-templates select="company/staff"/>
</xsl:template>

<xsl:template match="staff">
  <xsl:variable name="v" as="element(staff)">
    <xsl:copy-of select="."/>
  </xsl:variable>
  <xsl:if test="$v/nickname">
    <xsl:result-document href="{@id}.xml">
      <xsl:copy-of select="$v"/>
    </xsl:result-document>
  </xsl:if>
</xsl:template>

</xsl:stylesheet>


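You could do the following with StAX:

Algorithm

  1. Read and hold onto the root element event.
  2. Read the first chunk of XML:
       • Queue events until the condition has been met.
       • If the condition has been met:
           • Write the start document event.
           • Write out the root start element event.
           • Write out the split start element event.
           • Write out the queued events.
           • Write out the remaining events for this section.
       • If the condition was not met, do nothing.
  3. Repeat step 2 with the next chunk of XML.

Code for Your Use Case

    The code below uses the StAX API to break up the document as outlined in your question: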

    package forum7408938;
    
    import java.io.*;
    import java.util.*;
    
    import javax.xml.namespace.QName;
    import javax.xml.stream.*;
    import javax.xml.stream.events.*;
    
    public class Demo {
    
        public static void main(String[] args) throws Exception  {
            Demo demo = new Demo();
            demo.split("src/forum7408938/input.xml", "nickname");
            //demo.split("src/forum7408938/input.xml", null);
        }
    
        private void split(String xmlResource, String condition) throws Exception {
            XMLEventFactory xef = XMLEventFactory.newFactory();
            XMLInputFactory xif = XMLInputFactory.newInstance();
            // Read bytes rather than characters so the parser can honor the encoding declared in the XML
            XMLEventReader xer = xif.createXMLEventReader(new FileInputStream(xmlResource));
            StartElement rootStartElement = xer.nextTag().asStartElement(); // Advance to the root element
            StartDocument startDocument = xef.createStartDocument();
            EndDocument endDocument = xef.createEndDocument();
    
            XMLOutputFactory xof = XMLOutputFactory.newFactory();
            while(xer.hasNext() && !xer.peek().isEndDocument()) {
                boolean metCondition;
                XMLEvent xmlEvent = xer.nextTag();
                if(!xmlEvent.isStartElement()) {
                    break;
                }
                // BOUNTY CRITERIA
                // Be able to split XML file into n parts with x split elements(from
                // the dummy XML example staff is the split element).
                StartElement breakStartElement = xmlEvent.asStartElement();
                List<XMLEvent> cachedXMLEvents = new ArrayList<XMLEvent>();
    
                // BOUNTY CRITERIA
                // I'd like to be able to specify condition that must be in the 
                // split element i.e. I want only staff which have nickname, I want 
                // to discard those without nicknames. But be able to also split 
                // without conditions while running split without conditions.
                if(null == condition) {
                    cachedXMLEvents.add(breakStartElement);
                    metCondition = true;
                } else {
                    cachedXMLEvents.add(breakStartElement);
                    xmlEvent = xer.nextEvent();
                    metCondition = false;
                    while(!(xmlEvent.isEndElement() && xmlEvent.asEndElement().getName().equals(breakStartElement.getName()))) {
                        cachedXMLEvents.add(xmlEvent);
                        if(xmlEvent.isStartElement() && xmlEvent.asStartElement().getName().getLocalPart().equals(condition)) {
                            metCondition = true;
                            break;
                        }
                        xmlEvent = xer.nextEvent();
                    }
                }
    
                if(metCondition) {
                    // Create a file for the fragment, the name is derived from the value of the id attribute
                    FileWriter fileWriter = new FileWriter("src/forum7408938/" + breakStartElement.getAttributeByName(new QName("id")).getValue() + ".xml");
    
                    // A StAX XMLEventWriter will be used to write the XML fragment
                    XMLEventWriter xew = xof.createXMLEventWriter(fileWriter);
                    xew.add(startDocument);
    
                    // BOUNTY CRITERIA
                    // The content of the split files should be wrapped in the 
                    // root element from the original file(like in the dummy example
                    // company)
                    xew.add(rootStartElement);
    
                    // Write the XMLEvents that were cached while we were
                    // checking the fragment to see if it matched our criteria.
                    for(XMLEvent cachedEvent : cachedXMLEvents) {
                        xew.add(cachedEvent);
                    }
    
                    // Write the XMLEvents that we still need to parse from this
                    // fragment
                    xmlEvent = xer.nextEvent();
                    while(xer.hasNext() && !(xmlEvent.isEndElement() && xmlEvent.asEndElement().getName().equals(breakStartElement.getName()))) {
                        xew.add(xmlEvent);
                        xmlEvent = xer.nextEvent();
                    }
                    xew.add(xmlEvent);
    
                    // Close everything we opened
                    xew.add(xef.createEndElement(rootStartElement.getName(), null));
                    xew.add(endDocument);
                    xew.close(); // flush any events still buffered by the event writer
                    fileWriter.close();
                }
            }
        }
    
    }
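
The Demo above writes one file per matching split element. If several split elements should go into each output part, as the first bounty requirement asks, the same event-based approach can be batched. The following is only a rough sketch under stated assumptions: `BatchSplitSketch`, `split`, `finish` and the `perPart` parameter are hypothetical names, and the parts are returned as strings instead of being written to files so that the example stays self-contained. It buffers a whole split element before deciding whether to keep it, and tracks nesting depth so that a nested element with the same name does not end the buffer early.

```java
import java.io.Reader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;

import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

public class BatchSplitSketch {

    // Splits the input into parts holding up to perPart split elements each.
    // A null condition keeps every split element; otherwise an element is kept
    // only if it contains a descendant element named condition.
    public static List<String> split(Reader xml, String splitElement,
            String condition, int perPart) throws XMLStreamException {
        XMLEventFactory events = XMLEventFactory.newInstance();
        XMLOutputFactory outputs = XMLOutputFactory.newInstance();
        XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(xml);

        List<String> parts = new ArrayList<>();
        StartElement root = null;
        StringWriter out = null;
        XMLEventWriter writer = null;
        int inPart = 0;

        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (!event.isStartElement()) {
                continue; // skip whitespace, comments and end tags outside split elements
            }
            StartElement start = event.asStartElement();
            if (root == null) {
                root = start; // hold onto the root start element for wrapping
                continue;
            }
            if (!start.getName().getLocalPart().equals(splitElement)) {
                continue;
            }

            // Buffer the complete split element, tracking depth for nested elements
            List<XMLEvent> buffer = new ArrayList<>();
            buffer.add(start);
            boolean keep = (condition == null);
            int depth = 1;
            while (depth > 0) {
                XMLEvent inner = reader.nextEvent();
                if (inner.isStartElement()) {
                    depth++;
                    if (inner.asStartElement().getName().getLocalPart().equals(condition)) {
                        keep = true;
                    }
                } else if (inner.isEndElement()) {
                    depth--;
                }
                buffer.add(inner);
            }
            if (!keep) {
                continue; // condition not met: discard the buffered element
            }

            if (writer == null) { // open a new part lazily, wrapped in the root element
                out = new StringWriter();
                writer = outputs.createXMLEventWriter(out);
                writer.add(events.createStartDocument());
                writer.add(root);
            }
            for (XMLEvent buffered : buffer) {
                writer.add(buffered);
            }
            if (++inPart == perPart) {
                parts.add(finish(writer, out, root, events));
                writer = null;
                inPart = 0;
            }
        }
        if (writer != null) { // last, possibly smaller, part
            parts.add(finish(writer, out, root, events));
        }
        reader.close();
        return parts;
    }

    // Closes the root element and the document, then returns the part's text.
    private static String finish(XMLEventWriter writer, StringWriter out,
            StartElement root, XMLEventFactory events) throws XMLStreamException {
        writer.add(events.createEndElement(root.getName(), null));
        writer.add(events.createEndDocument());
        writer.close();
        return out.toString();
    }
}
```

Calling `BatchSplitSketch.split(new FileReader("input.xml"), "staff", "nickname", 2)` would keep only staff with a nickname and group them two per part. Replacing the `StringWriter` with one output stream per part means each file is opened and closed once, which addresses the concern about per-element stream handling raised in the question.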
    
    java    XMLSplitter xmlFileLocation  splitElement filter filterElement
    
    java    XMLSplitter input.xml  staff  true nickname
    
    java    XMLSplitter input.xml  staff 
    
    import java.io.File;
    import java.io.FileWriter;
    import java.io.IOException;
    import java.io.StringReader;
    import java.io.StringWriter;
    
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.parsers.ParserConfigurationException;
    import javax.xml.transform.OutputKeys;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerConfigurationException;
    import javax.xml.transform.TransformerException;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.dom.DOMSource;
    import javax.xml.transform.stream.StreamResult;
    import javax.xml.xpath.XPath;
    import javax.xml.xpath.XPathConstants;
    import javax.xml.xpath.XPathExpression;
    import javax.xml.xpath.XPathExpressionException;
    import javax.xml.xpath.XPathFactory;
    
    import org.w3c.dom.DOMException;
    import org.w3c.dom.DOMImplementation;
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;
    import org.w3c.dom.Node;
    import org.w3c.dom.NodeList;
    import org.xml.sax.InputSource;
    import org.xml.sax.SAXException;
    
    public class XMLSplitter {
    
        DocumentBuilder builder = null;
        XPath xpath = null; 
        Transformer transformer = null;
        String filterElement;
        String splitElement;
        String xmlFileLocation;
        boolean filter = true;
    
    
        public static void main(String[] arg) throws Exception{
    
            XMLSplitter xMLSplitter = null;
            if(arg.length < 4){
    
                if(arg.length < 2){
                    System.out.println("Insufficient arguments !!!");
                    System.out.println("Usage: XMLSplitter xmlFileLocation  splitElement filter filterElement ");
                    return;
                }else{
                    System.out.println("Filter is off...");
                    xMLSplitter = new XMLSplitter();
                    xMLSplitter.init(arg[0],arg[1],false,null);
                }
    
            }else{
                xMLSplitter = new XMLSplitter();
                xMLSplitter.init(arg[0],arg[1],Boolean.parseBoolean(arg[2]),arg[3]);
            }
    
    
    
            xMLSplitter.start();    
    
        }
    
        public void init(String xmlFileLocation, String splitElement, boolean filter, String filterElement ) 
                    throws ParserConfigurationException, TransformerConfigurationException{
    
            //Initialize the Document builder
            System.out.println("Initializing..");
            DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
            domFactory.setNamespaceAware(true);   
            builder = domFactory.newDocumentBuilder();
    
            //Initialize the transformer
            TransformerFactory transformerFactory = TransformerFactory.newInstance();
            transformer = transformerFactory.newTransformer();
            transformer.setOutputProperty(OutputKeys.METHOD, "xml");
            transformer.setOutputProperty(OutputKeys.ENCODING,"UTF-8");
            transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
    
            //Initialize the xpath
            XPathFactory factory = XPathFactory.newInstance();
            xpath = factory.newXPath();
    
            this.filterElement = filterElement;
            this.splitElement = splitElement;
            this.xmlFileLocation = xmlFileLocation;
            this.filter = filter;
    
    
        }   
    
    
        public void start() throws Exception{
    
                //Parse the file
                System.out.println("Parsing file.");
                Document doc = builder.parse(xmlFileLocation);
    
                //Get the root node name
                System.out.println("Getting root element.");
                XPathExpression rootElementexpr = xpath.compile("/");
                Object rootExprResult = rootElementexpr.evaluate(doc, XPathConstants.NODESET);
                NodeList rootNode = (NodeList) rootExprResult;          
                String rootNodeName = rootNode.item(0).getFirstChild().getNodeName();
    
                //Get the list of split elements
                XPathExpression expr = xpath.compile("//"+splitElement);
                Object result = expr.evaluate(doc, XPathConstants.NODESET);
                NodeList nodes = (NodeList) result;
                System.out.println("Total number of split nodes "+nodes.getLength());
                for (int i = 0; i < nodes.getLength(); i++) {
                    //Wrap each node inside root of the parent xml doc
                    Node singleNode = wrappInRootElement(rootNodeName,nodes.item(i));
                    //Get the XML string of the fragment
                    String xmlFragment = serializeDocument(singleNode);
                    //System.out.println(xmlFragment);
                    //Write the xml fragment in file.
                    storeInFile(xmlFragment,i);         
                }
    
        }
    
        private  Node wrappInRootElement(String rootNodeName, Node fragmentDoc) 
                    throws XPathExpressionException, ParserConfigurationException, DOMException, 
                            SAXException, IOException, TransformerException{
    
            //Create empty doc with just root node
            DOMImplementation domImplementation = builder.getDOMImplementation();
            Document doc = domImplementation.createDocument(null,null,null);
            Element theDoc = doc.createElement(rootNodeName);
            doc.appendChild(theDoc);
    
            //Insert the fragment inside the root node 
            InputSource inStream = new InputSource();     
            String xmlString = serializeDocument(fragmentDoc);
            inStream.setCharacterStream(new StringReader(xmlString));       
            Document fr = builder.parse(inStream);
            theDoc.appendChild(doc.importNode(fr.getFirstChild(),true));
            return doc;
        }
    
        private String serializeDocument(Node doc) throws TransformerException, XPathExpressionException{
    
            if(!serializeThisNode(doc)){
                return null;
            }
    
            DOMSource domSource = new DOMSource(doc);                
            StringWriter stringWriter = new StringWriter();
            StreamResult streamResult = new StreamResult(stringWriter);
            transformer.transform(domSource, streamResult);
            String xml = stringWriter.toString();
            return xml;
    
        }
    
        //Check whether node is to be stored in file or rejected based on input
        private boolean serializeThisNode(Node doc) throws XPathExpressionException{
    
             if(!filter){
                 return true;
             }
    
             XPathExpression filterElementexpr = xpath.compile("//"+filterElement);
             Object result = filterElementexpr.evaluate(doc, XPathConstants.NODESET);
             NodeList nodes = (NodeList) result;
    
             if(nodes.item(0) != null){
                 return true;
             }else{
                 return false;
             }       
        }
    
        private void storeInFile(String content, int fileIndex) throws IOException{
    
            if(content == null || content.length() == 0){
                return;
            }
    
            String fileName = splitElement+fileIndex+".xml";
    
            File file = new File(fileName);
            if(file.exists()){
                System.out.println(" The file "+fileName+" already exists !! cannot create the file with the same name ");
                return;
            }
            FileWriter fileWriter = new FileWriter(file);
            fileWriter.write(content);
            fileWriter.close();
            System.out.println("Generated file "+fileName);
    
    
        }
    
    }
    
    <?xml version="1.0" encoding="UTF-8" standalone="no" ?>
    <company>
        <staff id="1">
            <firstname>yong</firstname>
            <lastname>mook kim</lastname>
            <nickname>mkyong</nickname>
            <salary>100000</salary>
            <other>
                <staff>
                ...
                </staff>
            </other>
        </staff>
    </company>
    
    import java.io.File;
    import java.io.FileNotFoundException;
    import java.io.FileReader;
    import java.io.IOException;
    
    import org.apache.commons.io.FileUtils;
    import org.xmlpull.v1.XmlPullParser;
    import org.xmlpull.v1.XmlPullParserException;
    import org.xmlpull.v1.XmlPullParserFactory;
    
    public class XppSample {
    
    private String rootTag;
    private String splitTag;
    private String requiredTag;
    private int flushThreshold;
    private String fileName;
    
    private String rootTagEnd;
    
    private boolean hasRequiredTag = false;
    private int flushCount = 0;
    private int fileNo = 0;
    private String header;
    private XmlPullParser xpp;
    private StringBuilder nodeBuf = new StringBuilder();
    private StringBuilder fileBuf = new StringBuilder();
    
    
    public XppSample(String fileName, String rootTag, String splitTag, String requiredTag, int flushThreshold) throws XmlPullParserException, FileNotFoundException {
    
        this.rootTag = rootTag;
        rootTagEnd = "</" + rootTag + ">";
        this.splitTag = splitTag;
        this.requiredTag = requiredTag;
        this.flushThreshold = flushThreshold;
        this.fileName = fileName; 
    
        XmlPullParserFactory factory = XmlPullParserFactory.newInstance(System.getProperty(XmlPullParserFactory.PROPERTY_NAME), null);
        factory.setNamespaceAware(true);
        xpp = factory.newPullParser();
        xpp.setInput(new FileReader(fileName));
    }
    
    
    public void processDocument() throws XmlPullParserException, IOException {
        int eventType = xpp.getEventType();
        do {
            if(eventType == XmlPullParser.START_TAG) {
                processStartElement(xpp);
            } else if(eventType == XmlPullParser.END_TAG) {
                processEndElement(xpp);
            } else if(eventType == XmlPullParser.TEXT) {
                processText(xpp);
            }
            eventType = xpp.next();
        } while (eventType != XmlPullParser.END_DOCUMENT);
    
        saveFile();
    }
    
    
    public void processStartElement(XmlPullParser xpp) {
    
        int holderForStartAndLength[] = new int[2];
        String name = xpp.getName();
        char ch[] = xpp.getTextCharacters(holderForStartAndLength);
        int start = holderForStartAndLength[0];
        int length = holderForStartAndLength[1];
    
        if(name.equals(rootTag)) {
            int pos = start + length;
            header = new String(ch, 0, pos);
        } else {
            if(requiredTag==null || name.equals(requiredTag)) {
                hasRequiredTag = true;
            }
            nodeBuf.append(xpp.getText());
        }
    }
    
    
    public void flushBuffer() throws IOException {
        if(hasRequiredTag) {
            fileBuf.append(nodeBuf);
            if(((++flushCount)%flushThreshold)==0) {
                saveFile();
            }           
        }
        nodeBuf = new StringBuilder();
        hasRequiredTag = false;
    }
    
    
    public void saveFile() throws IOException {
        if(fileBuf.length()>0) {
            String splitFile = header + fileBuf.toString() + rootTagEnd;
            FileUtils.writeStringToFile(new File((fileNo++) + "_" + fileName), splitFile);
            fileBuf = new StringBuilder();
        }
    }
    
    
    public void processEndElement (XmlPullParser xpp) throws IOException {
    
        String name = xpp.getName();
    
        if(name.equals(rootTag)) {
            flushBuffer();
        } else {
            nodeBuf.append(xpp.getText());
            if(name.equals(splitTag)) {
                flushBuffer();
            }
        }
    }
    
    
    public void processText (XmlPullParser xpp) throws XmlPullParserException {
    
        int holderForStartAndLength[] = new int[2];
        char ch[] = xpp.getTextCharacters(holderForStartAndLength);
        int start = holderForStartAndLength[0];
        int length = holderForStartAndLength[1];
        String content = new String(ch, start, length);
    
        nodeBuf.append(content);
    }
    
    
    public static void main (String args[]) throws XmlPullParserException, IOException {
    
        //XppSample app = new XppSample("input.xml", "company", "staff", "nickname", 3);
        XppSample app = new XppSample("input.xml", "company", "staff", null, 3);
        app.processDocument();
    }
    }
    
    <?xml version="1.0" encoding="UTF-8"?>
    <schema xmlns="http://www.w3.org/2001/XMLSchema"
        targetNamespace="http://www.example.org/NewXMLSchema"
        xmlns:tns="http://www.example.org/NewXMLSchema"
        elementFormDefault="unqualified">
        <element name="company">
            <complexType>
            <sequence>
                <element name="staff" type="tns:stafftype"/>
                </sequence>
            </complexType>
    
        </element>
    
        <complexType name="stafftype">
            <sequence>
            <element name="firstname" type="string" minOccurs="0" />
            <element name="lastname" type="string" minOccurs="0" />
            <element name="nickname" type="string" minOccurs="1" />
            <element name="salary" type="int" minOccurs="0" />
            </sequence>
    
        </complexType>
    
    </schema>
    
    import java.io.BufferedReader;
    import java.io.ByteArrayInputStream;
    import java.io.File;
    import java.io.IOException;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    
    import javax.xml.transform.stream.StreamSource;
    import javax.xml.validation.Schema;
    import javax.xml.validation.SchemaFactory;
    import javax.xml.validation.Validator;
    
    import org.xml.sax.SAXException;
    
    public class testXML {
        //  Lookup a factory for the W3C XML Schema language
        static SchemaFactory factory = SchemaFactory
                .newInstance("http://www.w3.org/2001/XMLSchema");
    
        //  Compile the schema. 
        static File schemaLocation = new File("company.xsd");
        static Schema schema = null;
        static {
            try {
                schema = factory.newSchema(schemaLocation);
            } catch (SAXException e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            }
        }
    
        private final ExecutorService pool = Executors.newFixedThreadPool(20);
    
        boolean validate(StringBuffer splitBuffer) {
            boolean isValid = false;
            Validator validator = schema.newValidator();
            try {
                validator.validate(new StreamSource(new ByteArrayInputStream(
                        splitBuffer.toString().getBytes())));
                isValid = true;
            } catch (SAXException ex) {
                System.out.println(ex.getMessage());
            } catch (IOException e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            }
            return isValid;
    
        }
    
        void split(BufferedReader br, String rootElementName,
                String splitElementName) {
            StringBuffer splitBuffer = null;
            String line = null;
            String startRootElement = "<" + rootElementName + ">";
            String endRootElement = "</" + rootElementName + ">";
    
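            // No closing '>' here, so that start tags carrying attributes (e.g. <staff id="1">) also match.
            // Note this is a plain prefix match and would also match longer names that begin with splitElementName.
            String startSplitElement = "<" + splitElementName;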
            String endSplitElement = "</" + splitElementName + ">";
            String xmlDeclaration = "<?xml version=\"1.0\"";
            boolean startFlag = false, endflag = false;
            try {
                while ((line = br.readLine()) != null) {
                    if (line.contains(xmlDeclaration)
                            || line.contains(startRootElement)
                            || line.contains(endRootElement)) {
                        continue;
                    }
    
                    if (line.contains(startSplitElement)) {
                        startFlag = true;
                        endflag = false;
                        splitBuffer = new StringBuffer(startRootElement);
                        splitBuffer.append(line);
    
                    } else if (line.contains(endSplitElement)) {
                        endflag = true;
                        startFlag = false;
                        splitBuffer.append(line);
                        splitBuffer.append(endRootElement);
    
                    } else if (startFlag) {
                        splitBuffer.append(line);
                    }
    
                    if (endflag) {
                        //process splitBuffer
                        boolean result = validate(splitBuffer);
                        if (result) {
                            //send it to a thread for processing further
                            //it is async so that main thread can continue for next
    
                            pool.submit(new ProcessingHandler(splitBuffer));
    
                        }
                    }
    
                }
            } catch (IOException e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            }
    
        }
    }
    
    class ProcessingHandler implements Runnable {
        String splitXML = null;
    
        ProcessingHandler(StringBuffer splitXMLBuffer) {
            this.splitXML = splitXMLBuffer.toString();
        }
    
        @Override
        public void run() {
            // Process the fragment here, e.g. write splitXML to its own file.
    
        }
    
    }
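
Plain substring matching on the opening tag is brittle: an exact `<staff>` misses tags that carry attributes, while a bare `<staff` prefix also matches longer names such as `<staffing>`. The following is only a sketch of one possible workaround for the line-based approach, using a regular expression; the class name `SplitTagMatcher` and the method `openingTag` are made up for illustration and are not part of the answer's code. It still assumes the opening tag name is not split across lines, so a real XML parser remains the safer choice.

```java
import java.util.regex.Pattern;

public class SplitTagMatcher {

    /** Builds a pattern that finds an opening tag for the given element name. */
    public static Pattern openingTag(String elementName) {
        // After the name we require whitespace, '>', '/' or end of line, so that
        // <staff> and <staff id="1"> match but <staffing> does not.
        return Pattern.compile("<" + Pattern.quote(elementName) + "(?=[\\s>/]|$)");
    }

    public static void main(String[] args) {
        Pattern p = openingTag("staff");
        System.out.println(p.matcher("<staff id=\"1\">").find());
        System.out.println(p.matcher("<staffing>").find());
    }
}
```

In the loop, `line.contains(startSplitElement)` could then be replaced by `openingTag(splitElementName).matcher(line).find()`, with the pattern compiled once outside the loop.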
    
    private static void copyEvent(int event, XMLStreamReader  reader, XMLStreamWriter writer) throws XMLStreamException {
        if (event == XMLStreamConstants.START_ELEMENT) {
            String localName = reader.getLocalName();
            String namespace = reader.getNamespaceURI();
            // TODO: re-check this namespace handling before using it in production
            if (namespace != null) {
                if (writer.getPrefix(namespace) != null) {
                    writer.writeStartElement(namespace, localName);
                } else {
                    writer.writeStartElement(reader.getPrefix(), localName, namespace);
                }
            } else {
                writer.writeStartElement(localName);
            }
            // first: namespace definition attributes
            if(reader.getNamespaceCount() > 0) {
                int namespaces = reader.getNamespaceCount();
    
                for(int i = 0; i < namespaces; i++) {
                    String namespaceURI = reader.getNamespaceURI(i);
    
                    if(writer.getPrefix(namespaceURI) == null) {
                        String namespacePrefix = reader.getNamespacePrefix(i);
    
                        if(namespacePrefix == null) {
                            writer.writeDefaultNamespace(namespaceURI);
                        } else {
                            writer.writeNamespace(namespacePrefix, namespaceURI);
                        }
                    }
                }
            }
            int attributes = reader.getAttributeCount();
    
            // then write the rest of the attributes
            for (int i = 0; i < attributes; i++) {
                String attributeNamespace = reader.getAttributeNamespace(i);
                if (attributeNamespace != null && attributeNamespace.length() != 0) {
                    writer.writeAttribute(attributeNamespace, reader.getAttributeLocalName(i), reader.getAttributeValue(i));
                } else {
                    writer.writeAttribute(reader.getAttributeLocalName(i), reader.getAttributeValue(i));
                }
            }
        } else if (event == XMLStreamConstants.END_ELEMENT) {
            writer.writeEndElement();
        } else if (event == XMLStreamConstants.CDATA) {
            String text = reader.getText();
            writer.writeCData(text);
        } else if (event == XMLStreamConstants.COMMENT) {
            String text = reader.getText();
            writer.writeComment(text);
        } else if (event == XMLStreamConstants.CHARACTERS) {
            String text = reader.getText();
            // skip whitespace-only text such as indentation between elements
            if (text.length() > 0 && !reader.isWhiteSpace()) {
                writer.writeCharacters(text);
            }
        } else if (event == XMLStreamConstants.START_DOCUMENT) {
            writer.writeStartDocument();
        } else if (event == XMLStreamConstants.END_DOCUMENT) {
            writer.writeEndDocument();
        }
    }
    
    private static void copySubTree(XMLStreamReader reader, XMLStreamWriter writer) throws XMLStreamException {
        reader.require(XMLStreamConstants.START_ELEMENT, null, null);
    
        copyEvent(XMLStreamConstants.START_ELEMENT, reader, writer);
    
        int level = 1;
        do {
            int event = reader.next();
            if(event == XMLStreamConstants.START_ELEMENT) {
                level++;
            } else if(event == XMLStreamConstants.END_ELEMENT) {
                level--;
            }
    
            copyEvent(event, reader, writer);
        } while(level > 0);
    
    }
    
    private static void parseSubTree(XMLStreamReader reader) throws XMLStreamException {
    
        int level = 1;
        do {
            int event = reader.next();
            if(event == XMLStreamConstants.START_ELEMENT) {
                level++;
                // do stateful stuff here
    
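                // for child logic: dispatch on the element name ...
                if(reader.getLocalName().equals("Whatever")) {
                    parseSubTreeForWhatever(reader);
                    level --; // the submethod already consumed the matching END_ELEMENT
                } else if(level == 4) {
                    // ... or, alternatively and faster, on the relative depth.
                    // Use one of the two checks so only one submethod consumes the subtree.
                    parseSubTreeForWhateverAtRelativeLevel4(reader);
                    level --; // the submethod already consumed the matching END_ELEMENT
                }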
    
    
            } else if(event == XMLStreamConstants.END_ELEMENT) {
                level--;
                // do stateful stuff here, too
            }
    
        } while(level > 0);
    
    }
    
    import com.ximpleware.*;
    import java.io.*;
    public class gandalf {
        public  static void main(String a[]) throws VTDException, Exception{
            VTDGen vg = new VTDGen();
            if (vg.parseFile("c:\\xml\\gandalf.txt", false)){
                VTDNav vn=vg.getNav();
                AutoPilot ap = new AutoPilot(vn);
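                // select only the staff elements that have a nickname child
                ap.selectXPath("/company/staff[nickname]");
                int i=-1;
                int count=0;
                while((i=ap.evalXPath())!=-1){
                    // write the current element's fragment to its own file
                    vn.dumpFragment("c:\\xml\\staff"+count+".xml");
                    count++;
                }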
            }
        }
    
    }
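
For comparison with the cursor-based StAX helpers and the VTD-XML example above, here is a minimal sketch of the same idea using the StAX event API that ships with the JDK (`XMLEventReader` / `XMLEventWriter`). It covers the three requirements from the question: split on a configurable element, wrap each part in the root element, and optionally keep only elements that contain a given child. The class name `StaxSplitSketch` and the method `split` are made up for illustration. It works on strings to stay short, buffers one split element at a time, and assumes the XML does not use namespaces; treat it as a starting point, not a tuned implementation.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

public class StaxSplitSketch {

    /**
     * Returns one document per splitName element, each wrapped in rootName.
     * If requiredChild is non-null, elements without such a descendant are dropped.
     */
    public static List<String> split(String xml, String rootName, String splitName,
                                     String requiredChild) throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader(xml));
        XMLEventFactory events = XMLEventFactory.newInstance();
        XMLOutputFactory outputs = XMLOutputFactory.newInstance();
        List<String> parts = new ArrayList<>();

        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (!event.isStartElement()
                    || !event.asStartElement().getName().getLocalPart().equals(splitName)) {
                continue; // ignore everything outside a split element
            }
            // Buffer the events of this one element so the condition can be
            // checked before anything is written.
            List<XMLEvent> buffer = new ArrayList<>();
            boolean keep = (requiredChild == null);
            int depth = 0;
            do {
                if (event.isStartElement()) {
                    depth++;
                    String name = event.asStartElement().getName().getLocalPart();
                    if (depth > 1 && name.equals(requiredChild)) {
                        keep = true;
                    }
                } else if (event.isEndElement()) {
                    depth--;
                }
                buffer.add(event);
                if (depth > 0) {
                    event = reader.nextEvent();
                }
            } while (depth > 0);

            if (!keep) {
                continue; // condition not met, discard this element
            }
            StringWriter out = new StringWriter();
            XMLEventWriter writer = outputs.createXMLEventWriter(out);
            writer.add(events.createStartDocument());
            writer.add(events.createStartElement("", "", rootName));
            for (XMLEvent buffered : buffer) {
                writer.add(buffered);
            }
            writer.add(events.createEndElement("", "", rootName));
            writer.add(events.createEndDocument());
            writer.close();
            parts.add(out.toString());
        }
        reader.close();
        return parts;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<company>"
                + "<staff id=\"1\"><nickname>mkyong</nickname></staff>"
                + "<staff id=\"5\"><salary>100000</salary></staff>"
                + "</company>";
        for (String part : split(xml, "company", "staff", "nickname")) {
            System.out.println(part);
        }
    }
}
```

Passing `null` as the last argument performs the unconditional split. For large inputs the same loop could read from a `FileInputStream` and write each part straight to a file instead of collecting strings.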