Java: Extracting values from an XML file with XPath, SAX, or DOM for this specific scenario

java xml dom xpath


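I am currently working on an academic project developed in Java and XML. The actual task is to parse the XML and, preferably, pass the required values along in a HashMap for further processing. Below is a short snippet of the actual XML: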

<root>
  <BugReport ID = "1">
    <Title>"(495584) Firefox - search suggestions passes wrong previous result to form history"</Title>

    <Turn>
      <Date>'2009-06-14 18:55:25'</Date>
      <From>'Justin Dolske'</From>
      <Text>
        <Sentence ID = "3.1"> Created an attachment (id=383211) [details] Patch v.2</Sentence>
        <Sentence ID = "3.2"> Ah. So, there's a ._formHistoryResult in the....</Sentence>
        <Sentence ID = "3.3"> The simple fix it to just discard the service's form history result.</Sentence>
        <Sentence ID = "3.4"> Otherwise it's trying to use a old form history result that no longer applies for the search string.</Sentence>
      </Text>
    </Turn>

    <Turn>
      <Date>'2009-06-19 12:07:34'</Date>
      <From>'Gavin Sharp'</From>
      <Text>
        <Sentence ID = "4.1"> (From update of attachment 383211 [details])</Sentence>
        <Sentence ID = "4.2"> Perhaps we should rename one of them to _fhResult just to reduce confusion?</Sentence>
      </Text>
    </Turn>

    <Turn>
      <Date>'2009-06-19 13:17:56'</Date>
      <From>'Justin Dolske'</From>
      <Text>
        <Sentence ID = "5.1"> (In reply to comment #3)</Sentence>
        <Sentence ID = "5.2"> &amp;gt; (From update of attachment 383211 [details] [details])</Sentence> 
        <Sentence ID = "5.3"> &amp;gt; Perhaps we should rename one of them to _fhResult just to reduce confusion?</Sentence>
        <Sentence ID = "5.4"> Good point.</Sentence>
        <Sentence ID = "5.5"> I renamed the one in the wrapper to _formHistResult. </Sentence>
        <Sentence ID = "5.6"> fhResult seemed maybe a bit too short.</Sentence>
      </Text>
    </Turn>

  .....
  and so on
</BugReport>



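Many commenters, such as 'Justin Dolske', have commented on this report. What I really want is the list of commenters together with all the sentences each of them wrote across the whole XML file. Something like

if (from == "Justin Dolske") getAllHisSentences()

and likewise for every other commenter. I have tried many different ways to get the sentences just for 'Justin Dolske' or for other commenters, including generic approaches with XPath, SAX, and DOM, but all of them failed. I am very new to these technologies, including Java, and do not know how to achieve this.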

Could anyone guide me specifically on how to do this with any of the technologies above, or is there some other, better strategy?

(Note: later I want to put the result in a HashMap, e.g. HashMap(key, value), where the key is the commenter's name (Justin Dolske) and the value is all of that commenter's sentences.)

Any urgent help would be highly appreciated.

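I suggest using JAXB to create a data model that reflects your XML structure.

Once that is done, you can load the XML into Java instances.

Put each "Turn" into a Map<String, List<Turn>>, using Turn.From as the key.

Once that is done, you can write:

List<Turn> justinsTurn = allTurns.get("'Justin Dolske'");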
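The grouping step described above can be sketched as follows. This is only an illustration under stated assumptions: the `TurnIndex` class, its nested `Turn` class, and the `byCommenter` method are hypothetical names, and the `Turn` objects are built by hand here, whereas with JAXB they would come from unmarshalling the XML into an annotated model.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TurnIndex {

    // Hypothetical data model for one <Turn>; with JAXB this class would
    // carry binding annotations and be filled by the unmarshaller.
    static class Turn {
        final String from;
        final List<String> sentences;

        Turn(String from, List<String> sentences) {
            this.from = from;
            this.sentences = sentences;
        }
    }

    // Groups turns by commenter, keeping document order inside each list.
    static Map<String, List<Turn>> byCommenter(List<Turn> turns) {
        Map<String, List<Turn>> allTurns = new LinkedHashMap<>();
        for (Turn turn : turns) {
            // Create the commenter's list on first sight, then append the turn
            allTurns.computeIfAbsent(turn.from, key -> new ArrayList<>()).add(turn);
        }
        return allTurns;
    }

    public static void main(String[] args) {
        // Sample data taken from the XML snippet in the question
        List<Turn> turns = Arrays.asList(
                new Turn("'Justin Dolske'", Arrays.asList("Created an attachment (id=383211) [details] Patch v.2")),
                new Turn("'Gavin Sharp'", Arrays.asList("(From update of attachment 383211 [details])")),
                new Turn("'Justin Dolske'", Arrays.asList("(In reply to comment #3)", "Good point.")));

        Map<String, List<Turn>> allTurns = byCommenter(turns);
        List<Turn> justinsTurn = allTurns.get("'Justin Dolske'");
        for (Turn turn : justinsTurn) {
            System.out.println(turn.sentences);
        }
    }
}
```

Note that the key keeps the single quotes that surround the name in the XML, which is why the lookup uses "'Justin Dolske'" rather than "Justin Dolske".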
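There are several ways to achieve what you need.

One way is to use an existing XML library; there are several tutorials online, so feel free to consult them.
You could also build a DOM, extract the data from it, and then put that data into a HashMap.

A reference implementation might look like this: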
import java.io.File;
import java.util.ArrayList;
import java.util.HashMap;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class XMLReader {

    private HashMap<String,ArrayList<String>> namesSentencesMap;

    public XMLReader() {
        namesSentencesMap = new HashMap<String, ArrayList<String>>();
    }

    private Document getDocument(String fileName){
        Document document = null;

        try{
            document = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(new File(fileName));
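        }catch(Exception exe){
            // Report the failure; the caller treats a null Document as "nothing to process"
            System.err.println("Could not parse " + fileName + ": " + exe.getMessage());
        }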

        return document;
    }

    private void buildNamesSentencesMap(Document document){
        if(document == null){
            return;
        }

        //Get each Turn block
        NodeList turnList = document.getElementsByTagName("Turn");
        String fromName = null;

        NodeList sentenceNodeList = null;
        for(int turnIndex = 0; turnIndex < turnList.getLength(); turnIndex++){
            Element turnElement = (Element)turnList.item(turnIndex);
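            //Assumption: each Turn has one <From> element; skip the Turn if it is missing
            Element fromElement = (Element) turnElement.getElementsByTagName("From").item(0);
            if(fromElement == null){
                continue;
            }
            //Trim surrounding whitespace so the same commenter always maps to the same key
            fromName = fromElement.getTextContent().trim();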
            //Extracting sentences - First check whether the map contains 
            //an ArrayList corresponding to the name. If yes, then use that,  
            //else create a new one                                              
            ArrayList<String> sentenceList = namesSentencesMap.get(fromName);
            if(sentenceList == null){
                sentenceList = new ArrayList<String>();
            }
            //Extract sentences from the Turn node
            sentenceNodeList = turnElement.getElementsByTagName("Sentence");
            for(int sentenceIndex = 0; sentenceIndex < sentenceNodeList.getLength(); sentenceIndex++){
                sentenceList.add(sentenceNodeList.item(sentenceIndex).getTextContent());
            }
            //Put the list back in the map                  
            namesSentencesMap.put(fromName, sentenceList);
        }
    }

    public static void main(String[] args) {
        XMLReader reader = new XMLReader();
        reader.buildNamesSentencesMap(reader.getDocument("<your_xml_file>"));

        for(String names: reader.namesSentencesMap.keySet()){
            System.out.println("Name: "+names+"\tTotal Sentences: "+reader.namesSentencesMap.get(names).size());
        }
    }
}


Note: this is only a demo; you will need to adapt it to your needs.
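Since the question also asks about XPath, a lookup for a single commenter could be sketched with the JDK's `javax.xml.xpath` API. This is a minimal sketch, not part of the answers above: the class name `XPathSentenceFinder` and the method `sentencesOf` are hypothetical, and it assumes the `<From>` text (including its single quotes) equals the requested name after whitespace normalization. The name is passed as an XPath variable so that the quotes inside it need no escaping.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathSentenceFinder {

    // Returns the trimmed text of every <Sentence> inside a <Turn>
    // whose <From> matches the given commenter name.
    static List<String> sentencesOf(String xml, String commenter) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Resolve the $from variable used in the expression below
        xpath.setXPathVariableResolver(
                name -> "from".equals(name.getLocalPart()) ? commenter : null);
        try {
            NodeList nodes = (NodeList) xpath.evaluate(
                    "//Turn[normalize-space(From) = $from]/Text/Sentence",
                    new InputSource(new StringReader(xml)),
                    XPathConstants.NODESET);
            List<String> sentences = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                sentences.add(nodes.item(i).getTextContent().trim());
            }
            return sentences;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // Shortened sample modeled on the XML in the question
        String xml = "<root><BugReport ID=\"1\">"
                + "<Turn><From>'Justin Dolske'</From><Text>"
                + "<Sentence ID=\"3.1\"> Created an attachment (id=383211) [details] Patch v.2</Sentence>"
                + "</Text></Turn>"
                + "<Turn><From>'Gavin Sharp'</From><Text>"
                + "<Sentence ID=\"4.1\"> (From update of attachment 383211 [details])</Sentence>"
                + "</Text></Turn>"
                + "<Turn><From>'Justin Dolske'</From><Text>"
                + "<Sentence ID=\"5.4\"> Good point.</Sentence>"
                + "</Text></Turn>"
                + "</BugReport></root>";

        for (String sentence : sentencesOf(xml, "'Justin Dolske'")) {
            System.out.println(sentence);
        }
    }
}
```

This evaluates one query per commenter, so it suits a lookup for a single name; building the full name-to-sentences map in one pass, as in the DOM demo, avoids re-parsing the document for every commenter.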