Java: Extracting values from an XML file with XPath, SAX or DOM for this specific scenario
I'm currently working on an academic project developed in Java and XML. The actual task is to parse the XML and, preferably, pass the required values into a HashMap for further processing. Below is a short snippet of the actual XML:
<root>
  <BugReport ID = "1">
    <Title>"(495584) Firefox - search suggestions passes wrong previous result to form history"</Title>
    <Turn>
      <Date>'2009-06-14 18:55:25'</Date>
      <From>'Justin Dolske'</From>
      <Text>
        <Sentence ID = "3.1"> Created an attachment (id=383211) [details] Patch v.2</Sentence>
        <Sentence ID = "3.2"> Ah. So, there's a ._formHistoryResult in the....</Sentence>
        <Sentence ID = "3.3"> The simple fix it to just discard the service's form history result.</Sentence>
        <Sentence ID = "3.4"> Otherwise it's trying to use a old form history result that no longer applies for the search string.</Sentence>
      </Text>
    </Turn>
    <Turn>
      <Date>'2009-06-19 12:07:34'</Date>
      <From>'Gavin Sharp'</From>
      <Text>
        <Sentence ID = "4.1"> (From update of attachment 383211 [details])</Sentence>
        <Sentence ID = "4.2"> Perhaps we should rename one of them to _fhResult just to reduce confusion?</Sentence>
      </Text>
    </Turn>
    <Turn>
      <Date>'2009-06-19 13:17:56'</Date>
      <From>'Justin Dolske'</From>
      <Text>
        <Sentence ID = "5.1"> (In reply to comment #3)</Sentence>
        <Sentence ID = "5.2"> &gt; (From update of attachment 383211 [details] [details])</Sentence>
        <Sentence ID = "5.3"> &gt; Perhaps we should rename one of them to _fhResult just to reduce confusion?</Sentence>
        <Sentence ID = "5.4"> Good point.</Sentence>
        <Sentence ID = "5.5"> I renamed the one in the wrapper to _formHistResult. </Sentence>
        <Sentence ID = "5.6"> fhResult seemed maybe a bit too short.</Sentence>
      </Text>
    </Turn>
    .....
    and so on
  </BugReport>
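Many commenters, such as 'Justin Dolske', have commented on this report. What I actually want is the list of commenters and all the sentences each of them wrote across the whole XML file, something like if (from == "Justin Dolske") getHisAllSentences(), and the same for every other commenter. I have tried many different ways to get the sentences for just "Justin Dolske" or another commenter, even generic approaches using XPath, SAX and DOM, but all of them failed. I'm very new to these technologies, including Java, and I don't know how to achieve this.
Can anyone guide me specifically on how to do this with any of the technologies above, or is there a better strategy?
(Note: later I want to put the result in a HashMap, i.e. HashMap(key, value), where key = the commenter's name (Justin Dolske) and value = all of their sentences.)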
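Many thanks for your urgent help.

I suggest using JAXB to create a data model that mirrors your XML structure. Once that is done, you can load the XML into Java instances. Using Turn.From as the key, put each Turn into a Map. After that, you can write:
List<Turn> justinsTurn = allTurns.get("'Justin Dolske'");

There are several ways to achieve what you need: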
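- One approach is to use … ; there are several tutorials online, feel free to refer to them.
- You could also build a DOM, extract the data from it, and then put it into a HashMap.
A reference implementation looks like this: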
import java.io.File;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class XMLReader {

    // Commenter name (text of <From>) -> every sentence that commenter wrote
    private final Map<String, List<String>> namesSentencesMap = new HashMap<>();

    private Document getDocument(String fileName) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(new File(fileName));
        } catch (Exception exe) {
            // Report the failure instead of silently swallowing it
            exe.printStackTrace();
            return null;
        }
    }

    private void buildNamesSentencesMap(Document document) {
        if (document == null) {
            return;
        }
        // Get each <Turn> block
        NodeList turnList = document.getElementsByTagName("Turn");
        for (int turnIndex = 0; turnIndex < turnList.getLength(); turnIndex++) {
            Element turnElement = (Element) turnList.item(turnIndex);
            // Assumption: every <Turn> contains a <From> element
            Element fromElement = (Element) turnElement.getElementsByTagName("From").item(0);
            String fromName = fromElement.getTextContent();
            // Reuse the list already stored for this name, or create and store a new one
            List<String> sentenceList = namesSentencesMap.computeIfAbsent(fromName, k -> new ArrayList<>());
            // Extract all <Sentence> elements inside this Turn
            NodeList sentenceNodeList = turnElement.getElementsByTagName("Sentence");
            for (int sentenceIndex = 0; sentenceIndex < sentenceNodeList.getLength(); sentenceIndex++) {
                sentenceList.add(sentenceNodeList.item(sentenceIndex).getTextContent());
            }
        }
    }

    public static void main(String[] args) {
        XMLReader reader = new XMLReader();
        reader.buildNamesSentencesMap(reader.getDocument("<your_xml_file>"));
        for (String name : reader.namesSentencesMap.keySet()) {
            System.out.println("Name: " + name + "\tTotal Sentences: " + reader.namesSentencesMap.get(name).size());
        }
    }
}
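The question also mentions SAX. For comparison, here is a minimal sketch that builds the same name-to-sentences map in a single streaming pass, without keeping a DOM in memory. The class SaxSentenceCollector and its collect(String) method are hypothetical names; the sketch assumes, as in the sample, that each <Turn> has its <From> before its <Sentence> elements and that names are wrapped in single quotes, which it strips.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical SAX handler: collects <Sentence> texts grouped by the enclosing Turn's <From>
public class SaxSentenceCollector extends DefaultHandler {

    // Commenter name (quotes stripped) -> sentences in document order
    private final Map<String, List<String>> sentencesByName = new LinkedHashMap<>();
    // Character data of the element currently being read
    private final StringBuilder text = new StringBuilder();
    // <From> value of the <Turn> currently being read
    private String currentFrom;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if (qName.equals("Turn")) {
            currentFrom = null; // a new Turn starts without a known author
        }
        text.setLength(0); // start collecting text for this element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times for one element, so append rather than assign
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("From")) {
            currentFrom = stripQuotes(text.toString());
        } else if (qName.equals("Sentence") && currentFrom != null) {
            sentencesByName.computeIfAbsent(currentFrom, k -> new ArrayList<>())
                    .add(text.toString().trim());
        }
    }

    // Assumption (from the sample data): names look like 'Justin Dolske'
    private static String stripQuotes(String s) {
        String t = s.trim();
        if (t.length() >= 2 && t.startsWith("'") && t.endsWith("'")) {
            t = t.substring(1, t.length() - 1);
        }
        return t;
    }

    // Parses an XML string and returns the name -> sentences map
    public static Map<String, List<String>> collect(String xml) {
        SaxSentenceCollector handler = new SaxSentenceCollector();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return handler.sentencesByName;
    }
}
```

With this, SaxSentenceCollector.collect(xmlString).get("Justin Dolske") returns that commenter's sentences. To read a file instead of a String, SAXParser also has a parse(File, DefaultHandler) overload.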
Note: this is only a demo; you will need to modify it to suit your needs.
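For the single-commenter case from the question (the if (from == "Justin Dolske") idea), XPath can select the sentences directly. This is a minimal sketch; XPathSentences and sentencesBy are hypothetical names, and the name must be passed exactly as it appears in <From>, including the single quotes used in the sample data.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathSentences {

    // Hypothetical helper: returns the text of every <Sentence> inside a <Turn>
    // whose <From> equals `from` (compared after whitespace normalization).
    public static List<String> sentencesBy(String xml, String from) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Bind $from through a variable resolver instead of concatenating it into
            // the expression, so quotes in the name cannot break the XPath syntax
            xpath.setXPathVariableResolver(name -> "from".equals(name.getLocalPart()) ? from : null);
            NodeList nodes = (NodeList) xpath.evaluate(
                    "//Turn[normalize-space(From) = $from]/Text/Sentence", doc, XPathConstants.NODESET);
            List<String> sentences = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                sentences.add(nodes.item(i).getTextContent().trim());
            }
            return sentences;
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate XPath", e);
        }
    }
}
```

Calling sentencesBy(xml, "'Justin Dolske'") returns that commenter's sentences in document order. Running it once per distinct <From> value would also fill the HashMap the question asks for, although the DOM and SAX versions do that in a single pass.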