Java: How to get specific information from an XML file

I have a large XML file. Here is an excerpt from it:

...
<LexicalEntry id="Ait~ifAq_1">
  <Lemma partOfSpeech="n" writtenForm="اِتِّفاق"/>
  <Sense id="Ait~ifAq_1_tawaAfuq_n1AR" synset="tawaAfuq_n1AR"/>
  <WordForm formType="root" writtenForm="وفق"/>
</LexicalEntry>
<LexicalEntry id="tawaA&amp;um__1">
  <Lemma partOfSpeech="n" writtenForm="تَوَاؤُم"/>
  <Sense id="tawaA&amp;um__1_AinosijaAm_n1AR" synset="AinosijaAm_n1AR"/>
  <WordForm formType="root" writtenForm="وأم"/>
</LexicalEntry>    
<LexicalEntry id="tanaAgum_2">
  <Lemma partOfSpeech="n" writtenForm="تناغُم"/>
  <Sense id="tanaAgum_2_AinosijaAm_n1AR" synset="AinosijaAm_n1AR"/>
  <WordForm formType="root" writtenForm="نغم"/>
</LexicalEntry>


<Synset baseConcept="3" id="tawaAfuq_n1AR">
  <SynsetRelations>
    <SynsetRelation relType="hyponym" targets="AinosijaAm_n1AR"/>
    <SynsetRelation relType="hyponym" targets="AinosijaAm_n1AR"/>
    <SynsetRelation relType="hypernym" targets="ext_noun_NP_420"/>
  </SynsetRelations>
  <MonolingualExternalRefs>
    <MonolingualExternalRef externalReference="13971065-n" externalSystem="PWN30"/>
  </MonolingualExternalRefs>
</Synset>
...

One solution is to use a stream reader, because of memory consumption. But I don't know how to get what I want. Please help me.

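If this XML file is too large to represent in memory, use SAX.

You will need to write your SAX parser so that it keeps track of its location. For that I usually use a StringBuffer, but a Stack of Strings works just as well. This part is important because it lets you trace the path back to the document root, which tells you where you are in the document at any given point (useful when you are trying to extract only a small amount of information).

The main logic flow looks like this: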

 1. When entering a node, add the node's name to the stack.
 2. When exiting a node, pop the node's name (top element) off the stack.
 3. To know your location, read your current branch of the XML from the bottom of the stack to the top of the stack.
 4. When entering a region you care about, clear the buffer you will capture the characters into.
 5. When exiting a region you care about, flush the buffer into the data structure you will return as your output.

This way, you can efficiently skip every branch of the XML tree that you don't care about.
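
The five steps above can be sketched as a small handler. This is only an illustration under stated assumptions: the class name PathTrackingHandler, the wantedPath parameter, and the extract helper are hypothetical names, not part of the SAX API, and the sketch uses a Deque of element names plus a StringBuilder rather than the StringBuffer mentioned above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler: collects the text of every element found at one fixed path. */
class PathTrackingHandler extends DefaultHandler {
    private final Deque<String> path = new ArrayDeque<>();   // element names, root first
    private final StringBuilder text = new StringBuilder();  // capture buffer
    private final List<String> results = new ArrayList<>();
    private final String wantedPath;                         // e.g. "/library/book/title"

    PathTrackingHandler(String wantedPath) {
        this.wantedPath = wantedPath;
    }

    List<String> getResults() {
        return results;
    }

    // Step 3: read the current branch from the bottom of the stack to the top.
    private String currentPath() {
        return "/" + String.join("/", path);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        path.addLast(qName);                       // step 1: push on entry
        if (currentPath().equals(wantedPath)) {
            text.setLength(0);                     // step 4: clear the buffer
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (currentPath().equals(wantedPath)) {
            text.append(ch, start, length);        // may be called several times per element
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (currentPath().equals(wantedPath)) {
            results.add(text.toString().trim());   // step 5: flush the buffer
        }
        path.removeLast();                         // step 2: pop on exit
    }

    /** Convenience wrapper for trying the handler on an in-memory string. */
    static List<String> extract(String xml, String wantedPath) {
        PathTrackingHandler handler = new PathTrackingHandler(wantedPath);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return handler.getResults();
    }
}
```

For a real file you would pass a File or InputStream to the parser instead of a string. Everything outside the wanted path is still read by the parser, but nothing from it is stored.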

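A SAX parser is different from a DOM parser. It looks only at the current item, and it cannot see later items until they become the current item. It is one of the options you can use when the XML file is very large. There are many alternatives as well. To name a few:

  • SAX parser
  • DOM parser
  • JDOM parser
  • DOM4J parser
  • StAX parser

Tutorials are available for all of them.

In my opinion, once you have learned them, go straight to DOM4J or JDOM for commercial products.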

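The logic of a SAX parser is that you have a MyHandler class (called UserHandler in the main class below) that extends DefaultHandler and overrides some of its methods with @Override:

XML file: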

<?xml version="1.0"?>
<class>
   <student rollno="393">
      <firstname>dinkar</firstname>
      <lastname>kad</lastname>
      <nickname>dinkar</nickname>
      <marks>85</marks>
   </student>
   <student rollno="493">
      <firstname>Vaneet</firstname>
      <lastname>Gupta</lastname>
      <nickname>vinni</nickname>
      <marks>95</marks>
   </student>
   <student rollno="593">
      <firstname>jasvir</firstname>
      <lastname>singn</lastname>
      <nickname>jazz</nickname>
      <marks>90</marks>
   </student>
</class>

Main class:

import java.io.File;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SAXParserDemo {
   public static void main(String[] args){

      try { 
         File inputFile = new File("input.txt");
         SAXParserFactory factory = SAXParserFactory.newInstance();
         SAXParser saxParser = factory.newSAXParser();
         UserHandler userhandler = new UserHandler();
         saxParser.parse(inputFile, userhandler);     
      } catch (Exception e) {
         e.printStackTrace();
      }
   }   
}
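
The main class above refers to a UserHandler that is not shown. The following is a minimal sketch of what such a handler might look like for the student XML above. It is an assumption, not the original author's code: the fields, getStudents, and the parseString helper are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler: records "rollno firstname marks" for each student element. */
class UserHandler extends DefaultHandler {
    private final List<String> students = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private String rollNo;
    private String firstName;
    private String marks;

    List<String> getStudents() {
        return students;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        text.setLength(0);                           // start with empty text for every element
        if (qName.equals("student")) {
            rollNo = attributes.getValue("rollno");  // attributes arrive with the start tag
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);              // may be called several times per element
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("firstname")) {
            firstName = text.toString().trim();
        } else if (qName.equals("marks")) {
            marks = text.toString().trim();
        } else if (qName.equals("student")) {
            String line = rollNo + " " + firstName + " " + marks;
            students.add(line);
            System.out.println(line);
        }
    }

    /** Convenience wrapper for trying the handler on an in-memory string. */
    static List<String> parseString(String xml) {
        UserHandler handler = new UserHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return handler.getStudents();
    }
}
```

The handler only keeps the fields of the student currently being read, so memory use stays flat no matter how many student elements the file contains (apart from the result list, which you could replace with direct output).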

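XPath is designed for exactly this. Java provides support for it in the javax.xml.xpath package.

To do what you want, the code would look something like this: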

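import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

List<String> findRelations(String word,
                           Path xmlFile)
throws XPathException {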

    String xmlLocation = xmlFile.toUri().toASCIIString();

    XPath xpath = XPathFactory.newInstance().newXPath();

    xpath.setXPathVariableResolver(
        name -> (name.getLocalPart().equals("word") ? word : null));
    String id = xpath.evaluate(
        "//LexicalEntry[WordForm/@writtenForm=$word or Lemma/@writtenForm=$word]/Sense/@synset",
        new InputSource(xmlLocation));

    xpath.setXPathVariableResolver(
        name -> (name.getLocalPart().equals("id") ? id : null));
    NodeList matches = (NodeList) xpath.evaluate(
        "//Synset[@id=$id]/SynsetRelations/SynsetRelation",
        new InputSource(xmlLocation),
        XPathConstants.NODESET);

    List<String> relations = new ArrayList<>();

    int matchCount = matches.getLength();
    for (int i = 0; i < matchCount; i++) {
        Element match = (Element) matches.item(i);

        String relType = match.getAttribute("relType");
        String synset = match.getAttribute("targets");

        xpath.setXPathVariableResolver(
            name -> (name.getLocalPart().equals("synset") ? synset : null));
        NodeList formNodes = (NodeList) xpath.evaluate(
            "//LexicalEntry[Sense/@synset=$synset]/WordForm/@writtenForm",
            new InputSource(xmlLocation),
            XPathConstants.NODESET);

        int formCount = formNodes.getLength();
        StringJoiner forms = new StringJoiner(",");
        for (int j = 0; j < formCount; j++) {
            forms.add(
                formNodes.item(j).getNodeValue());
        }

        relations.add(
            String.format("%s %s %s", word, relType, forms));
    }

    return relations;
}
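
//LexicalEntry[WordForm/@writtenForm=$word or Lemma/@writtenForm=$word]/Sense/@synset
matches any <LexicalEntry> element anywhere in the XML document that has either:

  • a WordForm child with a writtenForm attribute whose value equals the word variable
  • a Lemma child with a writtenForm attribute whose value equals the word variable

For each such <LexicalEntry> element, it returns the value of the synset attribute of any <Sense> element that is a direct child of the <LexicalEntry> element.

The word variable is defined externally by xpath.setXPathVariableResolver, right before the XPath expression is evaluated.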
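
To try this first expression in isolation, here is a small sketch that evaluates it against an in-memory string. The class name SynsetLookup and the synsetOf method are hypothetical. Note that the two-argument evaluate returns a String, which for a node-set is the value of the first matching node in document order, or an empty string when nothing matches.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

/** Hypothetical helper: looks up the synset of the first entry matching a word. */
class SynsetLookup {
    static String synsetOf(String xml, String word) {
        XPath xpath = XPathFactory.newInstance().newXPath();

        // Supply the value of $word; any other variable name is left unresolved.
        xpath.setXPathVariableResolver(
            name -> name.getLocalPart().equals("word") ? word : null);

        try {
            // String result: the first matching synset attribute, or "" if none.
            return xpath.evaluate(
                "//LexicalEntry[WordForm/@writtenForm=$word or Lemma/@writtenForm=$word]/Sense/@synset",
                new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("could not evaluate XPath", e);
        }
    }
}
```

Passing the word through a variable instead of concatenating it into the expression avoids quoting problems when the word itself contains quote characters.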

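//Synset[@id=$id]/SynsetRelations/SynsetRelation
matches any <Synset> element in the XML document whose id attribute equals the id variable. For each such <Synset> element, it looks for any direct SynsetRelations child elements and returns each of their direct SynsetRelation child elements.

The id variable is defined externally by xpath.setXPathVariableResolver, right before the XPath expression is evaluated.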

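//LexicalEntry[Sense/@synset=$synset]/WordForm/@writtenForm
matches any <LexicalEntry> element in the XML document that has a <Sense> child element whose synset attribute has the same value as the synset variable. For each matched element, it finds any <WordForm> child elements and returns the writtenForm attribute of each.

The synset variable is defined externally by xpath.setXPathVariableResolver, right before the XPath expression is evaluated.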


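Logically, what the above amounts to is:

  • Locate the synset value for the requested word
  • Use the synset value to locate SynsetRelation elements
  • Locate the writtenForm values corresponding to the targets value of each matched SynsetRelation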
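
Because the question mentions memory consumption and a stream reader, the same three logical steps can also be sketched with StAX (javax.xml.stream) in a single pass over the file. This swaps XPath for a different technique and is only a sketch under assumptions: the class name StaxRelationFinder is hypothetical, element and attribute names are taken from the excerpt in the question, and the lookup tables it builds still have to fit in memory.

```java
import java.io.Reader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical single-pass alternative to the XPath version of findRelations. */
class StaxRelationFinder {
    static List<String> findRelations(String word, Reader xml) {
        Map<String, List<String>> formsBySynset = new HashMap<>();       // synset id -> WordForm values
        Map<String, List<String[]>> relationsBySynset = new HashMap<>(); // synset id -> {relType, targets}
        String wordSynset = null;                      // synset of the first entry matching the word

        boolean entryMatches = false;                  // state of the LexicalEntry being read
        List<String> entrySynsets = new ArrayList<>();
        List<String> entryForms = new ArrayList<>();
        String currentSynsetId = null;                 // id of the Synset being read

        try {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(xml);
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = reader.getLocalName();
                    if (name.equals("LexicalEntry")) {
                        entryMatches = false;
                        entrySynsets.clear();
                        entryForms.clear();
                    } else if (name.equals("Lemma") || name.equals("WordForm")) {
                        String form = reader.getAttributeValue(null, "writtenForm");
                        if (word.equals(form)) {
                            entryMatches = true;
                        }
                        if (name.equals("WordForm") && form != null) {
                            entryForms.add(form);
                        }
                    } else if (name.equals("Sense")) {
                        entrySynsets.add(reader.getAttributeValue(null, "synset"));
                    } else if (name.equals("Synset")) {
                        currentSynsetId = reader.getAttributeValue(null, "id");
                    } else if (name.equals("SynsetRelation")) {
                        relationsBySynset
                            .computeIfAbsent(currentSynsetId, k -> new ArrayList<>())
                            .add(new String[] {
                                reader.getAttributeValue(null, "relType"),
                                reader.getAttributeValue(null, "targets")});
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && reader.getLocalName().equals("LexicalEntry")) {
                    // The entry is complete: file its word forms under each of its synsets.
                    for (String synset : entrySynsets) {
                        formsBySynset.computeIfAbsent(synset, k -> new ArrayList<>())
                            .addAll(entryForms);
                    }
                    if (entryMatches && wordSynset == null && !entrySynsets.isEmpty()) {
                        wordSynset = entrySynsets.get(0);
                    }
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not read XML", e);
        }

        List<String> relations = new ArrayList<>();
        if (wordSynset == null) {
            return relations;                          // the word was not found
        }
        List<String[]> found =
            relationsBySynset.getOrDefault(wordSynset, Collections.<String[]>emptyList());
        for (String[] relation : found) {
            List<String> forms =
                formsBySynset.getOrDefault(relation[1], Collections.<String>emptyList());
            relations.add(String.format("%s %s %s", word, relation[0], String.join(",", forms)));
        }
        return relations;
    }
}
```

The trade-off is that the file is read once instead of being re-parsed for every XPath query, at the cost of holding two lookup tables in memory. For a real file you could pass a reader from java.nio.file.Files.newBufferedReader.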