
Java Apache Tika: how to use XPath queries


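I am using Apache Tika to parse an XML file. I want to extract certain tags and their contents from the XML and store them in a HashMap. Right now I can extract the full text content of the XML, but the tags are lost.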

  // handler that collects the document body as text
  BodyContentHandler handler = new BodyContentHandler();
  Metadata metadata = new Metadata();
  FileInputStream inputstream = null;

  try {
      inputstream = new FileInputStream(new File(ParseXML.class.getClassLoader().getResource("xml/a.xml").toURI()));
  } catch (URISyntaxException e) {
      e.printStackTrace();
  }

  ParseContext pcontext = new ParseContext();

  // XML parser
  XMLParser xmlparser = new XMLParser();
  xmlparser.parse(inputstream, handler, metadata, pcontext);
  System.out.println("Contents of the document:" + handler.toString());
  System.out.println("Metadata of the document:");
  String[] metadataNames = metadata.names();

  for (String name : metadataNames) {
      System.out.println(name + ": " + metadata.get(name));
  }
This prints the entire text content of the XML.

Now I want to extract only certain parts of the XML. Since Tika allows XPath queries, I tried this:

  XPathParser xhtmlParser = new XPathParser("xhtml", XHTMLContentHandler.XHTML);
  Matcher divContentMatcher = xhtmlParser.parse("/Product/Source/Publisher/PublisherName[@nameType='Person']");
  ContentHandler xhandler = new MatchingContentHandler(
          new ToXMLContentHandler(), divContentMatcher);

  AutoDetectParser parser = new AutoDetectParser();
  Metadata xmetadata = new Metadata();
  try (FileInputStream stream = new FileInputStream(new File(ParseXML.class.getClassLoader().getResource("xml/a.xml").toURI()))) {
      parser.parse(stream, xhandler, xmetadata);
      System.out.println(xhandler.toString());
  } catch (URISyntaxException e) {
      e.printStackTrace();
  }
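But it shows no output! I expected it to return only the nodes selected by the XPath query.

Any idea what is going on?

By the way, here is the corresponding XML: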

<Product productID="xvc22" shortProductID="x" language="en">
  <ProductStatus statusType="Published" />
  <Source>
    <Publisher sequence="1" primaryIndicator="Yes">
      <PublisherID idType="Shortname">jjkjkj</PublisherID>
      <PublisherID idType="BM">6666</PublisherID>
      <PublisherName nameType="Legal">ABT</PublisherName>
      <PublisherName nameType="Person">
        <LastName>pppp</LastName>
        <FirstName>lkkk</FirstName>
      </PublisherName>
    </Publisher>
  </Source>
</Product>
Evaluating the same expression with the standard javax.xml.xpath API prints this:

pppp
lkkk

This is exactly what I want. So why can't Tika handle the XPath query?

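Comments:

You seem to be asking Tika for the plain-text version of the document, so it is no wonder the tags are stripped. What happens if you ask Tika for the XHTML version of the document instead?

Thanks, please see the edit. Is this what you meant?

Please see the edit. I made some changes.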
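The plain-text versus markup distinction raised in the comment can be illustrated without Tika at all. The following is only a sketch using the JDK's own XPath and Transformer APIs (not Tika's handlers); the class name `TextVersusMarkup`, its helper methods, and the shortened inline XML are made up for this illustration. Asking for the string value of the matched node drops the tags, while serializing the node itself keeps them.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of Tika or of the original question.
class TextVersusMarkup {
    // Shortened copy of the question's XML, written without whitespace between elements.
    static final String XML =
        "<Product><Source><Publisher>"
        + "<PublisherName nameType='Legal'>ABT</PublisherName>"
        + "<PublisherName nameType='Person'>"
        + "<LastName>pppp</LastName><FirstName>lkkk</FirstName>"
        + "</PublisherName>"
        + "</Publisher></Source></Product>";

    static final String QUERY =
        "/Product/Source/Publisher/PublisherName[@nameType='Person']";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // String value of the first match: concatenated text, no tags.
    static String textOf(String xml, String query) {
        try {
            return (String) XPathFactory.newInstance().newXPath()
                    .evaluate(query, parse(xml), XPathConstants.STRING);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    // Serialized form of the first match: tags are preserved.
    static String markupOf(String xml, String query) {
        try {
            Node node = (Node) XPathFactory.newInstance().newXPath()
                    .evaluate(query, parse(xml), XPathConstants.NODE);
            if (node == null) {
                return "";
            }
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(node), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("serialization failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("text:   " + textOf(XML, QUERY));
        System.out.println("markup: " + markupOf(XML, QUERY));
    }
}
```

The same idea applies on the Tika side: a handler that emits only body text will not show element names, whatever the query is.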
The javax.xml.xpath code added in the edit:

  DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
  DocumentBuilder builder = factory.newDocumentBuilder();
  Document doc = builder.parse(new File(ParseXML.class.getClassLoader().getResource("xml/a.xml").toURI()));
  XPathFactory xPathfactory = XPathFactory.newInstance();
  XPath xpath = xPathfactory.newXPath();
  XPathExpression expr = xpath.compile("/Product/Source/Publisher/PublisherName[@nameType='Person']");

  System.out.println(expr.evaluate(doc, XPathConstants.STRING));
pppp
lkkk
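As for the original goal of storing tags and their contents in a map: Tika's XPathParser Javadoc describes it as a parser for a very simple XPath subset, and predicates such as [@nameType='Person'] are not among the constructs it lists. The matcher is also applied to the XHTML events Tika emits rather than to the original XML elements, so an empty result is plausible here. One possible way around this, assuming the input really is plain XML, is to stay with the JDK's XPath API and walk the children of the matched node. The sketch below is illustrative only; the class name `PublisherNameToMap`, its method, and the shortened inline XML are invented for the example, and it uses a LinkedHashMap instead of a HashMap purely to keep document order.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of Tika or of the original question.
class PublisherNameToMap {
    // Shortened copy of the question's XML.
    static final String XML =
        "<Product productID='xvc22'><Source><Publisher sequence='1'>"
        + "<PublisherName nameType='Legal'>ABT</PublisherName>"
        + "<PublisherName nameType='Person'>"
        + "<LastName>pppp</LastName><FirstName>lkkk</FirstName>"
        + "</PublisherName>"
        + "</Publisher></Source></Product>";

    static final String QUERY =
        "/Product/Source/Publisher/PublisherName[@nameType='Person']";

    // Maps each child element of the first matching node to its trimmed text content.
    static Map<String, String> childElements(String xml, String query) {
        Map<String, String> result = new LinkedHashMap<>();
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Node match = (Node) XPathFactory.newInstance().newXPath()
                    .evaluate(query, doc, XPathConstants.NODE);
            if (match == null) {
                return result; // nothing matched: return an empty map
            }
            NodeList children = match.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                // Skip whitespace text nodes and comments; keep elements only.
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    result.put(child.getNodeName(), child.getTextContent().trim());
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate " + query, e);
        }
    }

    public static void main(String[] args) {
        // prints {LastName=pppp, FirstName=lkkk}
        System.out.println(childElements(XML, QUERY));
    }
}
```

If two children share an element name, the later one overwrites the earlier one in this sketch; a Map<String, List<String>> would be needed to keep both.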