Java: Getting a page's content, keywords, and title with Apache Tika


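Is there something wrong with this code? If I add the line

String ct = t.parseToString(content);

below Tika t = new Tika(); then I get the actual content of the URL, but after that I get null for the keywords, title, and authors. If I remove that line, I get the actual values for the title, authors, and keywords. Why does this happen?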

HttpGet request = new HttpGet("http://xyz.com/d/index.html");

HttpResponse response = client.execute(request);
HttpEntity entity = response.getEntity();
InputStream content = entity.getContent();
System.out.println(content);

Tika t = new Tika();
// Extract the body text from the response stream
String ct = t.parseToString(content);
System.out.println(ct);

Metadata md = new Metadata();
// Extract the metadata from the same stream
Reader r = t.parse(content, md);
System.out.println(md);

System.out.println("Keywords: " + md.get("keywords"));
System.out.println("Title: " + md.get("title"));
System.out.println("Authors: " + md.get("authors"));

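You are reading the same stream more than once. Once a stream has been read to the end, it cannot be read again, so the second parse has nothing left to work with. Buffer the bytes first and give each parse its own stream, like this: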

HttpResponse response = client.execute(request);
HttpEntity entity = response.getEntity();

// streamToByteArray is a helper you write yourself; for ways to do it see
// http://stackoverflow.com/questions/1264709/convert-inputstream-to-byte-in-java
byte[] content = streamToByteArray(entity.getContent());

String ct = t.parseToString(new ByteArrayInputStream(content));
System.out.println(ct);

Metadata md = new Metadata();
Reader r = t.parse(new ByteArrayInputStream(content), md);
System.out.println(md);
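
The streamToByteArray call above is not a JDK or Tika method, so it has to be supplied. A minimal sketch of such a helper follows, using only the standard library. The class name StreamBuffering and the sample HTML string are made up for illustration, and the in-memory stream stands in for entity.getContent().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class StreamBuffering {

    // Hypothetical helper with the name used in the answer: copies everything
    // that is left in the stream into a byte array.
    static byte[] streamToByteArray(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for entity.getContent(); the HTML is sample data.
        InputStream original = new ByteArrayInputStream(
                "<html><title>Demo</title></html>".getBytes(StandardCharsets.UTF_8));

        byte[] content = streamToByteArray(original);

        // The original stream is now at its end, so another read yields -1.
        System.out.println(original.read()); // prints -1

        // Each new ByteArrayInputStream starts again at the first byte,
        // so the buffered data can be handed to two independent consumers.
        byte[] first = streamToByteArray(new ByteArrayInputStream(content));
        byte[] second = streamToByteArray(new ByteArrayInputStream(content));
        System.out.println(first.length == second.length); // prints true
    }
}
```

Buffering the whole response in memory is fine for ordinary web pages; for very large bodies it may be better to fetch the URL twice or spool to a temporary file.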

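Your code confuses me. What is content here, and where do we use it? Do I have to create a streamToByteArray method myself, or do I have to include something? I am getting an error on that call.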
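
On the question of what content is: in the answer it is a byte[] holding the whole response body, not the InputStream from the question, and it is used by wrapping it in a new ByteArrayInputStream for each parse. If the project runs on Java 9 or later, InputStream.readAllBytes() may be able to replace a hand-written streamToByteArray. The sketch below assumes that Java version; the class name ReadAllBytesDemo, the method names, and the sample HTML are invented for illustration, and the in-memory stream again stands in for entity.getContent().

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class ReadAllBytesDemo {

    // Reads the stream once and keeps its bytes (requires Java 9+).
    static byte[] buffer(InputStream in) throws IOException {
        return in.readAllBytes();
    }

    // Opens a fresh stream over the buffered bytes and decodes it as UTF-8.
    static String readAsText(byte[] content) throws IOException {
        try (InputStream in = new ByteArrayInputStream(content)) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for entity.getContent(); the HTML is sample data.
        InputStream response = new ByteArrayInputStream(
                "<html><title>Demo</title></html>".getBytes(StandardCharsets.UTF_8));

        byte[] content = buffer(response);

        // Two independent passes over the same bytes give the same text.
        String firstPass = readAsText(content);
        String secondPass = readAsText(content);
        System.out.println(firstPass.equals(secondPass)); // prints true
    }
}
```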