Java: How to index a PDF's content with SolrJ?

Tags: java, solr, solr-cell

I'm trying to index some PDF documents using SolrJ, following an existing example. Here is the code:

import static org.apache.solr.handler.extraction.ExtractingParams.LITERALS_PREFIX;
import static org.apache.solr.handler.extraction.ExtractingParams.MAP_PREFIX;
import static org.apache.solr.handler.extraction.ExtractingParams.UNKNOWN_FIELD_PREFIX;

import java.io.File;
import java.io.IOException;
import java.util.Map.Entry;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;
import org.apache.solr.common.util.NamedList;
...
public static void indexFilesSolrCell(String fileName) throws IOException, SolrServerException {

  String urlString = "http://localhost:8080/solr"; 
  SolrServer server = new CommonsHttpSolrServer(urlString);

  ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
  up.addFile(new File(fileName));
  String id = fileName.substring(fileName.lastIndexOf('/')+1);
  System.out.println(id);

  up.setParam(LITERALS_PREFIX + "id", id);
  up.setParam(LITERALS_PREFIX + "location", fileName); // this field doesn't exist in schema.xml; it will be created as attr_location
  up.setParam(UNKNOWN_FIELD_PREFIX, "attr_");
  up.setParam(MAP_PREFIX + "content", "attr_content");
  up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);

  NamedList<Object> request = server.request(up);
  for(Entry<String, Object> entry : request){
    System.out.println(entry.getKey());
    System.out.println(entry.getValue());
  }
}
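To make it easier to see what the request above actually sends, here is a minimal sketch that builds the same parameters with their literal names, using only the standard library. It assumes the `ExtractingParams` constants resolve to `literal.`, `fmap.` and `uprefix`; the class `ExtractParamsSketch` and its methods are hypothetical names used only for illustration, not part of SolrJ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: mirrors the parameters set in indexFilesSolrCell, with literal names. */
final class ExtractParamsSketch {

    // Assumed string values of the ExtractingParams constants used in the question.
    static final String LITERALS_PREFIX = "literal.";
    static final String MAP_PREFIX = "fmap.";
    static final String UNKNOWN_FIELD_PREFIX = "uprefix";

    /** Derives the document id as the question does: the text after the last '/'. */
    static String idFromPath(String fileName) {
        return fileName.substring(fileName.lastIndexOf('/') + 1);
    }

    /** Builds the extraction parameters in the order the question sets them. */
    static Map<String, String> buildParams(String fileName) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put(LITERALS_PREFIX + "id", idFromPath(fileName));
        params.put(LITERALS_PREFIX + "location", fileName);
        params.put(UNKNOWN_FIELD_PREFIX, "attr_");           // prefix for fields missing from the schema
        params.put(MAP_PREFIX + "content", "attr_content");  // map extracted "content" to "attr_content"
        return params;
    }

    public static void main(String[] args) {
        buildParams("/home/alex/Documents/lsp.pdf")
            .forEach((name, value) -> System.out.println(name + "=" + value));
    }
}
```

Printing the map this way can help when comparing the request against the handler configuration in solrconfig.xml and the field definitions in schema.xml.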
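Unfortunately, when I query *:*, I get the list of indexed documents, but the content field is empty. How can I change the code above so that the document's content is extracted as well?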

The XML returned for the indexed document is shown at the end of this post.
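I don't think this problem is caused by a faulty Apache Tika installation: I used to get some ServerExceptions, but I have since installed the required JARs in the correct path. I also tried indexing a txt file with the same class, and the attr_content field is still always empty.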
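When checking a result like this, it can help to request only the fields of interest. The following is a minimal sketch, standard library only, that builds such a query URL. It assumes the base URL from the question and Solr's `/select` handler with the `q` and `fl` parameters; the class `SelectUrlSketch` and its method are hypothetical names for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical helper: builds a select URL that returns only the listed fields. */
final class SelectUrlSketch {

    /** URL-encodes the query and the field list, then appends them to baseUrl + "/select". */
    static String selectUrl(String baseUrl, String query, String fields) {
        try {
            return baseUrl + "/select?q=" + URLEncoder.encode(query, "UTF-8")
                    + "&fl=" + URLEncoder.encode(fields, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(selectUrl("http://localhost:8080/solr", "*:*", "id,attr_content"));
    }
}
```

Opening the resulting URL in a browser shows, for each document, whether attr_content comes back with any text.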

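In your schema.xml file, did you set "stored=true" on the content field? Here is a sample of my schema.xml file, which I use to store the content of PDFs and other binary files.

Does this help?

Hector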

<doc>
  <arr name="attr_content">
    <str>            </str>
  </arr>
  <arr name="attr_location">
    <str>/home/alex/Documents/lsp.pdf</str>
  </arr>
  <arr name="attr_meta">
    <str>stream_size</str>
    <str>31203</str>
    <str>Content-Type</str>
    <str>application/pdf</str>
  </arr>
  <arr name="attr_stream_size">
    <str>31203</str>
  </arr>
  <arr name="content_type">
    <str>application/pdf</str>
  </arr>
  <str name="id">lsp.pdf</str>
</doc>