Apache Nutch 2.3.1 plugin not working


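I need to extract some metadata from crawled data that Apache Nutch 2.3.1 does not provide by default, so I have to write a plugin. For learning purposes I used an existing tutorial as a starting point. I know that tutorial targets the 1.x line. I changed all the required classes and the build succeeded. These are the steps I followed:

  • Create a directory such as $NUTCH_HOME/src/plugin/myPlugin
  • Copy index-metadata into my plugin and create a file myField.java: cp -r index-metadata/* myPlugin/
  • The directory listing should be
  • plugin/myPlugin/plugin.xml should look like this
  • I also added the field pageLength to the Solr schema. I expected a new field pageLength with an appropriate value, but there is no such field in Solr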

    Where is the problem? This is a simple toy example. Here is the Nutch log file (hadoop.log) output for the indexing step:

    2016-07-26 16:53:25,649 INFO  solr.SolrMappingReader - source: content dest: content
    2016-07-26 16:53:25,649 INFO  solr.SolrMappingReader - source: title dest: title
    2016-07-26 16:53:25,649 INFO  solr.SolrMappingReader - source: host dest: host
    2016-07-26 16:53:25,649 INFO  solr.SolrMappingReader - source: batchId dest: batchId
    2016-07-26 16:53:25,649 INFO  solr.SolrMappingReader - source: boost dest: boost
    2016-07-26 16:53:25,649 INFO  solr.SolrMappingReader - source: digest dest: digest
    2016-07-26 16:53:25,649 INFO  solr.SolrMappingReader - source: tstamp dest: tstamp
    2016-07-26 16:53:25,649 INFO  solr.SolrMappingReader - source: pageLength dest: pageLength
    2016-07-26 16:53:26,140 INFO  solr.SolrIndexWriter - Total 1 document is added.
    2016-07-26 16:53:26,140 INFO  indexer.IndexingJob - IndexingJob: done.
    
    How can I confirm that the plugin is loaded by Nutch?
    Second, is there a way to test a Nutch plugin before I configure Nutch to crawl with it?

    Try changing the extension id in plugin.xml. Change it to "org.apache.nutch.indexer.AddField" and rebuild Nutch:

    <extension id="org.apache.nutch.indexer.AddField"
           name="Add Field to Index"
           point="org.apache.nutch.indexer.IndexingFilter">
         <implementation id="myPlugin"
             class="org.apache.nutch.indexer.AddField"/>
    </extension>
    
    
    
    I think this should solve the problem.

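    Also, just to verify whether control reaches the plugin class at all, add some info-level log statements to the code, such as:

    LOG.info("Printing from plugin");
    If you can see these messages in hadoop.log, it means control is reaching the plugin class.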
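To make the logging suggestion concrete, here is a minimal, self-contained sketch. It is not the asker's AddField class and does not use the Nutch API: `AddFieldSketch`, its `filter` signature, and the use of `java.util.logging` are stand-ins so the example runs without Nutch on the classpath. It also assumes that pageLength is simply the character count of the page content, which the question does not state.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.logging.Logger;

// Hypothetical stand-in for the plugin class. A real plugin would implement
// Nutch's IndexingFilter and use the project's own logger instead.
final class AddFieldSketch {
    private static final Logger LOG = Logger.getLogger(AddFieldSketch.class.getName());

    // Adds a pageLength entry to the document and logs on entry and exit,
    // so the log shows whether this method was ever called.
    static Map<String, Object> filter(Map<String, Object> doc, String url, String content) {
        LOG.info("AddFieldSketch.filter called for " + url);
        // Assumption: pageLength is the number of characters in the content.
        long pageLength = (content == null) ? 0L : content.length();
        doc.put("pageLength", pageLength);
        LOG.info("AddFieldSketch added pageLength=" + pageLength);
        return doc;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = filter(new HashMap<>(), "http://example.com/", "hello");
        System.out.println(doc.get("pageLength"));
    }
}
```

If neither log line shows up during the indexing step, the filter was never invoked, which points at plugin registration rather than at the field logic.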

    <?xml version="1.0" encoding="UTF-8"?>
    <project name="myPlugin" default="jar">
      <import file="../build-plugin.xml"/>
    </project>
    
    <property>
      <name>plugin.includes</name>
      <value>plugin-1|plugin-2|myPlugin</value>
      <description>Added myPlugin</description>
    </property>
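
A note on the plugin.includes value above: as far as I know, Nutch treats it as a regular expression that must match each plugin's id in full, so spelling and case have to agree exactly with the id declared in plugin.xml. A minimal sketch of that check follows; `PluginIncludesCheck` and `isIncluded` are hypothetical names for illustration, not Nutch APIs.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Nutch: checks whether a plugin id would be
// accepted by a plugin.includes value when that value is used as a regex
// that has to match the whole id.
final class PluginIncludesCheck {
    static boolean isIncluded(String pluginIncludes, String pluginId) {
        return Pattern.compile(pluginIncludes).matcher(pluginId).matches();
    }

    public static void main(String[] args) {
        String includes = "plugin-1|plugin-2|myPlugin";
        System.out.println(isIncluded(includes, "myPlugin")); // prints true
        System.out.println(isIncluded(includes, "myplgin"));  // prints false
    }
}
```

A misspelled directory or id such as myplgin would therefore be skipped without any error at indexing time.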
    
    <field name="pageLength" type="long" stored="true" indexed="true"/>
    <field dest="pageLength" source="pageLength"/>
    
    Active IndexWriters :
    SOLRIndexWriter
        solr.server.url : URL of the SOLR instance (mandatory)
        solr.commit.size : buffer size when sending to SOLR (default 1000)
        solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)
        solr.auth : use authentication (default false)
        solr.auth.username : username for authentication
        solr.auth.password : password for authentication    
    IndexingJob: done.
    