Java: Apache Nutch 2.3.1 plugin is not working
I need to extract some metadata from crawled data that Apache Nutch 2.3.1 does not provide by default, so I have to write a plugin. For learning purposes I used this tutorial as a starting point. I know the tutorial targets the 1.x line. I changed all the required classes and the build succeeded. These are the steps I followed:
1. Create a directory such as $NUTCH_HOME/src/plugin/myPlugin
2. Copy index-metadata into my plugin and create a file myField.java: cp -r index-metadata/* myPlugin/
3. The directory listing should be plugin/myplgin/plugin.x
2016-07-26 16:53:25,649 INFO solr.SolrMappingReader - source: content dest: content
2016-07-26 16:53:25,649 INFO solr.SolrMappingReader - source: title dest: title
2016-07-26 16:53:25,649 INFO solr.SolrMappingReader - source: host dest: host
2016-07-26 16:53:25,649 INFO solr.SolrMappingReader - source: batchId dest: batchId
2016-07-26 16:53:25,649 INFO solr.SolrMappingReader - source: boost dest: boost
2016-07-26 16:53:25,649 INFO solr.SolrMappingReader - source: digest dest: digest
2016-07-26 16:53:25,649 INFO solr.SolrMappingReader - source: tstamp dest: tstamp
2016-07-26 16:53:25,649 INFO solr.SolrMappingReader - source: pageLength dest: pageLength
2016-07-26 16:53:26,140 INFO solr.SolrIndexWriter - Total 1 document is added.
2016-07-26 16:53:26,140 INFO indexer.IndexingJob - IndexingJob: done.
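How can I confirm that the plugin is loaded by Nutch?
Second, is there a way to test the Nutch plugin before I configure Nutch to crawl with it?

Try changing the extension id in plugin.xml. Change it to "org.apache.nutch.indexer.AddField" and rebuild Nutch: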
<extension id="org.apache.nutch.indexer.AddField"
name="Add Field to Index"
point="org.apache.nutch.indexer.IndexingFilter">
<implementation id="myPlugin"
class="org.apache.nutch.indexer.AddField"/>
</extension>
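I think this should solve the problem.
Also, to verify whether control reaches the plugin class, add some info-level logging to the code, for example
LOG.info("Printing from plugin");
If you can see these messages in hadoop.log, control is reaching the plugin class.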
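Checking hadoop.log for the plugin's message can be scripted. The following is only a minimal sketch: the class `PluginLogCheck`, the default path `logs/hadoop.log`, and the default marker text are assumptions made for illustration, so pass your own log path and whatever message your plugin actually logs.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Nutch: looks for a marker message in a log file.
final class PluginLogCheck {

    // Returns every line that contains the marker text.
    static List<String> linesContaining(List<String> lines, String marker) {
        return lines.stream()
                .filter(line -> line.contains(marker))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        // Assumed defaults; override with: java PluginLogCheck.java <logfile> <marker>
        Path log = Paths.get(args.length > 0 ? args[0] : "logs/hadoop.log");
        String marker = args.length > 1 ? args[1] : "Printing from plugin";

        List<String> hits = linesContaining(Files.readAllLines(log), marker);
        if (hits.isEmpty()) {
            System.out.println("Marker not found: the plugin code was probably not executed.");
        } else {
            hits.forEach(System.out::println);
        }
    }
}
```

An empty result after an indexing run suggests the filter was never invoked, which points back at plugin.xml or plugin.includes rather than at the filter logic itself.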
<?xml version="1.0" encoding="UTF-8"?>
<project name="myPlugin" default="jar">
<import file="../build-plugin.xml"/>
</project>
<property>
<name>plugin.includes</name>
<value>plugin-1|plugin-2|myPlugin</value>
<description>Added myPlugin</description>
</property>
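One thing that can be checked without running a crawl is whether the plugin id is actually selected by this plugin.includes value. The sketch below assumes that Nutch treats plugin.includes as a regular expression that the whole plugin id must match; the class `PluginIncludesCheck` is a hypothetical helper, not part of Nutch.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Nutch.
final class PluginIncludesCheck {

    // True when the whole plugin id matches the plugin.includes expression.
    static boolean isIncluded(String pluginIncludes, String pluginId) {
        return Pattern.compile(pluginIncludes).matcher(pluginId).matches();
    }

    public static void main(String[] args) {
        // Value copied from the plugin.includes property.
        String includes = "plugin-1|plugin-2|myPlugin";

        System.out.println(isIncluded(includes, "myPlugin")); // prints true
        System.out.println(isIncluded(includes, "myplugin")); // prints false: matching is case-sensitive
    }
}
```

Under that assumption, the id declared in plugin.xml has to match the expression exactly, including letter case, so a small spelling difference between the two is enough for the plugin to be skipped.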
<field name="pageLength" type="long" stored="true" indexed="true"/>
<field dest="pageLength" source="pageLength"/>
Active IndexWriters :
SOLRIndexWriter
solr.server.url : URL of the SOLR instance (mandatory)
solr.commit.size : buffer size when sending to SOLR (default 1000)
solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)
solr.auth : use authentication (default false)
solr.auth.username : username for authentication
solr.auth.password : password for authentication
IndexingJob: done.