Apache Nutch bin/crawl script fails - manual steps work fine

I'm trying to run the "bin/crawl" script that ships with Nutch 1.6. It performs all of the manual steps below to seed and crawl a site.

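When I run these steps manually, everything works and my page is indexed as expected (although only one page gets indexed, which I will look into separately).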

I created a text file containing the URL at seeds/URL.txt and then ran:

bin/nutch inject crawl_test/crawldb seeds/

bin/nutch generate crawl_test/crawldb crawl_test/segments

export SEGMENT=crawl_test/segments/`ls -tr crawl_test/segments|tail -1`

bin/nutch fetch $SEGMENT -noParsing

bin/nutch parse $SEGMENT

bin/nutch updatedb crawl_test/crawldb $SEGMENT -filter -normalize

bin/nutch invertlinks crawl_test/linkdb -dir crawl_test/segments

bin/nutch solrindex http://dev:8080/solr/ crawl_test/crawldb -linkdb crawl_test/linkdb crawl_test/segments/*
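For comparison, the manual steps above can be sketched as a small shell script. This is a minimal sketch, not the bin/crawl script itself: the variable names CRAWL_DIR, SEED_DIR, SOLR_URL and NUTCH and the functions latest_segment and run_cycle are hypothetical, and it runs a single inject/generate/fetch/parse/update round. The detail it illustrates is that the segment variable always holds the full path (crawl_test/segments/<timestamp>), not just the timestamp.

```shell
#!/usr/bin/env bash
# Minimal sketch of the manual crawl cycle (one round); all names are hypothetical.
# NUTCH can be overridden, e.g. NUTCH="echo bin/nutch", to print the commands
# instead of running them. It is deliberately left unquoted below so that it
# may hold more than one word.
set -eu

NUTCH=${NUTCH:-bin/nutch}
CRAWL_DIR=${CRAWL_DIR:-crawl_test}
SEED_DIR=${SEED_DIR:-seeds/}
SOLR_URL=${SOLR_URL:-http://dev:8080/solr/}

# Print the FULL path of the most recently modified entry in a segments
# directory, mirroring the `ls -tr ... | tail -1` step of the manual commands.
latest_segment() {
  local newest
  newest=$(ls -tr "$1" | tail -1)
  if [ -z "$newest" ]; then
    echo "no segments found in $1" >&2
    return 1
  fi
  printf '%s/%s\n' "$1" "$newest"
}

# One round of the manual steps, in the same order and with the same flags.
run_cycle() {
  local segment
  $NUTCH inject "$CRAWL_DIR/crawldb" "$SEED_DIR"
  $NUTCH generate "$CRAWL_DIR/crawldb" "$CRAWL_DIR/segments"
  segment=$(latest_segment "$CRAWL_DIR/segments")
  $NUTCH fetch "$segment" -noParsing
  $NUTCH parse "$segment"
  $NUTCH updatedb "$CRAWL_DIR/crawldb" "$segment" -filter -normalize
  $NUTCH invertlinks "$CRAWL_DIR/linkdb" -dir "$CRAWL_DIR/segments"
  # Like the manual steps, index every segment (segments/*).
  $NUTCH solrindex "$SOLR_URL" "$CRAWL_DIR/crawldb" -linkdb "$CRAWL_DIR/linkdb" "$CRAWL_DIR"/segments/*
}
```

Source the file and call run_cycle. With NUTCH="echo bin/nutch" it only prints the commands; in that dry-run mode generate creates no segment, so latest_segment fails unless a segment directory already exists.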
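
The bin/crawl script fails with this error:

Indexing 20130412115759 on SOLR index -> someurl:8080/solr/
SolrIndexer: starting at 2013-04-12 11:58:47
SolrIndexer: deleting gone documents: false
SolrIndexer: URL filtering: false
SolrIndexer: URL normalizing: false
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/opt/nutch/20130412115759/crawl_fetch
Input path does not exist: file:/opt/nutch/20130412115759/crawl_parse
Input path does not exist: file:/opt/nutch/20130412115759/parse_data
Input path does not exist: file:/opt/nutch/20130412115759/parse_text

Any idea why this script isn't working? I think it must be a bug in the script itself rather than in my configuration, because the path it is looking in does not exist, and I'm not sure why it would look there.

It looks like there is a bug in the script. It passes only the segment name (20130412115759) to solrindex instead of the full segment path, so the input is resolved relative to /opt/nutch (presumably the directory the script was run from) rather than the crawl's segments directory. The fix is to prefix the segment with $CRAWL_PATH/segments/:
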
-  $bin/nutch solrindex $SOLRURL $CRAWL_PATH/crawldb -linkdb $CRAWL_PATH/linkdb $SEGMENT
+  $bin/nutch solrindex $SOLRURL $CRAWL_PATH/crawldb -linkdb $CRAWL_PATH/linkdb $CRAWL_PATH/segments/$SEGMENT
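To catch this kind of path mistake before the indexer runs, a small guard can check that the segment directory the patched line points to contains the subdirectories the SolrIndexer reported as missing. This is a hypothetical helper (check_segment is not part of Nutch or bin/crawl); it assumes CRAWL_PATH and SEGMENT mean the same as in the patched line, i.e. SEGMENT is only the segment name, not a path.

```shell
#!/usr/bin/env bash
# Hypothetical guard, not part of bin/crawl: verify a segment before indexing.
# $1 = crawl directory (like $CRAWL_PATH), $2 = segment name (like $SEGMENT).
# On success prints the full segment path; otherwise lists the missing
# subdirectories on stderr and returns 1.
check_segment() {
  local seg_path="$1/segments/$2"
  local missing=0 sub
  # The four inputs the SolrIndexer log reported as missing.
  for sub in crawl_fetch crawl_parse parse_data parse_text; do
    if [ ! -d "$seg_path/$sub" ]; then
      echo "missing: $seg_path/$sub" >&2
      missing=1
    fi
  done
  if [ "$missing" -ne 0 ]; then
    return 1
  fi
  printf '%s\n' "$seg_path"
}

# Possible use inside the script (sketch):
#   if seg_path=$(check_segment "$CRAWL_PATH" "$SEGMENT"); then
#     $bin/nutch solrindex $SOLRURL $CRAWL_PATH/crawldb -linkdb $CRAWL_PATH/linkdb "$seg_path"
#   fi
```

Because the helper builds the path the same way as the corrected line, an unfetched or unparsed segment fails here with a readable message instead of a Hadoop InvalidInputException.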