Web Crawler: Nutch 1.13 crawl script not working

web-crawler

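I have installed and configured Nutch 1.10 and used the crawl script, but I am now trying to upgrade to Nutch 1.13. I am having trouble getting the crawl script to work with Nutch v1.13.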

This normally works with v1.10:

bin/crawl -i -D elastic.server.url=http://localhost:9300/search-index/ urls/ searchcrawl/  2

However, when I try to run it with v1.13, I get:

Usage: crawl [-i|--index] [-D "key=value"] [-w|--wait] [-s <Seed Dir>] <Crawl Dir> <Num Rounds>
-i|--index  Indexes crawl results into a configured indexer
-D      A Java property to pass to Nutch calls
-w|--wait   NUMBER[SUFFIX] Time to wait before generating a new segment when no URLs
        are scheduled for fetching. Suffix can be: s for second,
        m for minute, h for hour and d for day. If no suffix is
        specified second is used by default.
-s Seed Dir Path to seeds file(s)
Crawl Dir   Directory where the crawl/link/segments dirs are saved
Num Rounds  The number of rounds to run this crawl for
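The -w option is the only one whose value has its own format. As a rough illustration of the rule in the usage text (suffix s, m, h or d, defaulting to seconds), here is a small bash sketch. wait_to_seconds is a hypothetical helper written for this example, not code taken from bin/crawl, and it assumes NUMBER is a whole number.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Nutch's bin/crawl.
# Converts a -w value of the form NUMBER[SUFFIX] into seconds using the rules
# printed in the usage text: s = seconds, m = minutes, h = hours, d = days,
# and no suffix means seconds. Assumes NUMBER is a whole number.
wait_to_seconds() {
  local value=$1
  if [[ ! $value =~ ^([0-9]+)([smhd]?)$ ]]; then
    echo "invalid wait value: $value" >&2
    return 1
  fi
  # Force base 10 so a value like "08" is not parsed as octal.
  local number=$((10#${BASH_REMATCH[1]}))
  case ${BASH_REMATCH[2]} in
    "" | s) echo "$number" ;;
    m) echo $((number * 60)) ;;
    h) echo $((number * 3600)) ;;
    d) echo $((number * 86400)) ;;
  esac
}

# Example: a 5-minute wait
wait_to_seconds 5m  # prints 300
```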

I don't see anything different in the file... Am I missing something? How do I get the crawl script to work with v1.13?

Found it after some better searching.

As of 1.13, the bin/crawl script expects the path to the seeds to be preceded by -s.

This works: bin/crawl -i -D elastic.server.url=http://localhost:9300/search-index/ -s urls/ searchcrawl/ 2
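If older scripts still call bin/crawl with the 1.10 argument order, a small wrapper can add the -s for you. This is only a sketch: crawl_v113 and the CRAWL_BIN override are hypothetical names, and it assumes the last three arguments are always the seed directory, the crawl directory and the number of rounds, in that order.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper, not part of Nutch.
# Takes a 1.10-style argument list:
#   [options] <Seed Dir> <Crawl Dir> <Num Rounds>
# and calls bin/crawl with the 1.13-style list, where the seed dir follows -s:
#   [options] -s <Seed Dir> <Crawl Dir> <Num Rounds>
# Assumes the last three arguments are always seed dir, crawl dir, rounds.
crawl_v113() {
  local crawl_bin=${CRAWL_BIN:-bin/crawl}  # override for testing, e.g. CRAWL_BIN=echo
  if [ "$#" -lt 3 ]; then
    echo "usage: crawl_v113 [options] <Seed Dir> <Crawl Dir> <Num Rounds>" >&2
    return 1
  fi
  local args=("$@")
  local n=${#args[@]}
  local opts=("${args[@]:0:n-3}")  # everything before the three positional args
  local seed_dir=${args[n-3]}
  local crawl_dir=${args[n-2]}
  local rounds=${args[n-1]}
  "$crawl_bin" "${opts[@]}" -s "$seed_dir" "$crawl_dir" "$rounds"
}

# Same arguments as the v1.10 command in the question:
# crawl_v113 -i -D elastic.server.url=http://localhost:9300/search-index/ urls/ searchcrawl/ 2
```

Setting CRAWL_BIN=echo prints the rewritten command instead of running it, which is a quick way to check the argument order before starting a real crawl.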

-Others

Thanks, that helped me. The documentation doesn't mention this.