Java: If I update the URL filter text, which Nutch command do I need to call from the command line?


Nutch Master

If I change a file such as robots.txt or regex-urlfilter.txt, or any similar resource, which command do I need to call?

The Nutch usage instructions don't make this clear to me. I'm guessing it's the parser's job, but I'm not sure.

Karthik

According to the usage text:

# echo " crawl one-step crawler for intranets"
  echo " inject     inject new urls into the database"
  echo " hostinject     creates or updates an existing host table from a text file"
  echo " generate   generate new batches to fetch from crawl db"
  echo " fetch      fetch URLs marked during generate"
  echo " parse      parse URLs marked during fetch"
  echo " updatedb   update web table after parsing"
  echo " updatehostdb   update host table after parsing"
  echo " readdb     read/dump records from page database"
  echo " readhostdb     display entries from the hostDB"
  echo " elasticindex   run the elasticsearch indexer"
  echo " solrindex  run the solr indexer on parsed batches"
  echo " solrdedup  remove duplicates from solr"
  echo " parsechecker   check the parser for a given url"
  echo " indexchecker   check the indexing filters for a given url"
  echo " plugin     load a plugin and run one of its classes main()"
  echo " nutchserver    run a (local) Nutch server on a user defined port"
  echo " junit          runs the given JUnit test"
  echo " or"
  echo " CLASSNAME  run the class named CLASSNAME"
  echo "Most commands print help when invoked w/o parameters."

If you change the regex-urlfilter.txt file, you need to update the Nutch job file. You can do this as follows:

  jar -uvf /usr/local/nutch-1.2/nutch-1.2.job
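Note that `jar -u` also expects the file(s) to add after the archive name. Below is a minimal sketch, assuming a Nutch 1.x layout where the edited file lives in `/usr/local/nutch-1.2/conf` and conf files are stored at the root of the `.job` archive. The helper name `update_job_file` is hypothetical. It uses `jar`'s `-C` option so the entry is stored as `regex-urlfilter.txt` rather than `conf/regex-urlfilter.txt`.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): refresh one conf file inside a Nutch .job archive.
# Assumption: conf files sit at the root of the .job archive (Nutch 1.x layout).
set -euo pipefail

# update_job_file JOB CONF_DIR FILE
#   JOB       path to the .job archive (a zip/jar file)
#   CONF_DIR  directory containing the edited file
#   FILE      file name; stored at the archive root
update_job_file() {
  local job="$1" conf_dir="$2" file="$3"
  if [[ ! -f "$job" ]]; then
    echo "job file not found: $job" >&2
    return 1
  fi
  if [[ ! -f "$conf_dir/$file" ]]; then
    echo "conf file not found: $conf_dir/$file" >&2
    return 1
  fi
  # -u updates the existing archive, -v reports what was added, -f names the archive.
  # -C switches to CONF_DIR for the following file, so the entry is "$file",
  # not "conf/$file".
  jar -uvf "$job" -C "$conf_dir" "$file"
}
```

Usage, with the paths above (adjust to your installation): `update_job_file /usr/local/nutch-1.2/nutch-1.2.job /usr/local/nutch-1.2/conf regex-urlfilter.txt`. This typically matters only when Nutch runs from the `.job` file (for example on a Hadoop cluster); in local mode the conf directory is usually read directly.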