Apache Nutch: Exception in thread "main" java.lang.ClassNotFoundException
I ran
hadoop jar /home/apache-nutch-2.3.1/runtime/deploy/apache-nutch-2.3.1.job org.apache.nutch.crawl.Crawl urls -dir crawl -depth 3 -topN 5
but got the following error:
Exception in thread "main" java.lang.ClassNotFoundException: org.apache.nutch.crawl.Crawl
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.hadoop.util.RunJar.run(RunJar.java:316)
at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
I created a urls/seed.text file in /home/apache-nutch-2.3.1/build/ containing the following URLs:
http://nutch.apache.org
http://apache.org
I edited conf/regex-urlfilter.txt as follows:
+^http://([a-z0-9]*\.)*apache.org/
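A quick way to sanity-check a filter line like this is to try it against a few URLs outside Nutch. The sketch below assumes the intended pattern is `^http://([a-z0-9]*\.)*apache.org/` (with the escaped dot) and that URLs are tested with a trailing slash, since the pattern requires one. `is_allowed` is a hypothetical helper, and `grep -E` only approximates the Java regex matching that Nutch uses.

```shell
#!/usr/bin/env bash
# Pattern from conf/regex-urlfilter.txt without the leading '+'
# (the '+' only marks the rule as "accept"; it is not part of the regex).
PATTERN='^http://([a-z0-9]*\.)*apache.org/'

# Hypothetical helper: returns 0 if the URL matches the accept pattern, 1 otherwise.
is_allowed() {
  printf '%s\n' "$1" | grep -Eq "$PATTERN"
}

# Two URLs under apache.org and one unrelated host (example.com is illustrative only).
for url in "http://nutch.apache.org/" "http://apache.org/" "http://example.com/"; do
  if is_allowed "$url"; then
    echo "accept $url"
  else
    echo "reject $url"
  fi
done
```

With the escaped dot in place, both apache.org URLs should be accepted and the unrelated host rejected.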
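The org.apache.nutch.crawl.Crawl class has been removed since version 1.8. Run the shell script bin/crawl instead. It launches a Hadoop job for each step of the crawl: inject, generate, fetch, parse, and so on.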
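As a rough sketch of that replacement, the script below builds a bin/crawl command line from the values in the question. It assumes the 2.x script takes positional arguments of the form `<seedDir> <crawlID> [<solrUrl>] <numberOfRounds>`; run bin/crawl without arguments to see the usage for your version. The crawl ID `crawl` and the `NUTCH_DEPLOY` variable are illustrative, and the old `-topN` option has no positional counterpart here.

```shell
#!/usr/bin/env bash
# Sketch only: argument order is an assumption, verify with the script's usage output.
NUTCH_DEPLOY="${NUTCH_DEPLOY:-/home/apache-nutch-2.3.1/runtime/deploy}"  # deploy path from the question
SEED_DIR="urls"    # directory holding the seed file
CRAWL_ID="crawl"   # hypothetical crawl identifier
ROUNDS=3           # plays the role of the old -depth 3

CMD=("$NUTCH_DEPLOY/bin/crawl" "$SEED_DIR" "$CRAWL_ID" "$ROUNDS")

if [ -x "${CMD[0]}" ]; then
  # Script is present: start the crawl rounds.
  "${CMD[@]}"
else
  # Script not found: just show what would be run.
  echo "would run: ${CMD[*]}"
fi
```

In deploy mode the seed directory normally has to be readable by the Hadoop jobs, so it may need to be copied into HDFS first.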