Web crawler: error when crawling with Nutch

I am trying to crawl a website with Nutch, but I get the following error:

java.net.MalformedURLException: no protocol:
    Exception in thread "main" java.io.IOException: Job failed!
            at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1265)
            at org.apache.nutch.crawl.Injector.inject(Injector.java:296)
            at org.apache.nutch.crawl.Crawl.run(Crawl.java:127)
            at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
            at org.apache.nutch.crawl.Crawl.main(Crawl.java:55)

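Check your seed list. This error is thrown while the Injector job is running, and the most likely cause is an entry in the seed list. Every seed URL has to include the protocol, so each one must start with "http://".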
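To find the offending entries before running the crawl, the seed lines can be checked with the same java.net.URL parsing that produces the "no protocol" message. The following is only a minimal sketch: the class name SeedListCheck, both helper methods, and the sample URLs are hypothetical, and skipping blank lines and "#" comment lines is an assumption about how the seed file is written.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Nutch: flags seed lines that lack a protocol.
public class SeedListCheck {

    // Returns true if java.net.URL accepts the line as an absolute URL.
    // A line such as "example.org/page" throws MalformedURLException ("no protocol").
    static boolean hasProtocol(String seed) {
        try {
            new URL(seed.trim());
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }

    // Collects the seed lines that fail to parse.
    // Assumption: blank lines and lines starting with '#' are not seeds.
    static List<String> findBadSeeds(List<String> lines) {
        List<String> bad = new ArrayList<>();
        for (String line : lines) {
            String s = line.trim();
            if (s.isEmpty() || s.startsWith("#")) {
                continue;
            }
            if (!hasProtocol(s)) {
                bad.add(s);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // Sample seed lines for illustration only.
        List<String> seeds = Arrays.asList("http://example.com/", "example.org/page");
        System.out.println("Seeds without a protocol: " + findBadSeeds(seeds));
    }
}
```

In practice the lines would be read from the seed file (for example with java.nio.file.Files.readAllLines) rather than hard-coded, and every reported entry fixed by prefixing it with "http://".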

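Thanks for your answer, that worked, but now I get the following error: Exception in thread "main" java.io.IOException: Job failed! at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1265) at org.apache.nutch.crawl.Injector.inject(Injector.java:296) at org.apache.nutch.crawl.Crawl.run(Crawl.java:127) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) at org.apa... Where is the problem?!

Which storage are you using (HBase, Cassandra, or MySQL)? Check your configuration (for example hbase-site.xml…).

Check the Hadoop logs. They show more details about the problem.

I am using MySQL. You were right: I looked at the logs, and the error was caused by a directory that did not exist. Creating it solved the problem. Thanks.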