
Installing Apache Nutch on Windows

Tags: java, hadoop, solr, nutch

I am trying to integrate Apache Solr and Apache Nutch 1.14 on Windows 7 (64-bit), but I get an error when trying to run Nutch.

What I have already done:

  • Set the JAVA_HOME env variable to C:\Program Files\Java\jdk1.8.0_25 or C:\Progra~1\Java\jdk1.8.0_25
  • Downloaded the Hadoop WinUtils files, put them in c:\WinUtils\bin, set the HADOOP_HOME env variable to c:\winutil, and added the c:\winutil\bin folder to PATH
(I also tried Hadoop WinUtils 2.7.1, without success; a rough sketch of this setup follows below.)
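
To make this concrete, here is roughly how the variables end up being set when I launch Nutch from the Cygwin shell (the paths are the ones listed above; this is a sketch of my environment rather than a recommended layout):

  # Sketch of the environment, as seen from the Cygwin shell that runs bin/crawl
  export JAVA_HOME="C:\Progra~1\Java\jdk1.8.0_25"    # 8.3 short path, avoids the space in "Program Files"
  export HADOOP_HOME="C:\winutil"                    # Hadoop looks for winutils.exe under %HADOOP_HOME%\bin
  export PATH="$PATH:/cygdrive/c/winutil/bin"        # make winutils.exe visible to this shell as well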

The error I get is:

$ bin/crawl -i -D http://localhost:8983/solr/ -s urls/ TestCrawl 2
  Injecting seed URLs
  /home/apache-nutch-1.14/bin/nutch inject TestCrawl/crawldb urls/
  Injector: starting at 2018-06-20 07:14:47
  Injector: crawlDb: TestCrawl/crawldb
  Injector: urlDir: urls
  Injector: Converting injected urls to crawl db entries.
  Exception in thread "main" java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
    at org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Native Method)
    at org.apache.hadoop.io.nativeio.NativeIO$Windows.access(NativeIO.java:609)
    at org.apache.hadoop.fs.FileUtil.canRead(FileUtil.java:977)
    at org.apache.hadoop.util.DiskChecker.checkAccessByFileMethods(DiskChecker.java:187)
    at org.apache.hadoop.util.DiskChecker.checkDirAccess(DiskChecker.java:174)
    at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:108)
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.confChanged(LocalDirAllocator.java:285)
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:344)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:150)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:131)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:115)
    at org.apache.hadoop.mapred.LocalDistributedCacheManager.setup(LocalDistributedCacheManager.java:125)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.<init>(LocalJobRunner.java:163)
    at org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:731)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:240)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
    at org.apache.nutch.crawl.Injector.inject(Injector.java:417)
    at org.apache.nutch.crawl.Injector.run(Injector.java:563)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.nutch.crawl.Injector.main(Injector.java:528)
  Error running:
    /home/apache-nutch-1.14/bin/nutch inject TestCrawl/crawldb urls/
  Failed with exit value 1.
If I do not set the HADOOP_HOME variable, I get the following exception instead:

Injector: java.io.IOException: (null) entry in command string: null chmod 0644 C:\cygwin64\home\apache-nutch-1.14\TestCrawl\crawldb\.locked
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:773)
    at org.apache.hadoop.util.Shell.execCommand(Shell.java:869)
    at org.apache.hadoop.util.Shell.execCommand(Shell.java:852)
    at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:733)
    at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:225)
    at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:209)
    at org.apache.hadoop.fs.RawLocalFileSystem.createOutputStreamWithMode(RawLocalFileSystem.java:307)
    at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:296)
    at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:328)
    at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.<init>(ChecksumFileSystem.java:398)
    at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:461)
    at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:440)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:911)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:892)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:854)
    at org.apache.hadoop.fs.FileSystem.createNewFile(FileSystem.java:1154)
    at org.apache.nutch.util.LockUtil.createLockFile(LockUtil.java:59)
    at org.apache.nutch.util.LockUtil.createLockFile(LockUtil.java:81)
    at org.apache.nutch.crawl.CrawlDb.lock(CrawlDb.java:178)
    at org.apache.nutch.crawl.Injector.inject(Injector.java:398)
    at org.apache.nutch.crawl.Injector.run(Injector.java:563)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.nutch.crawl.Injector.main(Injector.java:528)

  Error running:
    /home/apache-nutch-1.14/bin/nutch inject TestCrawl//crawldb urls/
  Failed with exit value 127.

I would really appreciate any help with this.

When executing the crawl, just run the command below:

bin/crawl -s urls/ TestCrawl/ 2
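(In other words, run the crawl itself first, without -i or -D: here -s points at the seed URL directory, TestCrawl/ is the crawl directory, and 2 is the number of crawl rounds.)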
After that you can index with this command (passing the Solr URL via -D and your core name):

bin/nutch index -Dsolr.server.url=http://localhost:8983/solr/YOURCORE TestCrawl/crawldb/ -linkdb TestCrawl/linkdb/ TestCrawl/segments/* -filter -normalize -deleteGone
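(As far as I understand the indexer options, the crawldb, linkdb and segments are the ones the crawl script created under TestCrawl/; -filter and -normalize apply the configured URL filters and normalizers before indexing, and -deleteGone removes documents for pages that have disappeared.)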

The Solr URL can also be specified in conf/nutch-site.xml:

<property>
    <name>solr.server.url</name>
    <value>http://localhost:8983/solr/YOURCORE/</value>
    <description>Defines the Solr URL into which data should be indexed using the indexer-solr plugin.</description>
</property> 
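
A value set here acts as the default, while passing -Dsolr.server.url=... on the command line, as in the index command above, overrides it for that run. If you are unsure whether the core URL is correct, a quick sanity check from the shell could look like the line below (an assumption on my part: a standard Solr install where the core responds to the ping handler):

  curl "http://localhost:8983/solr/YOURCORE/admin/ping"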


Comments:

I would be surprised if Nutch supports Hadoop 3.x.

Also, I tried Hadoop WinUtils version 2.7.1, but without success.

Which Hadoop version are you actually running? It would include a Hadoop core jar file, so there is no need to download it yourself.

I followed the tutorial, and it does not say anything about installing Hadoop. Do you think I need it? If so, how do I install it properly on Windows? Thanks.

You said you have the HADOOP_HOME variable, which means you downloaded the Hadoop binaries, not just winutils.

I get the same error when trying to run the first command: bin/crawl -s urls/ TestCrawl/ 2

I cannot comment on your first post, but you have to delete the .locked file.

But I can't: it is generated automatically when the first command is run.
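
For reference, deleting the lock file mentioned in the last comments would look roughly like this from the Cygwin shell (the path is taken from the exception above; only do it when no crawl is still running, since the file is there to guard the crawldb):

  rm /home/apache-nutch-1.14/TestCrawl/crawldb/.locked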