Can't get Apache Nutch to crawl - suspecting permissions and JAVA_HOME

Tags: java, solr, ubuntu-12.04, nutch

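I am trying to run a basic crawl by following these steps:

So I installed Nutch with Solr. I set $JAVA_HOME in .bashrc to /usr/lib/jvm/java-1.6.0-openjdk-amd64.

When I run bin/nutch from the Nutch home directory I don't see any problems, but when I try to run the crawl as described above, I get the following error: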

log4j:ERROR setFile(null,true) call failed.
java.io.FileNotFoundException: /usr/share/nutch/logs/hadoop.log (Permission denied)
        at java.io.FileOutputStream.openAppend(Native Method)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:207)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:131)
        at org.apache.log4j.FileAppender.setFile(FileAppender.java:290)
        at org.apache.log4j.FileAppender.activateOptions(FileAppender.java:164)
        at org.apache.log4j.DailyRollingFileAppender.activateOptions(DailyRollingFileAppender.java:216)
        at org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:257)
        at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:133)
        at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:97)
        at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:689)
        at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:647)
        at org.apache.log4j.PropertyConfigurator.configureRootCategory(PropertyConfigurator.java:544)
        at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:440)
        at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:476)
        at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:471)
        at org.apache.log4j.LogManager.<clinit>(LogManager.java:125)
        at org.slf4j.impl.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:73)
        at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:270)
        at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:281)
        at org.apache.nutch.crawl.Crawl.<clinit>(Crawl.java:43)
log4j:ERROR Either File or DatePattern options are not set for appender [DRFA].
solrUrl is not set, indexing will be skipped...
crawl started in: crawl
rootUrlDir = urls
threads = 10
depth = 3
solrUrl=null
topN = 5
Injector: starting at 2013-06-28 16:24:53
Injector: crawlDb: crawl/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Injector: total number of urls rejected by filters: 0
Injector: total number of urls injected after normalization and filtering: 1
Injector: Merging injected urls into crawl db.
Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
        at org.apache.nutch.crawl.Injector.inject(Injector.java:296)
        at org.apache.nutch.crawl.Crawl.run(Crawl.java:132)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:55)

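So I feel like I've hit a catch-22. Should I be able to run this command with sudo, is there something else I need to do so that I don't have to run it with sudo, or is something else entirely going on here?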

It looks like, as a normal user, you don't have permission to write to /usr/share/nutch/logs/hadoop.log, which makes sense as a security feature.
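To confirm that diagnosis before changing anything, a quick writability check can help. The following is only a minimal sketch: check_log_writable is a hypothetical helper name, and the default path is simply the one taken from the error message above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nutch): report whether the current user
# can write to a log file. The default path comes from the error message.
check_log_writable() {
    log_file="${1:-/usr/share/nutch/logs/hadoop.log}"
    log_dir=$(dirname "$log_file")
    if [ -e "$log_file" ]; then
        # The file exists: test write permission on the file itself.
        if [ -w "$log_file" ]; then
            echo "writable"
        else
            echo "not writable"
        fi
    elif [ -d "$log_dir" ] && [ -w "$log_dir" ]; then
        # The file is missing but could be created in a writable directory.
        echo "writable"
    else
        echo "not writable"
    fi
}
```

Calling check_log_writable with no argument as your normal user should print "not writable" if the permission problem described here is the cause.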

To get around this, create a simple shell script:

#!/bin/sh
export JAVA_HOME=/usr/lib/jvm/java-1.6.0-openjdk-amd64
bin/nutch crawl urls -dir crawl -depth 3 -topN 5
Save it as nutch.sh, then run it with sudo:

sudo sh nutch.sh

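The key to solving this problem is adding the JAVA_HOME variable to the sudo environment. For example, type env and then sudo env, and you will see that JAVA_HOME is not set for sudo. To fix this, you need to add the path: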

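  • Run sudo visudo to edit your /etc/sudoers file. (Do not use a standard text editor. visudo opens a special vi text editor that validates the syntax before letting you save.)
  • Add this line:

    Defaults env_keep+="JAVA_HOME"

    at the end of the Defaults env_keep section.

  • Reboot.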
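The env versus sudo env comparison mentioned above can be scripted. This is a rough sketch only: report_java_home is a hypothetical helper that reads KEY=VALUE lines (the format env prints) on standard input, so the same check can be applied to either environment.

```shell
#!/bin/sh
# Hypothetical helper: read an environment dump on stdin and report
# whether JAVA_HOME appears in it. An empty value is treated as not set.
report_java_home() {
    value=$(sed -n 's/^JAVA_HOME=//p' | head -n 1)
    if [ -n "$value" ]; then
        echo "JAVA_HOME is set to $value"
    else
        echo "JAVA_HOME is not set"
    fi
}
```

Run env | report_java_home and then sudo env | report_java_home. Before the sudoers change, the second command would be expected to report that JAVA_HOME is not set. After the change, both should report the same path.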