Error when running the wordcount R sample code on Hadoop

R wordcount sample code:

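library(rmr2)

# Map: split each input line on whitespace and emit (word, 1) pairs.
map <- function(k, lines) {
    words.list <- strsplit(lines, '\\s')
    words <- unlist(words.list)
    return( keyval(words, 1) )
}

# Reduce: sum the counts emitted for each word.
reduce <- function(word, counts) {
    keyval(word, sum(counts))
}

wordcount <- function (input, output=NULL) {
    mapreduce(input=input, output=output, input.format = "text", map=map, reduce=reduce)
}

# Remove any previous output directory. Note that this path is absolute,
# while hdfs.out below is relative to the user's HDFS home directory.
system("/opt/hadoop/hadoop-2.5.1/bin/hadoop fs -rm -r /wordcount/out")

hdfs.root <- 'wordcount'
hdfs.data <- file.path(hdfs.root, 'data')
hdfs.out <- file.path(hdfs.root, 'out')

# Submit the job. This call is not shown in the original post and is assumed here.
out <- wordcount(hdfs.data, hdfs.out)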
The error occurs after the following is displayed:

INFO mapreduce.Job:  map 100% reduce 100%

The output folder is created in HDFS, but no results are generated. Any idea what is causing this?
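
For reference, the result the job is meant to produce can be sketched in base R without Hadoop. This is only an illustration of the map and reduce logic above, not the rmr2 code path; the sample `lines` vector is made up for the example.

```r
# Hypothetical input lines standing in for the text files under wordcount/data.
lines <- c("the quick brown fox", "the lazy dog", "the fox")

# "Map" step: split every line on whitespace, as the map function does.
words <- unlist(strsplit(lines, "\\s"))

# "Reduce" step: sum a count of 1 per occurrence, grouped by word.
counts <- tapply(rep(1, length(words)), words, sum)

print(counts)
```

If the logic behaves as expected here, the problem is more likely in the cluster environment than in the R functions. rmr2 also provides a local backend, `rmr.options(backend = "local")`, which runs the same `mapreduce()` call without Hadoop.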

Update 1: I found the error log that Hadoop provides for the specific job at localhost:8042:

Dec 11, 2014 3:26:38 PM com.google.inject.servlet.InternalServletModule$BackwardsCompatibleServletContextProvider get
WARNING: You are attempting to use a deprecated API (specifically, attempting to @Inject ServletContext inside an eagerly created singleton. While we allow this for backwards compatibility, be warned that this MAY have unexpected behavior if you have more than one injector (with ServletModule) running in the same JVM. Please consult the Guice documentation at http://code.google.com/p/google-guice/wiki/Servlets for more information.
Dec 11, 2014 3:26:40 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.mapreduce.v2.app.webapp.JAXBContextResolver as a provider class
Dec 11, 2014 3:26:40 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.yarn.webapp.GenericExceptionHandler as a provider class
Dec 11, 2014 3:26:40 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.mapreduce.v2.app.webapp.AMWebServices as a root resource class
Dec 11, 2014 3:26:40 PM com.sun.jersey.server.impl.application.WebApplicationImpl _initiate
INFO: Initiating Jersey application, version 'Jersey: 1.9 09/02/2011 11:17 AM'
Dec 11, 2014 3:26:40 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
INFO: Binding org.apache.hadoop.mapreduce.v2.app.webapp.JAXBContextResolver to GuiceManagedComponentProvider with the scope "Singleton"
Dec 11, 2014 3:26:43 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
INFO: Binding org.apache.hadoop.yarn.webapp.GenericExceptionHandler to GuiceManagedComponentProvider with the scope "Singleton"
Dec 11, 2014 3:26:45 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
INFO: Binding org.apache.hadoop.mapreduce.v2.app.webapp.AMWebServices to GuiceManagedComponentProvider with the scope "PerRequest"
log4j:WARN No appenders could be found for logger (org.apache.hadoop.ipc.Server).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Does anyone know what the problem is?

Update 2: I found additional log information in $HADOOP_HOME/logs/userlogs/[application_id]/[container_id]/stderr:

...
Error : .onLoad failed in loadNamespace() for 'rhdfs', details:
call: fun(libname, pkgname)
  error: Environment variable HADOOP_CMD must be set before loading package rhdfs
Warning in FUN(c("base", "methods", "datasets", "utils", "grDevices", "graphics",  :
can't load rhdfs
Loading required package: rmr2
Error in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]) : 
there is no package called ‘stringr’
...

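After taking a closer look at the error log, it turned out that I had installed the R libraries at the user level, whereas they need to be installed at the system level. For details on how to install R libraries at the system level, see this page. The devtools package may come in handy; remember to run R under sudo, or alternatively use sudo R CMD INSTALL [package_name].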
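
As a rough sketch of the system-level install described above, the following picks the first library path that is not the per-user library and installs into it. The helper names `pick_system_library` and `install_system_wide` are hypothetical, and the sketch assumes the per-user library is the one named by the `R_LIBS_USER` environment variable.

```r
# Return the first library path that is not the per-user library.
pick_system_library <- function(libs = .libPaths(),
                                user_lib = Sys.getenv("R_LIBS_USER")) {
  norm <- function(p) normalizePath(p, winslash = "/", mustWork = FALSE)
  candidates <- libs[norm(libs) != norm(user_lib)]
  if (length(candidates) == 0) stop("no non-user library found in 'libs'")
  candidates[1]
}

# Install a package into that library; this needs write access,
# which usually means R was started under sudo.
install_system_wide <- function(pkg, lib = pick_system_library()) {
  if (file.access(lib, mode = 2) != 0) {
    stop("library is not writable: ", lib, " (start R under sudo?)")
  }
  install.packages(pkg, lib = lib)
}

# Show where a system-wide install would go in this session.
print(pick_system_library())
```

With R started under sudo, `install_system_wide("stringr")` would then install the package named in the stderr log into the system library.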

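You can double-check the installation path of a package from within R using system.file(package = "[package_name]"), although this only ever shows the first preferred library path for the package. For that reason, I strongly recommend removing the previously installed user libraries.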
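
Because `system.file()` reports only the first match, a small helper can list every library that holds a copy of a package. This is a minimal sketch; `find_package_copies` is a hypothetical name, and it assumes an installed package is recognisable by a `DESCRIPTION` file in its directory.

```r
# List every library path that contains an installed copy of 'pkg'.
find_package_copies <- function(pkg, libs = .libPaths()) {
  libs[file.exists(file.path(libs, pkg, "DESCRIPTION"))]
}

# 'stats' ships with R, so at least one library should be reported.
print(find_package_copies("stats"))

# First match only, for comparison.
print(system.file(package = "stats"))
```

If a package such as stringr shows up only under the user library, the Hadoop task processes may not see it, which matches the "there is no package called" error in the stderr log.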


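Run the job a few more times to double-check the error log and make sure the packages are correctly installed in the R system library. The stderr log is useful, but nobody had pointed out its actual location before.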
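
The stderr log also reports that `HADOOP_CMD` must be set before rhdfs is loaded. A minimal sketch of setting it in an R session follows; the Hadoop home is taken from the question, while the streaming jar location is an assumption about a typical Hadoop 2.x layout and should be checked against the actual installation. Whether the task containers see these variables depends on cluster configuration, which this thread does not cover.

```r
# Hadoop installation directory, as used in the question.
hadoop_home <- "/opt/hadoop/hadoop-2.5.1"

# rhdfs and rmr2 read this variable to locate the hadoop executable.
Sys.setenv(HADOOP_CMD = file.path(hadoop_home, "bin", "hadoop"))

# rmr2 also expects the streaming jar; this path is an assumption.
Sys.setenv(HADOOP_STREAMING = file.path(
  hadoop_home, "share", "hadoop", "tools", "lib", "hadoop-streaming-2.5.1.jar"
))

# Load rhdfs and rmr2 only after this point.
print(Sys.getenv(c("HADOOP_CMD", "HADOOP_STREAMING")))
```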

Unrelated, but I would use \\s+ to be safe.

@TylerRinker What is the difference between using \\s+ and \\s?

Use this and you tell me: x
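
To illustrate the difference raised in the comments, here is a small sketch comparing the two patterns on a string with two consecutive spaces. The sample string is made up for the example.

```r
# Two spaces between the words.
x <- "hello  world"

# '\\s' splits on every single whitespace character, so the gap between
# the two spaces becomes an empty string.
single <- strsplit(x, "\\s")[[1]]

# '\\s+' treats a run of whitespace as one separator.
plus <- strsplit(x, "\\s+")[[1]]

print(single)
print(plus)
```

In the wordcount job, the empty strings produced by `\\s` would be counted as a "word" of their own, which is why `\\s+` is the safer choice.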