Is there a problem with RHadoop?
I have already checked this question and tried the answer on my side, but it raised a lot of problems.
The code is as follows:
Sys.setenv("HADOOP_CMD"="/usr/local/hadoop/bin/hadoop")
Sys.setenv("HADOOP_STREAMING"="/usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.4.0.jar")
# load libraries
library(rmr2)
library(rhdfs)
# initiate rhdfs package
hdfs.init()
map <- function(k, lines) {
  words.list <- strsplit(lines, '\\s')
  words <- unlist(words.list)
  return( keyval(words, 1) )
}
reduce <- function(word, counts) {
  keyval(word, sum(counts))
}
wordcount <- function (input, output=NULL) {
  mapreduce(input=input, output=output, input.format="text", map=map, reduce=reduce)
}
## read text files from folder example/wordcount/data
hdfs.root <- 'example/wordcount'
hdfs.data <- file.path(hdfs.root, 'data')
## save result in folder example/wordcount/out
hdfs.out <- file.path(hdfs.root, 'out')
## Submit job
out <- wordcount(hdfs.data, hdfs.out)
## Fetch results from HDFS
results <- from.dfs(out)
results.df <- as.data.frame(results, stringsAsFactors=FALSE)
colnames(results.df) <- c('word', 'count')
head(results.df)
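What is your Hadoop version? In the code pasted above the hadoop-streaming jar is version 2.4.0, but the linked question says 2.7.3! A question similar to yours has already been answered at the link below; see whether that helps.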
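
One way to rule out a version mismatch is to look up the streaming jar that is actually installed instead of hard-coding its file name. The following is only a rough sketch in base R: the helper names jar_version and find_streaming_jars are made up for illustration, and the install directory /usr/local/hadoop and the share/hadoop/tools/lib layout are assumptions taken from the paths in the question, so adjust them to your setup.

```r
# Hypothetical helper: pull the version string out of a jar file name,
# e.g. "hadoop-streaming-2.4.0.jar" gives "2.4.0". Returns NA if none is found.
jar_version <- function(jar_path) {
  name <- basename(jar_path)
  m <- regmatches(name, regexpr("[0-9]+(\\.[0-9]+)+", name))
  if (length(m) == 0) NA_character_ else m
}

# Hypothetical helper: list the streaming jars under a Hadoop install directory.
# The share/hadoop/tools/lib layout is assumed from the path in the question.
find_streaming_jars <- function(hadoop_home) {
  list.files(file.path(hadoop_home, "share", "hadoop", "tools", "lib"),
             pattern = "^hadoop-streaming-.*\\.jar$", full.names = TRUE)
}

# Assumed install location; change it if Hadoop lives elsewhere.
jars <- find_streaming_jars("/usr/local/hadoop")
if (length(jars) == 0) {
  message("No streaming jar found; check the install path.")
} else {
  # Point rmr2 at the jar that really exists on this machine.
  Sys.setenv(HADOOP_STREAMING = jars[1])
  message("Using ", jars[1], " (version ", jar_version(jars[1]), ")")
}
```

If the version reported here differs from the one in your Sys.setenv call, the hard-coded path is pointing at a jar that does not exist on that machine.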