Rhadoop wordcount using rmr (R, Hadoop, Rhadoop) - Fatal编程技术网

I am trying to run a simple rmr job with the Rhadoop packages, but it does not work:

print("Initializing variable.....")
Sys.setenv(HADOOP_HOME="/usr/hdp/2.2.4.2-2/hadoop")
Sys.setenv(HADOOP_CMD="/usr/hdp/2.2.4.2-2/hadoop/bin/hadoop")
print("Invoking functions.......")
#Reference taken from Revolution Analytics
wordcount = function(input, output = NULL, pattern = " ")
{
mapreduce(
      input = input ,
      output = output,
      input.format = "text",
      map = wc.map,
      reduce = wc.reduce,
      combine = T)
}

wc.map =
      function(., lines) {
        keyval(
          unlist(
            strsplit(
              x = lines,
              split = pattern)),
          1)}

wc.reduce =
      function(word, counts ) {
        keyval(word, sum(counts))}

#Function Invoke

wordcount('/user/hduser/rmr/wcinput.txt')
I run the script above with

Rscript wordcount.r
and I get the following error:

[1] "Initializing variable....."
[1] "Invoking functions......."
Error in wordcount("/user/hduser/rmr/wcinput.txt") :
could not find function "mapreduce"
Execution halted

Please tell me what the problem is.
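Separately from the missing package, the question's wc.map uses pattern, which is an argument of wordcount() and is not visible from a function defined at the top level, so the mapper would fail because pattern is undefined there. One way around this, sketched below under the assumption that the rest of the job stays the same, is to build the map function inside a closure that captures pattern. make_wc_map is a hypothetical helper name, and keyval here is a local stand-in that only mimics the key/value pairing of rmr2's keyval() so the snippet runs without Hadoop.

```r
# Local stand-in for rmr2's keyval(), for illustration only: pairs each key
# with a value, recycling a scalar value such as 1 across all keys.
keyval <- function(key, val) list(key = key, val = rep_len(val, length(key)))

# Hypothetical helper: returns a map function that closes over `pattern`,
# so the split pattern travels with the function instead of being looked
# up in the global environment.
make_wc_map <- function(pattern = " ") {
  force(pattern)  # evaluate now so the closure holds the value
  function(., lines) {
    words <- unlist(strsplit(x = lines, split = pattern))
    keyval(words, 1)
  }
}

wc.reduce <- function(word, counts) keyval(word, sum(counts))

wc.map <- make_wc_map(pattern = " ")
kv <- wc.map(NULL, c("hello world", "hello R"))
print(kv$key)
```

Inside wordcount(), the job would then pass map = make_wc_map(pattern) to mapreduce(), with the real keyval() from rmr2 in place of the stand-in.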

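First, you must set the HADOOP_STREAMING environment variable in your code. The rmr2 package must also be loaded with library(rmr2): the script in the question never attaches it, which is why R reports that it could not find function "mapreduce".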
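Before calling mapreduce(), it can help to confirm that the environment is actually in place. The sketch below is a hypothetical pre-flight helper, check_rhadoop_env, that is not part of rmr2 or rhdfs; it uses only base R to report whether HADOOP_CMD and HADOOP_STREAMING are set to existing files and whether rmr2 is installed, without loading the package.

```r
# Hypothetical pre-flight check (not part of rmr2 or rhdfs): reports whether
# the environment variables rmr2 relies on point to existing files and
# whether the rmr2 package is installed, without loading it.
check_rhadoop_env <- function(vars = c("HADOOP_CMD", "HADOOP_STREAMING")) {
  paths <- Sys.getenv(vars, unset = "")
  # A variable counts as OK only if it is non-empty and the file exists
  vars_ok <- stats::setNames(nzchar(paths) & file.exists(paths), vars)
  # system.file() returns "" when a package is not installed
  c(vars_ok, rmr2_installed = nzchar(system.file(package = "rmr2")))
}

status <- check_rhadoop_env()
print(status)
if (!all(status)) {
  message("Fix the FALSE entries, then call library(rmr2) before mapreduce().")
}
```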

Try the code below, and note that it assumes you have already copied your text files to the HDFS folder example/wordcount/data.

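Below is another example of running an R word-count MapReduce program, for your reference.

Hope this helps.

Comment: This program runs into streaming issues. How can they be resolved?
Comment: Check this similar question, which has already been answered, and see whether it helps.

R code: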
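# tell rmr2 where the hadoop binary and the streaming jar are (adjust the paths to your installation)
Sys.setenv("HADOOP_CMD"="/usr/local/hadoop/bin/hadoop")
Sys.setenv("HADOOP_STREAMING"="/usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.4.0.jar")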

# load libraries
library(rmr2)
library(rhdfs)

# initialize the rhdfs package
hdfs.init()

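map <- function(k, lines) {
  # split on runs of whitespace so repeated spaces do not produce empty "words"
  words.list <- strsplit(lines, '\\s+')
  words <- unlist(words.list)
  words <- words[nzchar(words)]  # drop empty strings left by leading whitespace
  return( keyval(words, 1) )
}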

reduce <- function(word, counts) {
  keyval(word, sum(counts))
}

wordcount <- function (input, output=NULL) {
  mapreduce(input=input, output=output, input.format="text", map=map, reduce=reduce)
}

## read text files from folder example/wordcount/data
hdfs.root <- 'example/wordcount'
hdfs.data <- file.path(hdfs.root, 'data')

## save result in folder example/wordcount/out
hdfs.out <- file.path(hdfs.root, 'out')

## Submit job
out <- wordcount(hdfs.data, hdfs.out) 

## Fetch results from HDFS
results <- from.dfs(out)
results.df <- as.data.frame(results, stringsAsFactors = FALSE)
colnames(results.df) <- c('word', 'count')

head(results.df)
## word count
##   AS    16
##   As     5
##   B.     1
##   BE    13
##   BY    23
##   By     7
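Before submitting to the cluster, the map and reduce logic can be checked on a small in-memory sample. The sketch below does not use rmr2 at all: keyval is a local stand-in, and run_local_wordcount is a hypothetical helper that imitates the map, group-by-key (shuffle), and reduce phases with base R's split(). It only shows what the job computes, not how Hadoop executes it; if rmr2 is installed, its rmr.options(backend = "local") setting is the package's own way to run a job without a cluster.

```r
# Local stand-in for rmr2's keyval() (illustration only; not the real function).
keyval <- function(key, val) list(key = key, val = rep_len(val, length(key)))

# Mirrors the answer's map(): split on runs of whitespace, drop empty strings,
# and emit a count of 1 per word.
map <- function(k, lines) {
  words <- unlist(strsplit(lines, "\\s+"))
  keyval(words[nzchar(words)], 1)
}

reduce <- function(word, counts) keyval(word, sum(counts))

# Hypothetical helper: map, group values by key (the "shuffle"),
# then reduce each group and collect the results in a data frame.
run_local_wordcount <- function(lines) {
  kv <- map(NULL, lines)
  groups <- split(kv$val, kv$key)
  out <- lapply(names(groups), function(w) reduce(w, groups[[w]]))
  data.frame(
    word = vapply(out, function(r) r$key, character(1)),
    count = vapply(out, function(r) r$val, numeric(1)),
    stringsAsFactors = FALSE
  )
}

res <- run_local_wordcount(c("the cat and the hat", "The  end"))
print(res)
```

As in the cluster output above, counting is case-sensitive, so "the" and "The" are reported as separate words.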