R-Hadoop counting

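I'm new to R and I have a problem with MapReduce using rmr2. I have a file like this to read, where each line has a date and some words (A, B, C, ...):

I would like to get output like the following: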

2016-05: A 3 
2016-05: E 4
2016-05: E 4
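I have already solved the same problem in a Java implementation, and now I am trying to do the same in R, but I still have to figure out how to write my reducer. Is there a way to print from my mapper and reducer code? When I use the print command inside the mapper or reducer, I get an error in RStudio.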

Sys.setenv(HADOOP_STREAMING = "/usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.8.0.jar")
Sys.setenv(HADOOP_HOME = "/usr/local/hadoop/bin/hadoop")
Sys.setenv(HADOOP_CMD = "/usr/local/hadoop/bin/hadoop") 

library(stringr)
library(rmr2)
library(stringi)
customMapper = function(k,v){
  #words = unlist(strsplit(v,"\\s"))
  #words = unlist(strsplit(v,","))
  tmp = unlist(stri_split_fixed(v, pattern= ",",n = 2))
  data = tmp[1]
  onlyYearMonth = unlist(stri_split_fixed(data, pattern= "-",n = 3))
  #print(words)
  words = unlist(strsplit(tmp[2],","))
  compositeK = paste(onlyYearMonth[1],"-",onlyYearMonth[2])
  keyval(compositeK,words)

}

customReducer = function(k,v) {
    #Here there are all the value with same date ??? 
    elementsWithSameDate = unlist(v)

    #defining something similar to java Map to use for counting elements in same date
    # myMap

    for(elWithSameDate in  elementsWithSameDate) {

      words = unlist(strsplit(elWithSameDate,","))
      for(word in words) {
        compositeNewK = paste(k,":",word)
        # if myMap contains compositeNewK
             # myMap (compositeNewK, 1 + myMap.getValue(compositeNewK))
        # else 
             #myMap (compositeNewK, 1)

      }
    }

    # here I want to transform myMap into a string containing the 3 words with the most occurrences
    #fromMapToString = convert(myMap)
    keyval(k,fromMapToString)
}


wordcount = function(inputData,outputData=NULL){
  mapreduce(input = inputData,output = outputData,input.format = "text",map = customMapper,reduce = customReducer)
}


hdfs.data = file.path("/user/hduser","folder2")
hdfs.out  = file.path("/user/hduser","output1")

result = wordcount(hdfs.data,hdfs.out)
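
On the printing question: Hadoop Streaming uses standard output to carry the key/value pairs, so diagnostics written there can corrupt the job's data stream. A common workaround is to write them to standard error instead. The sketch below is plain base R and only illustrates the idea. `parse_line` is a hypothetical helper, and the line format `"YYYY-MM-DD,word,word,..."` is an assumption, since the sample input is not shown in the question.

```r
# Hypothetical helper: split one input line of the assumed form
# "YYYY-MM-DD,w1,w2,..." into a year-month key and its words.
parse_line <- function(line) {
  parts <- strsplit(line, ",", fixed = TRUE)[[1]]
  year_month <- substr(parts[1], 1, 7)   # "2016-05-10" -> "2016-05"
  words <- parts[-1]
  # message() writes to stderr, so stdout stays free for the job's data
  message("parsed key=", year_month, " words=", paste(words, collapse = " "))
  list(key = year_month, words = words)
}

res <- parse_line("2016-05-10,A,B,A")
```

Taking the first seven characters of the date also avoids the extra spaces that `paste(year, "-", month)` inserts by default.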

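Why do you need the rmr2 library? Hadoop Streaming reads from standard input and writes to standard output... In other words, you can test all of this without Hadoop at all:
cat input.txt | mapper.r | sort -k1,1 | reducer.r
(taken from here)
It's a university assignment... using Hadoop Streaming, or using rmr2?
Does the mapper work when you run the MapReduce job with rmr2? If you are looking for an equivalent of Java's HashMap, R has hash.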
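
For the counting step that the reducer is still missing, here is a minimal sketch in base R. It assumes the reducer receives all the words for one year-month key as a character vector of non-empty strings. `top_words` is a hypothetical helper, not part of rmr2. It uses a base R environment in the role of Java's HashMap, although the hash package mentioned above or `table()` would work as well.

```r
# Hypothetical helper: count the words seen for one key and
# return the n most frequent ones as "key: word count" strings.
top_words <- function(key, words, n = 3) {
  counts <- new.env(hash = TRUE)   # plays the role of Java's HashMap
  for (w in words) {
    current <- if (exists(w, envir = counts, inherits = FALSE)) {
      get(w, envir = counts)
    } else {
      0L
    }
    assign(w, current + 1L, envir = counts)
  }
  # Turn the environment into a named integer vector, highest counts first
  nms <- ls(counts)
  freq <- vapply(nms, function(w) get(w, envir = counts), integer(1))
  freq <- head(sort(freq, decreasing = TRUE), n)
  paste0(key, ": ", names(freq), " ", freq)
}

words <- c("A", "E", "B", "A", "E", "B", "E", "A", "E", "C")
out <- top_words("2016-05", words)
```

Inside an rmr2 reducer, the result could then be returned with something like `keyval(k, top_words(k, unlist(v)))`, assuming the mapper emits one word per value.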