Hadoop: Permutations with MapReduce

Is there a way to generate permutations with MapReduce?

Input file:

1  title1
2  title2
3  title3

My goal is:

1,2  title1,title2
1,3  title1,title3
2,3  title2,title3

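Since the file has n input lines, the permutations should produce n^2 outputs if every ordered pair is emitted. It makes sense to have n tasks that each perform n of those operations. I believe you could do it like this (assuming there is only one file):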

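Put the input file into the DistributedCache so that your mappers/reducers can access it read-only. Make an input split on each line of the file (as in WordCount). The mapper will thus receive one line (e.g. title1 in your example). Then read the lines from the file in the DistributedCache and emit your key/value pairs: the key is the input line, and the values are each line of the file from the DistributedCache.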

In this model, you should only need a map step.
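To make the idea concrete without a cluster, here is a minimal plain-Java sketch of what the map step would compute. It is only a simulation, not Hadoop code: the class name PairSketch and both method names are made up for illustration, and uniquePairs assumes the first column is a numeric id, which the question's sample suggests but does not state.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PairSketch {

    // Simulates the map-only job: each "map call" gets one input line and
    // pairs it with every line of the cached copy of the same file.
    public static List<String> allPairs(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String inputLine : lines) {        // one map() call per line
            for (String cachedLine : lines) {   // lines read from the cache
                out.add(inputLine + " | " + cachedLine);
            }
        }
        return out;
    }

    // Variant: keep a pair only when the left id is smaller than the right id
    // (assumption: the first column is an integer id).
    public static List<String> uniquePairs(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String a : lines) {
            String[] left = a.trim().split("\\s+", 2);
            for (String b : lines) {
                String[] right = b.trim().split("\\s+", 2);
                if (Integer.parseInt(left[0]) < Integer.parseInt(right[0])) {
                    out.add(left[0] + "," + right[0] + "  "
                            + left[1] + "," + right[1]);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("1  title1", "2  title2", "3  title3");
        System.out.println(allPairs(lines).size());
        uniquePairs(lines).forEach(System.out::println);
    }
}
```

Note that pairing every line with every cached line gives all n^2 ordered pairs, including each line paired with itself. The goal in the question lists only the pairs where the first id is smaller than the second, so a filter like the one in uniquePairs would be needed inside the mapper to get exactly that output.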

The mapper could look something like this:

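  // imports go at the top of the enclosing file
  import java.io.IOException;
  import org.apache.hadoop.filecache.DistributedCache;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Mapper;

  public static class PermuteMapper
       extends Mapper<Object, Text, Text, Text> {

    private static final String IN_FILENAME = "file.txt";

    @Override
    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {

      String inputLine = value.toString();

      // set the property mapred.cache.files in your
      // configuration for the file to be available
      // (DistributedCache is deprecated in newer Hadoop releases)
      Path[] cachedPaths =
          DistributedCache.getLocalCacheFiles(context.getConfiguration());
      if (cachedPaths != null && cachedPaths.length > 0
          && cachedPaths[0].getName().equals(IN_FILENAME)) {
         // function defined elsewhere
         String[] cachedLines = getLinesFromPath(cachedPaths[0]);
         // emit the input line paired with every line of the cached file
         for (String line : cachedLines) {
           context.write(new Text(inputLine), new Text(line));
         }
      }
    }
  }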
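The mapper above leaves getLinesFromPath "defined elsewhere". One possible way to fill it in, as a rough sketch: since files from the DistributedCache are copied to the task's local disk, plain java.nio can read them. The class name CachedLines and the method name getLinesFromLocalFile are hypothetical, and this version takes a String, on the assumption that the localized path is an ordinary local filesystem path.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class CachedLines {

    // Hypothetical stand-in for getLinesFromPath: reads a local file into
    // an array of lines, skipping blank lines.
    public static String[] getLinesFromLocalFile(String localPath)
            throws IOException {
        List<String> lines =
            Files.readAllLines(Paths.get(localPath), StandardCharsets.UTF_8);
        return lines.stream()
                    .filter(l -> !l.trim().isEmpty())
                    .toArray(String[]::new);
    }

    public static void main(String[] args) throws IOException {
        // Write a small sample file, then read it back.
        Path tmp = Files.createTempFile("file", ".txt");
        Files.write(tmp, Arrays.asList("1  title1", "2  title2", "3  title3"));
        for (String line : getLinesFromLocalFile(tmp.toString())) {
            System.out.println(line);
        }
        Files.delete(tmp);
    }
}
```

As written, the mapper would re-read the cached file on every map() call. If that turns out to be slow, loading the lines once in the mapper's setup() method and keeping them in a field is probably the better design.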
