Java MapReduce: two splits for a one-line input file (map method executed twice)


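I wrote a MapReduce program that reads a request log file and, for each 30-minute slot, computes the number of requests and the most-searched word in that slot.

My input file is: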

01_11_2012 12_02_10 132.227.045.028 life
02_11_2012 02_52_10 132.227.045.028 restaurent+kitchen
03_11_2012 12_32_10 132.227.045.028 guitar+music
04_11_2012 13_52_10 132.227.045.028 book+music
05_11_2012 12_22_10 132.227.045.028 animal+life
05_11_2012 12_22_10 132.227.045.028 history

DD_MM_YYYY | HH_MM_SS | ip | search terms
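The program itself is not shown, so the following is only a minimal sketch of how a mapper could turn one line of this format into a 30-minute slot key. The class and method names (`SlotKey`, `parseFields`, `slotLabel`) are hypothetical, and the label is zero-padded on both ends, which differs slightly from the "2h59" in the sample output.

```java
import java.util.Locale;

public class SlotKey {
    /** Splits one log line into its four whitespace-separated fields. */
    public static String[] parseFields(String line) {
        String[] fields = line.trim().split("\\s+");
        if (fields.length != 4) {
            throw new IllegalArgumentException("expected 4 fields: " + line);
        }
        return fields;
    }

    /** Maps an HH_MM_SS time to the label of its 30-minute slot. */
    public static String slotLabel(String time) {
        String[] parts = time.split("_");
        int hour = Integer.parseInt(parts[0]);
        int minute = Integer.parseInt(parts[1]);
        int start = minute < 30 ? 0 : 30; // first or second half of the hour
        return String.format(Locale.ROOT, "between %02dh%02d and %02dh%02d",
                hour, start, hour, start + 29);
    }

    public static void main(String[] args) {
        String[] f = parseFields("01_11_2012 12_02_10 132.227.045.028 life");
        // prints: between 12h00 and 12h29 -> life
        System.out.println(slotLabel(f[1]) + " -> " + f[3]);
    }
}
```

In a real mapper the slot label would be the output key and the search terms the output value.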

My output file should show something like this:

between 02h30 and 2h59 restaurent 1  
between 13h30 and 13h59 book 1
between 12h00 and 12h29 life 3  
between 12h30 and 12h59 guitar 1

First line: restaurent is the most-searched word in the slot between 02h30 and 2h59, and 1 is the number of requests.
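To make the expected output concrete, here is a hedged sketch of the reduce-side logic for a single slot. It assumes the reducer receives the raw search strings of that slot, that words are separated by `+`, and that ties go to the word seen first (which matches the sample output, although Hadoop does not guarantee the order in which values arrive). `SlotSummary` and `summarize` are hypothetical names, not the author's code.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SlotSummary {
    /** Returns "<most-searched word> <number of requests>" for one slot. */
    public static String summarize(List<String> searches) {
        // LinkedHashMap keeps first-seen order, used here to break ties.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String search : searches) {
            for (String word : search.split("\\+")) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        String top = "";
        int best = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > best) { // strict comparison: earliest word wins a tie
                best = e.getValue();
                top = e.getKey();
            }
        }
        // One request per input line, regardless of how many words it contains.
        return top + " " + searches.size();
    }

    public static void main(String[] args) {
        // prints: life 3
        System.out.println(summarize(Arrays.asList("life", "animal+life", "history")));
        // prints: restaurent 1
        System.out.println(summarize(Arrays.asList("restaurent+kitchen")));
    }
}
```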

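My problem is that the map method is executed redundantly for the same line. To check this, I tested the program with the following input (a single line in the file):

01_11_2012 12_02_10 132.227.045.028 life

When I debug line by line in Eclipse, with a breakpoint on the following line of the map method: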

context.write(key, result);

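my program passes through this line twice and writes the same information twice for the single input line.

I am stuck at this point. I don't know why I get 2 map tasks, since I should have only one for my input split.

The program is as follows. (Sorry for my English.)

I got the solution from the following link:

I had not seen the message "Total input paths to process : 2" in the logs. As they say in the link, I just had to comment out this line: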

FileInputFormat.addInputPath(job, new Path(args[0]));

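I don't understand the comment "This line just appends the input back to the config". Can anyone explain it? Any idea is appreciated.

In your main (job) method, these lines are duplicated: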

FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));

Also:

job.setJarByClass(BoobleByMinutes.class);

But it is this line that should be causing the duplicated input:

FileInputFormat.addInputPath(job, new Path(args[0]));

So your main method should be:
    public static void main(String[] args) throws Exception {

        Job job = new org.apache.hadoop.mapreduce.Job();
        job.setJarByClass(BoobleByMinutes.class);
        job.setJobName("Booble mot le plus recherché et somme de requete par tranche de 30 minutes");

        job.setMapperClass(TokenizerMapper.class);
//      job.setCombinerClass(PriceSumReducer.class);
        job.setReducerClass(PriceSumReducer.class);

        job.setNumReduceTasks(1);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        // Register the input and output paths exactly once.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
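As for the "appends the input back to the config" comment: `FileInputFormat.addInputPath` adds a path to the job's list of input paths rather than replacing it, so calling it twice with the same argument registers the file twice. The toy model below illustrates that behaviour in plain Java. It is not Hadoop's implementation; `InputPathsModel` and its fields are hypothetical, and it assumes a small file yields one split per registered path.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class InputPathsModel {
    // Toy stand-in for the comma-separated input-directory property of a job.
    private String inputDirs = "";

    /** Appends a path, mimicking the append behaviour of addInputPath. */
    public void addInputPath(String path) {
        inputDirs = inputDirs.isEmpty() ? path : inputDirs + "," + path;
    }

    /** Replaces every registered path, mimicking setInputPaths. */
    public void setInputPaths(String path) {
        inputDirs = path;
    }

    /** One entry per registered path; duplicates are not removed. */
    public List<String> inputPaths() {
        if (inputDirs.isEmpty()) {
            return new ArrayList<String>();
        }
        return Arrays.asList(inputDirs.split(","));
    }

    public static void main(String[] args) {
        InputPathsModel job = new InputPathsModel();
        job.addInputPath("in/log.txt");
        job.addInputPath("in/log.txt"); // accidental duplicate call
        // prints: Total input paths to process : 2
        System.out.println("Total input paths to process : " + job.inputPaths().size());
    }
}
```

With two entries for the same one-line file, the map method runs once per entry, which is consistent with the breakpoint being hit twice. Calling `FileInputFormat.setInputPaths` instead of `addInputPath` would replace the list rather than extend it.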

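Can you check whether the file contains the line twice, using grep '12_02_10' inputFile? Also, please try running it in a real Hadoop setup rather than in Eclipse, which may do something unexpected. I suspect that a backup copy of some file made by the editor is duplicating your input.

I tried the grep '12_02_10' myFile command and it returned one line.

Hi Radim, when I launch the jar file on a real Hadoop with YARN, I get strange results. The job stays at map 0% reduce 0% and then suddenly logs me out of my Ubuntu session. So when I log back in, I have to restart my daemons...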
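The backup-copy suspicion in the comments can be checked without running the job. By default, Hadoop's file input formats skip files whose names start with `_` or `.`, but an editor backup such as `log.txt~` in the input directory would be read as ordinary input. The sketch below applies that rule to a list of file names; `InputDirCheck` and `visibleInputs` are hypothetical names, and the file names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class InputDirCheck {
    /** Keeps the names that are not hidden by the '_' / '.' prefix rule. */
    public static List<String> visibleInputs(List<String> fileNames) {
        List<String> kept = new ArrayList<>();
        for (String name : fileNames) {
            if (!name.startsWith("_") && !name.startsWith(".")) {
                kept.add(name);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("log.txt", "log.txt~", ".log.txt.swp", "_SUCCESS");
        // prints: [log.txt, log.txt~]
        System.out.println(visibleInputs(names));
    }
}
```

If more than one name survives the filter for what should be a single input file, every line would be mapped once per surviving file.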