Java: Analyzing a log file with MapReduce

Here is a log file:

2011-10-26 06:11:35 user1 210.77.23.12
2011-10-26 06:11:45 user2 210.77.23.17
2011-10-26 06:11:46 user3 210.77.23.12
2011-10-26 06:11:47 user2 210.77.23.89
2011-10-26 06:11:48 user2 210.77.23.12
2011-10-26 06:11:52 user3 210.77.23.12
2011-10-26 06:11:53 user2 210.77.23.12
...
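I want to use MapReduce to count the log records for each user (the third field of every line) and sort the users by that count in descending order. In other words, I want the result to look like this: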

user2 4
user3 2
user1 1
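Now I have two questions:

  • By default, MapReduce splits the log file on spaces and carriage returns, but I only need the third field of each line. I do not care about fields such as 2011-10-26, 06:11:35 and 210.77.23.12. How can I make MapReduce ignore them and pick out the user field?

  • By default, MapReduce sorts the result by key rather than by value. How can I make MapReduce sort the result by value (the number of log records)?

Thank you.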

    Regarding your first question:

    You should probably pass the whole line to the mapper, keep only the third token, and emit (user, 1) for every line.

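    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class AnalyzeLogs
    {
        public static class FindFriendMapper extends Mapper<Object, Text, Text, IntWritable> {

            @Override
            public void map(Object key, Text value, Context context) throws IOException, InterruptedException
            {
                // Fields are separated by spaces: date, time, user, IP address.
                String[] fields = value.toString().trim().split("\\s+");
                if (fields.length > 2) {
                    // Emit (user, 1); malformed or short lines are skipped.
                    context.write(new Text(fields[2]), new IntWritable(1));
                }
            }
        }
    }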
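
To make the first step concrete without a Hadoop cluster, here is a minimal plain-Java sketch of what the mapper computes together with a reducer that sums the ones for each user. The class and method names (`UserCountSketch`, `extractUser`, `countByUser`) are hypothetical and exist only for this illustration. It assumes fields are separated by whitespace, as in the sample log.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class UserCountSketch {

    // Mirrors the mapper: return the third whitespace-separated field,
    // or null when the line has fewer than three fields.
    static String extractUser(String line) {
        String[] fields = line.trim().split("\\s+");
        return fields.length > 2 ? fields[2] : null;
    }

    // Mirrors map followed by a summing reduce: add 1 per valid line to the user's total.
    static Map<String, Integer> countByUser(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            String user = extractUser(line);
            if (user != null) {
                counts.merge(user, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "2011-10-26 06:11:35 user1 210.77.23.12",
            "2011-10-26 06:11:45 user2 210.77.23.17",
            "2011-10-26 06:11:46 user3 210.77.23.12",
            "2011-10-26 06:11:47 user2 210.77.23.89");
        System.out.println(countByUser(lines));
    }
}
```

In a real job the summing happens in a reducer, and the framework groups the (user, 1) pairs by key before the reducer sees them; the map here only stands in for that grouping.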
    
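    Regarding your second question: sorting by value needs a second MapReduce job that reads the output of the first job and uses each whole output line as its key. You can set the custom comparator for that job with:
    job.setSortComparatorClass(LogDescComparator.class);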


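    The reducer of this job should do nothing. However, if the job runs without a reduce phase, the mapper keys are never sorted, and we need that sort. So set IdentityReducer as the reducer of the second MR job: it performs no reduction but still ensures that the mapper's synthetic keys are sorted the way we specified. (IdentityReducer belongs to the older org.apache.hadoop.mapred API; in the org.apache.hadoop.mapreduce API used in this code, the base Reducer class already passes every key and value through unchanged.)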

    Thank you very much for the detailed answer. It is really helpful!

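    // Mapper of the second job: the whole input line ("user count") becomes the key.
    public static class SortLogsMapper extends Mapper<Object, Text, Text, NullWritable> {

        @Override
        public void map(Object key, Text value, Context context) throws IOException, InterruptedException
        {
            // NullWritable is a singleton with a private constructor, so use get().
            context.write(value, NullWritable.get());
        }
    }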
    
    public static class LogDescComparator extends WritableComparator
    {
        protected LogDescComparator()
        {
            super(Text.class, true);
        }

        @Override
        public int compare(WritableComparable w1, WritableComparable w2)
        {
            // Each key is a "user count" line. TextOutputFormat separates key and
            // value with a tab by default, so split on any whitespace.
            String[] t1Items = w1.toString().trim().split("\\s+");
            String[] t2Items = w2.toString().trim().split("\\s+");
            int t1Count = Integer.parseInt(t1Items[1]);
            int t2Count = Integer.parseInt(t2Items[1]);
            // Compare the "real" value part of the synthetic key numerically, in
            // descending order (as strings, "10" would sort below "2").
            return Integer.compare(t2Count, t1Count);
        }
    }
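
The ordering that the sort comparator is meant to impose can be tried outside Hadoop. The sketch below is a plain-Java illustration, not the job itself. It assumes the first job wrote lines of the form user, whitespace, count, and the names `SortByCountSketch`, `countOf`, `sortDescending` and the extra `user4` record are hypothetical. Counts are parsed as integers because a string comparison would rank "10" below "2".

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SortByCountSketch {

    // Parse the count from a "user<whitespace>count" line.
    static int countOf(String line) {
        return Integer.parseInt(line.trim().split("\\s+")[1]);
    }

    // Descending by numeric count: the larger count sorts first.
    static final Comparator<String> BY_COUNT_DESC =
        (a, b) -> Integer.compare(countOf(b), countOf(a));

    // Return a sorted copy; the input list is left untouched.
    static List<String> sortDescending(List<String> lines) {
        List<String> sorted = new ArrayList<>(lines);
        sorted.sort(BY_COUNT_DESC);
        return sorted;
    }

    public static void main(String[] args) {
        // user4 is a made-up record that shows why numeric comparison matters.
        List<String> firstJobOutput =
            List.of("user1\t1", "user2\t4", "user3\t2", "user4\t10");
        for (String line : sortDescending(firstJobOutput)) {
            System.out.println(line);
        }
    }
}
```

In the actual job the framework calls the comparator during the shuffle, so no explicit sort call appears in user code; the list sort here only stands in for that phase.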