Java: Analyzing a log file with MapReduce
Here is a log file:
2011-10-26 06:11:35 user1 210.77.23.12
2011-10-26 06:11:45 user2 210.77.23.17
2011-10-26 06:11:46 user3 210.77.23.12
2011-10-26 06:11:47 user2 210.77.23.89
2011-10-26 06:11:48 user2 210.77.23.12
2011-10-26 06:11:52 user3 210.77.23.12
2011-10-26 06:11:53 user2 210.77.23.12
...
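I want to use MapReduce to count the log records for each value of the third field (the user) and sort the result in descending order of that count, one user per line. In other words, I want the result to look like this: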
user2 4
user3 2
user1 1
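Now I have two questions:
The fields 2011-10-26, 06:11:35 and 210.77.23.12 are not needed. How can I make MapReduce ignore them and pick only the user field?
Thanks.
Regarding your first question: you should probably pass the whole line to the mapper, keep only the third token, and emit (user, 1) for each line.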
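import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class AnalyzeLogs
{
    public static class FindFriendMapper extends Mapper<Object, Text, Text, IntWritable> {
        @Override
        public void map(Object key, Text value, Context context) throws IOException, InterruptedException
        {
            // Fields are separated by whitespace: date, time, user, IP
            String[] tempStrings = value.toString().trim().split("\\s+");
            if (tempStrings.length < 3) return; // skip malformed lines
            context.write(new Text(tempStrings[2]), new IntWritable(1)); // emit (user, 1)
        }
    }
}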
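To check the counting logic without a cluster, the map step above and a summing reduce step can be simulated in plain Java. This is only a minimal sketch under stated assumptions: `LogCountSketch`, `userOf` and `countByUser` are hypothetical names used for illustration, and no Hadoop classes are involved.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical local simulation of the first job; not part of the Hadoop API.
final class LogCountSketch {
    // "Map" step: keep only the third whitespace-separated token (the user).
    static String userOf(String line) {
        String[] tokens = line.trim().split("\\s+");
        return tokens.length >= 3 ? tokens[2] : null; // null marks a malformed line
    }

    // "Reduce" step: sum the 1s emitted for each user.
    static Map<String, Integer> countByUser(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            String user = userOf(line);
            if (user != null) {
                counts.merge(user, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

Calling `countByUser` on the seven sample lines from the question yields a count of 4 for user2, 2 for user3 and 1 for user1. In the real job, the summing would be done by a reducer that adds up the `IntWritable` values for each key.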
To sort by count in descending order in a second job, you can set a custom sort comparator: job.setSortComparatorClass(LogDescComparator.class);
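The reducer of this job does nothing. However, the job still needs a reduce phase: in a map-only job (zero reduce tasks) the mapper keys are never sorted, and we need that sort. So set an identity reducer as the reducer of the second MR job (IdentityReducer in the old org.apache.hadoop.mapred API; in the org.apache.hadoop.mapreduce API used here, the base Reducer class already passes records through unchanged). No reduction takes place, but the mapper's synthetic keys are still sorted the way we specified.
Thank you very much for your detailed answer. It was really helpful!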
public static class SortLogsMapper extends Mapper<Object, Text, Text, NullWritable> {
    @Override
    public void map(Object key, Text value, Context context) throws IOException, InterruptedException
    {
        // Emit the whole "user count" line as the key; NullWritable is a singleton, so use get()
        context.write(value, NullWritable.get());
    }
}
public static class LogDescComparator extends WritableComparator
{
    protected LogDescComparator()
    {
        super(Text.class, true);
    }
    @Override
    public int compare(WritableComparable w1, WritableComparable w2)
    {
        Text t1 = (Text) w1;
        Text t2 = (Text) w2;
        // TextOutputFormat separates key and value with a tab by default, so split on any whitespace
        String[] t1Items = t1.toString().split("\\s+");
        String[] t2Items = t2.toString().split("\\s+");
        // Parse the counts as numbers; comparing them as strings would put "9" ahead of "10"
        int t1Value = Integer.parseInt(t1Items[1]);
        int t2Value = Integer.parseInt(t2Items[1]);
        // Compare the "real" value part of our synthetic key in descending order
        return Integer.compare(t2Value, t1Value);
    }
}
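The ordering that such a comparator should impose can be tried out locally. The following is a minimal plain-Java sketch, assuming each input line has the form "user<whitespace>count"; `DescendingCountSketch`, `countOf` and `sortLines` are hypothetical names, and the sample counts are made up for illustration. It compares the counts numerically, because a string comparison orders multi-digit counts incorrectly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical local check of the descending-by-count ordering; no Hadoop involved.
final class DescendingCountSketch {
    // Extract the numeric count from the second whitespace-separated field.
    static int countOf(String line) {
        return Integer.parseInt(line.trim().split("\\s+")[1]);
    }

    // Numeric comparison with the arguments swapped gives descending order.
    static final Comparator<String> BY_COUNT_DESC =
            (a, b) -> Integer.compare(countOf(b), countOf(a));

    // Return a sorted copy so the caller's list is left untouched.
    static List<String> sortLines(List<String> lines) {
        List<String> sorted = new ArrayList<>(lines);
        sorted.sort(BY_COUNT_DESC);
        return sorted;
    }

    public static void main(String[] args) {
        // Illustrative data only, not taken from the log in the question.
        List<String> lines = Arrays.asList("user1\t9", "user2\t10", "user3\t2");
        for (String line : sortLines(lines)) {
            System.out.println(line);
        }
    }
}
```

With a string comparison, "10" sorts before "9" in ascending order because the character '1' precedes '9', so a descending string sort would place the count 10 last. Note also that a single globally ordered output file requires the sorting job to run with one reduce task.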