Java MapReduce: how to group and aggregate on multiple attributes in a single job
I am currently struggling with MapReduce. I have the following dataset:
1,John,Computer
2,Anne,Computer
3,John,Mobile
4,Julia,Mobile
5,Jack,Mobile
6,Jack,TV
7,John,Computer
8,Jack,TV
9,Jack,TV
10,Anne,Mobile
11,Anne,Computer
12,Julia,Mobile
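Now I want to use MapReduce to group and aggregate this dataset so that
it shows not only how many times each person bought something,
but also which product that person ordered most.
So the output should look like this: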
John 3 Computer
Anne 3 Mobile
Jack 4 TV
Julia 2 Mobile
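The mapper and reducer I have implemented so far look like this.
They correctly return the number of orders made by each person,
but I really don't know how to get the desired output: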
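import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Emits (person, 1) for each input row of the form "id,person,product"
static class CountMatchesMapper extends Mapper<Object, Text, Text, IntWritable> {
    @Override
    protected void map(Object key, Text value, Context ctx) throws IOException, InterruptedException {
        String row = value.toString();
        String[] row_part = row.split(",");
        // map() already declares IOException and InterruptedException,
        // so they propagate instead of being caught and silently ignored
        ctx.write(new Text(row_part[1]), new IntWritable(1));
    }
}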
// Sums the 1s emitted for each person, giving that person's number of orders
static class CountMatchesReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context ctx) throws IOException, InterruptedException {
        int i = 0;
        for (IntWritable value : values) {
            i += value.get();
        }
        // Exceptions propagate to the framework rather than being swallowed
        ctx.write(key, new IntWritable(i));
    }
}
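I would really appreciate any working solution and help.
Thanks in advance.

If I understand your requirement correctly, I think the second line of the output should be: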
Anne 3 Computer
given the input: Anne bought 3 products in total, 2 computers and 1 mobile.
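To double-check the expected lines without running a cluster, the aggregation can be computed in memory. The sketch below is plain Java and not part of either post; the class `LocalPurchaseSummary` and its `summarize` method are hypothetical. It assumes a single reducer (output sorted by name, as Hadoop sorts `Text` keys) and assumes ties between products go to the alphabetically first product.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: computes the expected job output in memory, without
// Hadoop, as a reference for small test inputs.
class LocalPurchaseSummary {

    // Each row is "id,person,product". Returns lines "person total topProduct",
    // sorted by person name because the outer map is a TreeMap.
    static List<String> summarize(List<String> rows) {
        // person -> (product -> count)
        Map<String, Map<String, Integer>> counts = new TreeMap<>();
        for (String row : rows) {
            String[] parts = row.split(",");
            counts.computeIfAbsent(parts[1], p -> new TreeMap<>())
                  .merge(parts[2], 1, Integer::sum);
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Map<String, Integer>> person : counts.entrySet()) {
            int total = 0;
            int topCount = 0;
            String top = "";
            // Products iterate alphabetically (TreeMap) and only a strictly
            // larger count replaces the current top, so ties go to the
            // alphabetically first product.
            for (Map.Entry<String, Integer> product : person.getValue().entrySet()) {
                total += product.getValue();
                if (product.getValue() > topCount) {
                    topCount = product.getValue();
                    top = product.getKey();
                }
            }
            out.add(person.getKey() + " " + total + " " + top);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> rows = List.of(
            "1,John,Computer", "2,Anne,Computer", "3,John,Mobile", "4,Julia,Mobile",
            "5,Jack,Mobile", "6,Jack,TV", "7,John,Computer", "8,Jack,TV",
            "9,Jack,TV", "10,Anne,Mobile", "11,Anne,Computer", "12,Julia,Mobile");
        summarize(rows).forEach(System.out::println);
    }
}
```

For the dataset in the question this prints `Anne 3 Computer`, `Jack 4 TV`, `John 3 Computer` and `Julia 2 Mobile`, one per line, which matches the corrected Anne line.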
Here is a very basic, simple approach. It does not handle edge cases and so on, but it should give you some direction:
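import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Emits (person, product) for each input row of the form "id,person,product"
static class CountMatchesMapper extends Mapper<LongWritable, Text, Text, Text> {
    // Reused across map() calls; ctx.write() serializes them, so this is safe
    private final Text outputKey = new Text();
    private final Text outputValue = new Text();
    @Override
    protected void map(LongWritable key, Text value, Context ctx) throws IOException, InterruptedException {
        String row = value.toString();
        String[] row_part = row.split(",");
        outputKey.set(row_part[1]);
        outputValue.set(row_part[2]);
        ctx.write(outputKey, outputValue);
    }
}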
// Receives all products bought by one person, counts them per product,
// and writes "person total topProduct" as the output key
static class CountMatchesReducer extends Reducer<Text, Text, Text, NullWritable> {
    private final Text output = new Text();

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context ctx) throws IOException, InterruptedException {
        Map<String, Integer> productCounts = new HashMap<>();
        int totalProductsBought = 0;
        for (Text value : values) {
            // toString() copies the value, which matters because Hadoop reuses the Text object
            String productBought = value.toString();
            int count = 0;
            if (productCounts.containsKey(productBought)) {
                count = productCounts.get(productBought);
            }
            productCounts.put(productBought, count + 1);
            totalProductsBought += 1;
        }
        String topProduct = getTopProductForPerson(productCounts);
        output.set(key.toString() + " " + totalProductsBought + " " + topProduct);
        ctx.write(output, NullWritable.get());
    }

    // Returns the product with the highest count; on a tie, whichever entry
    // the HashMap happens to iterate first wins
    private String getTopProductForPerson(Map<String, Integer> productCounts) {
        String topProduct = "";
        int maxCount = 0;
        for (Map.Entry<String, Integer> productCount : productCounts.entrySet()) {
            if (productCount.getValue() > maxCount) {
                maxCount = productCount.getValue();
                topProduct = productCount.getKey();
            }
        }
        return topProduct;
    }
}
The code above produces the output you described.
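One edge case the code does not handle is a tie: `getTopProductForPerson` iterates a `HashMap`, so a person who bought one Computer and one Mobile could get either product, depending on iteration order. A minimal sketch of a deterministic alternative follows, assuming ties should go to the alphabetically smallest product name (a hypothetical rule; the question does not specify one). `TopProduct.pick` is a hypothetical name with the same input and output as the original helper.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical drop-in alternative to getTopProductForPerson:
// same contract (product -> count in, top product out), deterministic on ties.
class TopProduct {

    // Returns the product with the highest count; on a tie, the alphabetically
    // smallest product name wins. Returns "" for an empty map, like the original.
    static String pick(Map<String, Integer> productCounts) {
        String topProduct = "";
        int maxCount = 0;
        for (Map.Entry<String, Integer> e : productCounts.entrySet()) {
            int count = e.getValue();
            boolean higher = count > maxCount;
            boolean tieWithSmallerName = count == maxCount && count > 0
                    && e.getKey().compareTo(topProduct) < 0;
            if (higher || tieWithSmallerName) {
                maxCount = count;
                topProduct = e.getKey();
            }
        }
        return topProduct;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("TV", 1);
        counts.put("Mobile", 1);
        counts.put("Computer", 1);
        System.out.println(pick(counts)); // prints "Computer" regardless of HashMap order
    }
}
```

Inside the reducer, the call to `getTopProductForPerson(productCounts)` could be replaced with `TopProduct.pick(productCounts)`; the single pass over the map is unchanged, only the comparison gains a tie-breaker.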
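If you want a proper solution that scales and so on, you will probably need a composite key and a custom GroupComparator. That would also let you add a combiner and make the job more efficient. For the general case, however, the approach above works.

Thanks! Yes, sorry about the mix-up with Anne... this is exactly what I was looking for.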
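To illustrate the composite-key idea from the answer, here is a plain-Java simulation of the data flow. It is not Hadoop code. It assumes the composite key is encoded as the string `person\tproduct` (a hypothetical encoding), and the class and method names are likewise hypothetical. A combiner can safely sum counts per composite key because addition is associative and commutative. The reduce side then groups the sorted keys by the person part only, which is the job a custom grouping comparator would do. In a real job you would also need a partitioner on the person part, so that all of one person's keys reach the same reducer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory simulation of: map emits ((person, product), 1),
// a combiner sums per composite key, reduce groups sorted keys by person.
class CompositeKeySimulation {

    // Map + combine for one input split. Keys "person\tproduct" sort by person
    // first (tab sorts before printable characters), then by product.
    static Map<String, Integer> mapAndCombine(List<String> split) {
        Map<String, Integer> combined = new TreeMap<>();
        for (String row : split) {
            String[] parts = row.split(",");
            combined.merge(parts[1] + "\t" + parts[2], 1, Integer::sum);
        }
        return combined;
    }

    // Shuffle: merge all combiner outputs into one sorted map. Reduce: walk the
    // sorted keys and start a new group whenever the person part changes.
    static List<String> shuffleAndReduce(List<Map<String, Integer>> combinerOutputs) {
        Map<String, Integer> shuffled = new TreeMap<>();
        for (Map<String, Integer> partial : combinerOutputs) {
            partial.forEach((k, v) -> shuffled.merge(k, v, Integer::sum));
        }
        List<String> out = new ArrayList<>();
        String currentPerson = null;
        String topProduct = "";
        int total = 0;
        int topCount = 0;
        for (Map.Entry<String, Integer> e : shuffled.entrySet()) {
            String[] key = e.getKey().split("\t");
            if (!key[0].equals(currentPerson)) {
                if (currentPerson != null) {
                    out.add(currentPerson + " " + total + " " + topProduct);
                }
                currentPerson = key[0];
                total = 0;
                topCount = 0;
                topProduct = "";
            }
            total += e.getValue();
            if (e.getValue() > topCount) { // ties keep the alphabetically first product
                topCount = e.getValue();
                topProduct = key[1];
            }
        }
        if (currentPerson != null) {
            out.add(currentPerson + " " + total + " " + topProduct);
        }
        return out;
    }

    public static void main(String[] args) {
        // Two input splits, as if processed by two separate mappers
        List<String> split1 = List.of("1,John,Computer", "2,Anne,Computer", "3,John,Mobile",
                "4,Julia,Mobile", "5,Jack,Mobile", "6,Jack,TV");
        List<String> split2 = List.of("7,John,Computer", "8,Jack,TV", "9,Jack,TV",
                "10,Anne,Mobile", "11,Anne,Computer", "12,Julia,Mobile");
        shuffleAndReduce(List.of(mapAndCombine(split1), mapAndCombine(split2)))
                .forEach(System.out::println);
    }
}
```

The design choice this shows: the combiner shrinks each split's output to one record per (person, product) pair, so the reducer iterates over distinct products rather than over every individual purchase.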