Java: sending multiple values to the reducer in MapReduce
I have written some code that performs an operation similar to a SQL GROUP BY. The dataset I am working with looks like this:
25078868141920090906200937200909619,Sunday,Weekend,Online,Morning,Outgoing,Voice,25078,Pay-per-second,Service posted successfully,17,0,1,21.25635-10-112-30455
public class MyMap extends Mapper<LongWritable, Text, Text, DoubleWritable>
{
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException
    {
        String line = value.toString();
        String[] attribute = line.split(",");
        double rs = Double.parseDouble(attribute[17]);
        String comb = attribute[5].concat(attribute[8]).concat(attribute[10]);
        context.write(new Text(comb), new DoubleWritable(rs));
    }
}
public class MyReduce extends Reducer<Text, DoubleWritable, Text, DoubleWritable>
{
    protected void reduce(Text key, Iterable<DoubleWritable> values, Context context)
            throws IOException, InterruptedException {
        double sum = 0;
        Iterator<DoubleWritable> iter = values.iterator();
        while (iter.hasNext())
        {
            double val = iter.next().get();
            sum = sum + val;
        }
        context.write(key, new DoubleWritable(sum));
    }
}
In the mapper, rs carries the 17th attribute, which is sent to the reducer for summing. Now I also want to sum the 14th attribute. How can I send it to the reducer as well?

If your data types are the same, creating an ArrayWritable subclass should do the trick. The class would look something like:
public class DblArrayWritable extends ArrayWritable
{
public DblArrayWritable()
{
super(DoubleWritable.class);
}
}
Your mapper class would then look like:
public class MyMap extends Mapper<LongWritable, Text, Text, DblArrayWritable>
{
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException
    {
        String line = value.toString();
        String[] attribute = line.split(",");
        DoubleWritable[] values = new DoubleWritable[2];
        values[0] = new DoubleWritable(Double.parseDouble(attribute[14]));
        values[1] = new DoubleWritable(Double.parseDouble(attribute[17]));
        String comb = attribute[5].concat(attribute[8]).concat(attribute[10]);
        DblArrayWritable outValues = new DblArrayWritable();
        outValues.set(values);
        context.write(new Text(comb), outValues);
    }
}
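The answer does not show the reducer side. As a minimal sketch, assuming the DblArrayWritable subclass above and the new mapreduce API, the reducer could unpack the two DoubleWritable entries and keep a separate sum for each (the Text output format is just one possible choice):

import java.io.IOException;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Reducer;

public class MyReduce extends Reducer<Text, DblArrayWritable, Text, Text>
{
    @Override
    protected void reduce(Text key, Iterable<DblArrayWritable> values, Context context)
            throws IOException, InterruptedException {
        double sum14 = 0;
        double sum17 = 0;
        for (DblArrayWritable pair : values) {
            // ArrayWritable.get() returns the wrapped Writable[] in the order the mapper set it
            Writable[] vals = pair.get();
            sum14 += ((DoubleWritable) vals[0]).get();
            sum17 += ((DoubleWritable) vals[1]).get();
        }
        // Emit both sums for the key
        context.write(key, new Text(sum14 + "," + sum17));
    }
}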
You could also handle this by simply concatenating the values and passing them to the reducer as Text; the reducer would then split them apart again, as in the sketch below.
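A minimal sketch of that concatenate-and-split approach (the field indexes come from the question; the comma delimiter is my assumption, and both classes are shown together only for brevity):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Mapper: pack both fields into a single comma-separated Text value
public class MyMap extends Mapper<LongWritable, Text, Text, Text>
{
    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] attribute = value.toString().split(",");
        String comb = attribute[5] + attribute[8] + attribute[10];
        context.write(new Text(comb), new Text(attribute[14] + "," + attribute[17]));
    }
}

// Reducer: split each value back apart and keep a separate sum per field
public class MyReduce extends Reducer<Text, Text, Text, Text>
{
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        double sum14 = 0;
        double sum17 = 0;
        for (Text v : values) {
            String[] parts = v.toString().split(",");
            sum14 += Double.parseDouble(parts[0]);
            sum17 += Double.parseDouble(parts[1]);
        }
        context.write(key, new Text(sum14 + "," + sum17));
    }
}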
Another option is to implement your own Writable class. Here is an example of how that could work:
public static class PairWritable implements Writable
{
    private Double myDouble;
    private String myString;

    // Hadoop serialization: the Writable interface methods
    @Override
    public void readFields(DataInput in) throws IOException {
        myDouble = in.readDouble();
        myString = in.readUTF();
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeDouble(myDouble);
        out.writeUTF(myString);
    }
    // End of implementation

    // Getter and setter methods for the myDouble and myString fields
    public void set(Double d, String s) {
        myDouble = d;
        myString = s;
    }

    public Double getDouble() {
        return myDouble;
    }

    public String getString() {
        return myString;
    }
}
public class ObjArrayWritable extends ArrayWritable
{
    public ObjArrayWritable()
    {
        // ArrayWritable needs a Writable element class; here the array holds PairWritable objects
        super(PairWritable.class);
    }
}
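The answer does not show how these two classes are wired together in the mapper. A rough, assumed usage sketch (the "attr14"/"attr17" labels are hypothetical, chosen just to name which field each pair carries) might look like this:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Mapper;

public class MyMap extends Mapper<LongWritable, Text, Text, ObjArrayWritable>
{
    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] attribute = value.toString().split(",");
        String comb = attribute[5] + attribute[8] + attribute[10];

        // Each PairWritable carries one numeric value plus a label naming which field it is
        PairWritable p14 = new PairWritable();
        p14.set(Double.parseDouble(attribute[14]), "attr14");
        PairWritable p17 = new PairWritable();
        p17.set(Double.parseDouble(attribute[17]), "attr17");

        // Wrap both pairs in the ObjArrayWritable and emit them under the combined key
        ObjArrayWritable out = new ObjArrayWritable();
        out.set(new Writable[] { p14, p17 });
        context.write(new Text(comb), out);
    }
}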