Split method in Java MapReduce

java eclipse hadoop mapreduce

I have an input file:

    101 Alice   23  female  IT  45
    102 Bob 34  male    Finance 89
    103 Chris   67  male    IT  97

My mapper:

    package EmpCtcPack;

    import java.io.IOException;

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Mapper.Context;

    public class EmpctcMapper extends Mapper<Object, Text, Text, Text> {

        private Text MKey = new Text();
        private Text MValue = new Text();

        public void map(Object key, Text value, Context context
                ) throws IOException, InterruptedException {

            String tempKey = new String();
            String tempValue = new String();

            try
            {
                tempValue = value.toString();
                tempKey = value.toString().split("    ")[3];
            }
            catch (Exception e)
            {
                e.printStackTrace();
            }

            MKey.set(tempKey);
            MValue.set(tempValue);

            context.write(MKey, MValue);
        }
    }

My reducer:

    package EmpCtcPack;

    import java.io.IOException;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.Reducer.Context;

    public class EmpCtcReducer extends Reducer<Text,Text,Text,Text> {

        private Text RValue = new Text();
        private Text RKey = new Text();

        public void reduce(Text key, Iterable<Text> values,
                Context context
                ) throws IOException, InterruptedException {

            Integer i = new Integer(0);
            String s = new String();
            Integer t = new Integer(0);
            Text text = new Text();

            try
            {
                for (Text val : values)
                {
                    String arr[] = val.toString().split(" ");
                    s = arr[3];
                    text.set(s);

                    context.write(key, text);
                }
            }
            catch (Exception e)
            {
                e.printStackTrace();
            }
        }
    }

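The problem is with the split method.

When I try to get arr[0], it works fine and I get the ID numbers (101, 102, and so on). But if I try to get arr[1] or arr[2], I get 0. Does anyone know why this happens?

Thanks in advance.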

You are using a combiner in the driver class; it is not needed in this case.
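To illustrate why a combiner can produce this symptom, here is a minimal standalone sketch in plain Java (no Hadoop required). It assumes, since the driver is not shown in the question, that the reducer class was also registered as the combiner and that the fields are tab-separated. The class name CombinerEffectDemo and the helper reduceLike are hypothetical, introduced only for this illustration. Running the same field-extraction logic twice leaves a single field for the second pass, so any index other than 0 yields nothing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CombinerEffectDemo {

    // Mimics the reduce() logic from the question: split each value and keep one field.
    // Values with too few fields are skipped, mirroring the exception that the
    // original code catches and swallows.
    static List<String> reduceLike(List<String> values, int index) {
        List<String> out = new ArrayList<>();
        for (String val : values) {
            String[] arr = val.split("\\t"); // assumption: tab-separated fields
            if (index < arr.length) {
                out.add(arr[index]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Values the mapper would emit for the key "male" (the full input lines).
        List<String> male = Arrays.asList(
                "102\tBob\t34\tmale\tFinance\t89",
                "103\tChris\t67\tmale\tIT\t97");

        // Without a combiner: the reducer sees the full records.
        System.out.println(reduceLike(male, 1)); // prints [Bob, Chris]

        // With the reducer reused as combiner: the logic runs twice.
        List<String> combined = reduceLike(male, 1);
        System.out.println(reduceLike(combined, 1)); // prints []

        // Index 0 survives the second pass because one field is still present.
        System.out.println(reduceLike(reduceLike(male, 0), 0)); // prints [102, 103]
    }
}
```

With index 0 the second pass still finds a field, which matches the behavior described in the question. Removing the combiner from the driver means the reducer receives the full records again.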

Try using a regular expression in split(), for example "\\t" for a single tab or "\\t+" for a run of tabs.

I made the same mistake. It was caused by the combiner class. I removed the combiner class and now it works fine. Thanks.

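Please check how to create a Minimal, Complete, and Verifiable example. Can you show the code where you use arr[0], arr[1], and so on?

I mean the line s=arr[3]. If it is s=arr[0], I get the ID number; with any other index (1, 2, 3, ...) I only get 0.

Can you add the driver class? I think you are using a combiner in the driver class, which is not needed in this case.

Yes, after removing the combiner it works fine now. Thank you very much.

The fields are separated by tabs. I tried "\\s+", but it did not help.

@Mariia - see the update! Please try the \\t or \\t+ approach described in the answer.

Unfortunately not.

Did you make this change in both the Mapper and the Reducer methods?

The problem was the combiner class; I did not need it here.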

    value.toString().split("\\t+"); // if fields are separated by multiple tabs
    value.toString().split("\\t");  // if fields are separated by a single tab
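To see how the delimiter passed to split() changes the result on a tab-separated record like the ones in the question, the following standalone sketch compares the patterns mentioned in this thread. The class name SplitDelimiterDemo and the sample strings are assumptions made for illustration; only String.split from the standard library is used.

```java
import java.util.Arrays;

class SplitDelimiterDemo {

    // Small wrappers so each pattern can be exercised separately.
    static String[] splitOnTab(String line)   { return line.split("\\t"); }
    static String[] splitOnTabs(String line)  { return line.split("\\t+"); }
    static String[] splitOnSpace(String line) { return line.split(" "); }

    public static void main(String[] args) {
        String record = "101\tAlice\t23\tfemale\tIT\t45";

        // A single tab between fields: "\\t" yields all six fields.
        System.out.println(Arrays.toString(splitOnTab(record)));
        // prints [101, Alice, 23, female, IT, 45]

        // Splitting on a space finds no delimiter, so the whole line is one element
        // and any index above 0 throws ArrayIndexOutOfBoundsException.
        System.out.println(splitOnSpace(record).length); // prints 1

        // Consecutive tabs: "\\t" keeps an empty field, "\\t+" collapses the run.
        String doubled = "101\t\tAlice";
        System.out.println(splitOnTab(doubled).length);  // prints 3
        System.out.println(splitOnTabs(doubled).length); // prints 2
    }
}
```

Checking the length of the returned array before indexing into it, rather than catching a broad Exception, makes a delimiter mismatch visible immediately.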