Java: MapReduce in Hadoop exceeds the GC overhead limit when using files larger than 200 MB

java hadoop

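I am running MapReduce code on a multi-node Hadoop cluster (2.4.1). When I try to run it with two input files of 200 MB each, I get a "GC overhead limit exceeded" error. With very small files it runs fine and produces the correct output.
My goal is to compare each flow record in the first file with each flow record in the second file, compute the distance, take the 10 largest values, and emit output to the reducer based on those 10 values.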

Sample flow record from both files: 194.144.0.27 | 192.168.1.5 | 0.0.0 | 0 | 0 | 2 | 104 | 1410985350 | 1410985350 | 51915 | 51413 | 6

Here is the Mapper class:

public class mapper extends Mapper<LongWritable, Text, Text, IntWritable> 
{

 private final static IntWritable five = new IntWritable(5);

 private Text counter1;

 ArrayList<String> lines = new ArrayList<String>();
 String str;
 BufferedReader br,in;
 int ddos_line = 0; 
 int normal_line = 0,total_testing_records=4000;
 int K = 10;

  @Override
  protected void setup(Context context) throws IOException, InterruptedException 
  { 
     //BufferedReader in = new BufferedReader(new FileReader("normal"));

      Configuration conf = context.getConfiguration();          
      URI[] cachefiles = context.getCacheFiles();

      FileSystem fs = FileSystem.get(new Configuration());          
      FileStatus[] status = fs.listStatus(new Path(cachefiles[0].toString()));            
      BufferedReader in=new BufferedReader(new InputStreamReader(fs.open(status[0].getPath()))); 


      while((str = in.readLine()) != null)
      {
          lines.add(str);
      }
      in.close();
      //System.out.println("na netti");
  }

@Override
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException 
{

    String line1 = value.toString();
    ddos_line++;
    normal_line = 0;

    double[] count = {-1, -1, -1, -1, -1, -1, -1, -1, -1, -1};
    int[] lineIndex = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0};

    String[] parts = line1.split("\\|");
    String[] linesArray = lines.toArray(new String[lines.size()]);  

    boolean bool = true;
    int t1=0;
    double sum=0;
    while (bool) 
    {
        for(int i=0; i<K;i++)
        {
                if(bool==false) break;
                sum = 0;
                String[] parts2 = linesArray[normal_line].split("\\|");

                for(int x=0;x<13;x++)
                    {
                            if(parts[x].equals(parts2[x]))
                            {
                                t1 = 1;
                            }
                            else t1 = 0;

                            sum += t1;
                    }

                    sum = Math.sqrt(sum);

                    if(count[K-1] <= sum)
                    {
                        count[K-1] = sum;
                        lineIndex[K-1]=normal_line;
                    } 



                    for(int k=0;k<K;k++)
                    {
                        for(int j=0;j<K-1;j++)
                        {   
                            if(count[j] < count[j+1]) 
                            {
                                double temp2 = count[j+1];
                                count[j+1] = count[j];
                                count[j] = temp2;

                                int temp3 = lineIndex[j+1];
                                lineIndex[j+1] = lineIndex[j];
                                lineIndex[j] = temp3;
                            }
                         }
                     }

                //System.out.println(ddos_line + "   " + normal_line);
                if (normal_line + 1 < linesArray.length)
                {
                    normal_line++;
                    continue;
                } 

                else bool = false;
            }


    } // while end

    char[] t = {'d','d','d','d','d','d','d','d','d','d'};
    for(int i=0;i<K;i++)
    {
        if(lineIndex[i] <= total_testing_records/2 ) t[i] = 'n'; 
    }

    int counter_normal=0, counter_ddos=0;
    for(int i=0;i<K;i++)
    {
        if(t[i]=='n')
            counter_normal++;
        else
            counter_ddos++;
        //System.out.println("t[i]: "+t[i]+", counter: "+counter_ddos);

    }

    if(counter_normal<=K/2)
    {
        counter1 = new Text(ddos_line + " : d : "+ counter_ddos);
    }
    else
    {
        counter1 = new Text(ddos_line + " : n : "+ (counter_normal));
    }



    context.write(counter1, five);

    //System.out.println("mapper finished");    
}
  public void run(Context context) throws IOException, InterruptedException 
  {
      setup(context);
      while (context.nextKeyValue()) {
            map(context.getCurrentKey(), context.getCurrentValue(), context);
      }
      cleanup(context);
  }
}

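Just increase the memory available to the tasks. In your job configuration, set the JVM heap option to

-Xmx1024m

or more, so that the task can read this file and process it.

What is your cluster configuration?

@Kishorer747 Sorry, I did not see your reply. My cluster is a 5-node Hadoop 2.4.1 cluster with dual-core processors and 8 GB of RAM per machine. I tried -Xmx* settings above 2 GB but got the same error.

@Kishorer747 You have a memory leak somewhere.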
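
Since raising the heap reportedly did not help, it may be worth looking at allocations inside map(). The mapper in the question copies the whole cached list into a new array with lines.toArray(...) on every map() call and re-sorts the top-10 arrays with a bubble sort after every comparison. Below is a minimal sketch in plain Java (no Hadoop dependency) of the same scoring with a bounded min-heap and no per-call array copy. The class and method names (TopKMatches, Scored, score, topK) are made up for illustration and are not from the question. Ties are also broken differently than in the original loop, so treat it as a sketch rather than a drop-in replacement.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper, not part of the original mapper.
public class TopKMatches {

    /** Index of a cached line together with its similarity score. */
    public static final class Scored {
        public final int index;
        public final double score;

        Scored(int index, double score) {
            this.index = index;
            this.score = score;
        }
    }

    /** Orders candidates by ascending score. */
    private static final Comparator<Scored> BY_SCORE = new Comparator<Scored>() {
        @Override
        public int compare(Scored a, Scored b) {
            return Double.compare(a.score, b.score);
        }
    };

    /** Square root of the number of positions at which both records hold the same field. */
    public static double score(String[] a, String[] b) {
        int n = Math.min(a.length, b.length); // guards against short records
        int matches = 0;
        for (int i = 0; i < n; i++) {
            if (a[i].equals(b[i])) {
                matches++;
            }
        }
        return Math.sqrt(matches);
    }

    /** Returns up to k best-scoring cached lines, best first, reading the list in place. */
    public static List<Scored> topK(String[] record, List<String> cachedLines, int k) {
        // Min-heap: the weakest of the current top k sits at the head.
        PriorityQueue<Scored> heap = new PriorityQueue<Scored>(Math.max(1, k), BY_SCORE);
        for (int i = 0; i < cachedLines.size(); i++) {
            double s = score(record, cachedLines.get(i).split("\\|"));
            if (heap.size() < k) {
                heap.add(new Scored(i, s));
            } else if (k > 0 && s > heap.peek().score) {
                heap.poll();                 // evict the current weakest candidate
                heap.add(new Scored(i, s));
            }
        }
        List<Scored> result = new ArrayList<Scored>(heap);
        Collections.sort(result, Collections.reverseOrder(BY_SCORE));
        return result;
    }
}
```

In the mapper this would be called once per input record, for example topK(value.toString().split("\\|"), lines, K), with lines filled in setup() as before. Whether this alone removes the error depends on the heap: the cached 200 MB file still has to stay resident, and on JVMs before Java 9 a String stores two bytes per character, so the text by itself needs roughly twice its on-disk size plus per-object overhead.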