Java Hadoop MapReduce - Sum of Euler's Totient (and other mathematical operations)

java hadoop cluster-computing

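As part of my research, I am implementing the sum of totients (Euler's totient function) in several parallel computing languages, and honestly I am struggling with MapReduce. The main goal is a kind of benchmark in terms of runtime, efficiency, and so on.

My code runs now and I get the correct output, but it is very slow, and I would like to know why.

Is it because of my implementation, or because Hadoop MapReduce is not designed for this purpose? I also implemented a combiner because, as far as I understand, it should optimize the job, but it does not. Sorry if this question seems silly, but I found nothing online and I am tired of trying everything without any result.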

My input file contains the values from 1 to 15000:

1 2 3 4 5 6 ... 14998 14999 15000

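I am working on a cluster of 32 nodes, and my goal is to have each node compute part of my range (the combiner) and then add up all the "sub-sums" from the combiners.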

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class NewTotient {

  public static long hcf(long x, long y)
  {
    long t;

    while (y != 0) {
      t = x % y;
      x = y;
      y = t;
    }
    return x;
  }

  public static boolean relprime(long x, long y)
  {
    return hcf(x, y) == 1;
  }

  public static long euler(long n)
  {
    long length, i;

    length = 0;
    for (i = 1; i < n; i++)
      if (relprime(n, i))
        length++;
    return length;
  }

  public static class TotientMapper extends Mapper<LongWritable, Text, Text, IntWritable>{

    // Every number is emitted under the same (empty) key so a single reduce call can sum them.
    private final Text emptyKey = new Text();
    private final IntWritable number = new IntWritable();

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Each input line holds whitespace-separated integers.
        for (String val : value.toString().trim().split("\\s+")) {
            number.set(Integer.parseInt(val));
            context.write(emptyKey, number);
        }
    }
  }

  public static class TotientCombiner extends Reducer<Text,IntWritable,Text,IntWritable> {
    //private IntWritable result = new IntWritable();

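    protected void reduce(Text key, Iterable<IntWritable> values, Context context)throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable val : values) {
              sum += NewTotient.euler(val.get());
          }
          // Emit the partial sum; without this write nothing reaches the reducer.
          // Caution: Hadoop may run a combiner zero, one, or several times,
          // so computing the totient here (rather than in the mapper) is fragile.
          context.write(key, new IntWritable(sum));
      }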
  }

  public static class TotientReducer extends Reducer<Text,IntWritable,Text,IntWritable> {
    //private IntWritable result = new IntWritable();

    protected void reduce(Text key, Iterable<IntWritable> values, Context context)throws IOException, InterruptedException {
          int sum = 1;
          for (IntWritable val : values) {
              sum += val.get();
          }
          context.write(null, new IntWritable(sum));
      }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    System.out.println("\n\n__________________________________________________________\n"+"Starting Job\n"+"__________________________________________________________\n\n");
    final long startTime = System.currentTimeMillis();

    Job job = Job.getInstance(conf, "Sum of Totient");
    job.setJarByClass(NewTotient.class);
    job.setMapperClass(TotientMapper.class);
    job.setCombinerClass(TotientCombiner.class);
    job.setReducerClass(TotientReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    //job.setOutputKeyClass(Text.class);
    //job.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    // Wait for the job once and reuse the result for the exit code.
    final boolean success = job.waitForCompletion(true);
    final double duration = (System.currentTimeMillis() - startTime)/1000.0;
    System.out.println("\n\n__________________________________________________________\n"+"Job Finished in " + duration + " seconds\n"+"__________________________________________________________\n\n");
    System.exit(success ? 0 : 1);
  }
}

With sequential code in Java it is much faster:

real    0m0.512s
user    0m0.279s
sys     0m0.142s

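To be clear, I have to use this calculation method because it is slow enough to allow interesting comparisons between the different systems. I know there is a smarter approach: find all prime factors of n, count the prime factors and their multiples, and subtract that count from n to get the totient value (the gcd of n with a prime factor, or with a multiple of one, is never 1). I cannot use such a method to speed things up.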
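For reference, the prime-factor shortcut mentioned above can be sketched as follows. This is only an illustration of the method the benchmark deliberately avoids. The class name `TotientCheck` and the methods `eulerNaive` and `eulerFast` are hypothetical names and not part of the original code. Note that the loop in the question's `euler` runs over i < n and therefore yields 0 for n = 1, while φ(1) = 1 by the usual convention, so the comparison below starts at n = 2.

```java
class TotientCheck {

    // Greatest common divisor by Euclid's algorithm (same idea as hcf in the question).
    static long gcd(long x, long y) {
        while (y != 0) {
            long t = x % y;
            x = y;
            y = t;
        }
        return x;
    }

    // Naive totient as in the question: count i in [1, n) with gcd(n, i) == 1.
    static long eulerNaive(long n) {
        long count = 0;
        for (long i = 1; i < n; i++) {
            if (gcd(n, i) == 1) {
                count++;
            }
        }
        return count;
    }

    // Product formula: phi(n) = n * product over distinct primes p dividing n of (1 - 1/p).
    static long eulerFast(long n) {
        long result = n;
        for (long p = 2; p * p <= n; p++) {
            if (n % p == 0) {
                while (n % p == 0) {
                    n /= p;               // strip every copy of the prime factor p
                }
                result -= result / p;     // multiply result by (1 - 1/p)
            }
        }
        if (n > 1) {
            result -= result / n;         // one prime factor larger than sqrt(n) remains
        }
        return result;
    }

    public static void main(String[] args) {
        // Both methods should agree for every n >= 2.
        for (long n = 2; n <= 1000; n++) {
            if (eulerNaive(n) != eulerFast(n)) {
                throw new AssertionError("mismatch at n = " + n);
            }
        }
        System.out.println("phi(36) = " + eulerFast(36)); // prints "phi(36) = 12"
    }
}
```

The naive version costs on the order of n gcd computations per number, which is what makes it a usable benchmark workload, while the product formula needs only about sqrt(n) trial divisions.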

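Here you are providing the input from the file on a single line. The mapper receives its input one line at a time, so with only one line the whole input is handled by a single map task and nothing is processed in parallel. One thing you can do is put each input number on its own line instead of separating the numbers with spaces, and change the mapper accordingly.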
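A minimal sketch of the input change suggested above, assuming the numbers sit whitespace-separated in a local text file. The class name `OnePerLine` and its `toLines` helper are hypothetical. After the rewrite, the mapper would parse each line as a single integer instead of splitting it. With a file this small, all lines may still end up in one input split, so one number per line alone might not be enough. Hadoop's `NLineInputFormat` (in `org.apache.hadoop.mapreduce.lib.input`) is one way to limit the number of lines per split, though whether it fits this cluster setup is an assumption.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class OnePerLine {

    // Splits whitespace-separated text into a list with one token per entry.
    static List<String> toLines(String content) {
        List<String> lines = new ArrayList<>();
        for (String token : content.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                lines.add(token);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        Path in = Paths.get(args[0]);   // original file: all numbers on one line
        Path out = Paths.get(args[1]);  // rewritten file: one number per line
        String content = new String(Files.readAllBytes(in), StandardCharsets.UTF_8);
        Files.write(out, toLines(content), StandardCharsets.UTF_8);
    }
}
```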

Also, a combiner does not make much sense here, because you are not using different keys in the map output.
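One way to think about where the work belongs is the following plain-Java simulation, which does not use Hadoop at all. It assumes the expensive totient computation is done in the map step and that the combine and reduce steps are plain sums. Since Hadoop may apply a combiner zero, one, or several times, a plain sum is a safe combiner operation. The class `CombinerSketch` and all of its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class CombinerSketch {

    static long gcd(long x, long y) {
        while (y != 0) {
            long t = x % y;
            x = y;
            y = t;
        }
        return x;
    }

    // Same naive totient as in the question (yields 0 for n = 1).
    static long euler(long n) {
        long count = 0;
        for (long i = 1; i < n; i++) {
            if (gcd(n, i) == 1) {
                count++;
            }
        }
        return count;
    }

    // Stand-in for the map phase: the expensive work runs once per input number.
    static List<Long> mapPhase(long from, long to) {
        List<Long> out = new ArrayList<>();
        for (long n = from; n <= to; n++) {
            out.add(euler(n));
        }
        return out;
    }

    // Stand-in for both combiner and reducer: a plain sum can be applied repeatedly.
    static long sum(List<Long> values) {
        long total = 0;
        for (long v : values) {
            total += v;
        }
        return total;
    }

    // No combining: one sum over all mapped values.
    static long directTotal(long n) {
        return sum(mapPhase(1, n));
    }

    // Split 1..n into chunks, sum each chunk (combiner), then sum the partials (reducer).
    static long combinedTotal(long n, int chunks) {
        List<Long> partials = new ArrayList<>();
        long size = (n + chunks - 1) / chunks;
        for (long start = 1; start <= n; start += size) {
            long end = Math.min(n, start + size - 1);
            partials.add(sum(mapPhase(start, end)));
        }
        return sum(partials);
    }

    public static void main(String[] args) {
        System.out.println(directTotal(1000) == combinedTotal(1000, 32)); // prints "true"
    }
}
```

In this arrangement the combiner only reduces the amount of data sent to the reducer; the result is the same whether or not it runs.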

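I am a bit confused about how the mapper should be changed. I may have misunderstood something, but where did I say I am using new lines? Normally I process the input line by line, don't I?

The default input format for file input is TextInputFormat, which uses the byte offset of each line in the file as the key and the whole line as the value. The RecordReader reads each line of the file and passes the key and value to the mapper. See the link below, which describes how a MapReduce program works.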
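The record model described above can be imitated in a few lines of plain Java. This is a simplified stand-in and not Hadoop's actual record reader: the class `LineRecords` and its nested `Record` type are hypothetical, and the sketch only handles `\n` line endings. It shows that a file with all numbers on one line yields a single (offset, line) record, while one number per line yields one record per number. Keep in mind that the number of map tasks follows from the input splits rather than from the number of records.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class LineRecords {

    // One record as a mapper would see it: key = byte offset of the line, value = line text.
    static final class Record {
        final long offset;
        final String line;

        Record(long offset, String line) {
            this.offset = offset;
            this.line = line;
        }
    }

    // Splits the content on '\n' and tracks the UTF-8 byte offset of each line start.
    static List<Record> read(String content) {
        List<Record> records = new ArrayList<>();
        long offset = 0;
        int start = 0;
        while (start < content.length()) {
            int nl = content.indexOf('\n', start);
            int end = (nl < 0) ? content.length() : nl;
            String line = content.substring(start, end);
            records.add(new Record(offset, line));
            // Advance past the line and its terminating newline, if there is one.
            offset += line.getBytes(StandardCharsets.UTF_8).length + (nl < 0 ? 0 : 1);
            start = (nl < 0) ? content.length() : nl + 1;
        }
        return records;
    }

    public static void main(String[] args) {
        System.out.println(read("1 2 3 4 5\n").size());     // prints 1: one line, one record
        System.out.println(read("1\n2\n3\n4\n5\n").size()); // prints 5: one record per number
    }
}
```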
