Java: How to implement an adaptive merge sort in Hadoop MapReduce

java hadoop

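I have written an adaptive merge sort in Java. A MapReduce program, however, consists of three classes: the first is the mapper, the second is the reducer, and the third is the driver. I don't understand how to convert this code to MapReduce so that I can run it on a multi-node Hadoop cluster.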

The code is as follows:

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.FileWriter;
import java.io.IOException;
import java.util.Scanner;

public class MainClass {

    public static void main(String args[]) throws Exception {

        MainClass mainClass = new MainClass();
        MainClass.SortClass sortClass = mainClass.new SortClass();
        sortClass.sortMainFunction();
    }

    public class Constant {
        public static final int MAX = 10000;
        public static final int max = 10000;
    }

    public class Values {
        int st_index;
        int ed_index;
        boolean as_ds;

        public Values(int st_index, int ed_index, boolean as_ds) {
            this.st_index = st_index;
            this.ed_index = ed_index;
            this.as_ds = as_ds;
        }
    }

    public class SortClass {
        int[] a = new int[Constant.MAX];
        int[] b = new int[Constant.MAX];
        int index1 = 0;
        int i = 0;
        int numberOfItem = 0, first_ind, end_ind, tem, k, g, h;
        boolean flag;
        Values node[] = new Values[Constant.max];
        int p, q, r, low, high, j;

        void insert(int st_ind, int end_ind, boolean fl) {


            node[index1] = new Values(st_ind, end_ind, fl);
            index1++;
        }

        public void sortMainFunction() throws Exception {
            addDataIntoArray();

            first_ind = 0;
            end_ind = 0;
            flag = false;
            tem = a[0];
            k = 1;

            while (true) {

                while (true) {
                    if (k < i && tem >= a[k]) {
                        end_ind = k;
                        tem = a[k];
                        k++;
                        flag = true;
                        continue;
                    }
                    break;
                }
                if (flag) {



                    node[index1] = new Values(first_ind, end_ind, flag);
                    index1++;


                    first_ind = k;
                    end_ind = k;
                    if (k >= i) {

                    } else {
                        tem = a[k++];
                    }

                }

                while (true) {
                    if (k < i && tem <= a[k]) {
                        tem = a[k];
                        end_ind = k;
                        flag = false;
                        k++;
                        continue;
                    }
                    break;
                }
                if (!flag) {
                    insert(first_ind, end_ind, flag);
                    first_ind = end_ind = k;
                    if (k >= i) {

                    } else {
                        tem = a[k++];
                    }

                }

                if (end_ind == i)
                    break;
            }

            g = index1 - 1;
            h = 0;
            while (true) {
                if (g / 2 <= 0) {
                    break;
                }
                g /= 2;
                h++;
            }

            System.out.println(h);
            h = (int) ((double) Math.log((double) index1 - 1) / (double) Math.log(2.0));
            System.out.println(h);

            p = 1;
            for (q = 0; q <= h; q++, p *= 2) {
                for (r = 0; r < index1; r++) {
                    if (2 * r * p + p >= index1)
                        break;
                    else {
                        low = 2 * p * r;
                        high = 2 * p * r + p;
                        k = node[low].st_index;
                        if (node[low].as_ds == false && node[high].as_ds == false) {
                            i = node[low].st_index;
                            j = node[high].st_index;

                            while (i <= node[low].ed_index && j <= node[high].ed_index) {
                                if (a[i] <= a[j])
                                    b[k++] = a[i++];
                                else
                                    b[k++] = a[j++];
                            }
                            if (i > node[low].ed_index)
                                while (j <= node[high].ed_index)
                                    b[k++] = a[j++];
                            else
                                while (i <= node[low].ed_index)
                                    b[k++] = a[i++];
                        } else if (node[low].as_ds == false && node[high].as_ds == true) {
                            i = node[low].st_index;
                            j = node[high].ed_index;

                            while (i <= node[low].ed_index && j >= node[high].st_index) {
                                if (a[i] <= a[j])
                                    b[k++] = a[i++];
                                else
                                    b[k++] = a[j--];
                            }
                            if (i > node[low].ed_index)
                                while (j >= node[high].st_index)
                                    b[k++] = a[j--];
                            else
                                while (i <= node[low].ed_index)
                                    b[k++] = a[i++];
                        } else if (node[low].as_ds == true && node[high].as_ds == false) {
                            i = node[low].ed_index;
                            j = node[high].st_index;

                            while (i >= node[low].st_index && j <= node[high].ed_index) {
                                if (a[i] <= a[j])
                                    b[k++] = a[i--];
                                else
                                    b[k++] = a[j++];
                            }
                            if (j > node[high].ed_index)
                                while (i >= node[low].st_index)
                                    b[k++] = a[i--];
                            else
                                while (j <= node[high].ed_index)
                                    b[k++] = a[j++];
                        } else if (node[low].as_ds == true && node[high].as_ds == true) {
                            i = node[low].ed_index;
                            j = node[high].ed_index;

                            while (i >= node[low].st_index && j >= node[high].st_index) {
                                if (a[i] <= a[j])
                                    b[k++] = a[i--];
                                else
                                    b[k++] = a[j--];
                            }
                            if (i < node[low].st_index)
                                while (j >= node[high].st_index)
                                    b[k++] = a[j--];
                            else
                                while (i >= node[low].st_index)
                                    b[k++] = a[i--];
                        }
                        for (k = node[low].st_index; k <= node[high].ed_index; k++)
                            a[k] = b[k];

                        node[low].ed_index = node[high].ed_index;
                        node[high].st_index = node[low].st_index;
                        node[low].as_ds = false;
                        node[high].as_ds = false;
                    }
                }
            }



            BufferedWriter output = null;

            File file = new File("output.txt");
            output = new BufferedWriter(new FileWriter(file));
            for (k = 0; k < numberOfItem; k++) {
                String s = String.valueOf(a[k]);
                output.write(s);
                output.newLine();
            }

            if (output != null) {
                output.close();
            }



        }

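        public void addDataIntoArray() throws Exception {
            // Read whitespace-separated integers from input.txt into a[],
            // stopping at the array capacity, and close the file afterwards.
            try (Scanner scanner = new Scanner(new File("input.txt"))) {
                while (scanner.hasNextInt() && i < Constant.MAX) {
                    a[i] = scanner.nextInt();
                    i++;
                    numberOfItem++;
                }
            }
        }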
        }

    }
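
For readers who find the listing above hard to follow, here is a minimal sketch of the same idea: detect the runs that are already ordered in the input, then merge neighbouring runs until one remains. It is not the asker's code. The class and method names are illustrative, and descending runs are reversed in place up front, whereas the original reads them backwards during the merge.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch of a natural (adaptive) merge sort; all names here are illustrative. */
final class NaturalMergeSortSketch {

    /**
     * Splits a into maximal runs. Strictly descending runs are reversed in
     * place, so every returned run {start, endExclusive} is ascending.
     */
    static List<int[]> findRuns(int[] a) {
        List<int[]> runs = new ArrayList<>();
        int start = 0;
        while (start < a.length) {
            int end = start + 1;
            if (end < a.length && a[end] < a[start]) {
                // Strictly descending run: extend it, then flip it.
                while (end < a.length && a[end] < a[end - 1]) end++;
                reverse(a, start, end - 1);
            } else {
                // Non-descending run: extend while the order holds.
                while (end < a.length && a[end] >= a[end - 1]) end++;
            }
            runs.add(new int[] {start, end});
            start = end;
        }
        return runs;
    }

    /** Reverses a[lo..hi], both bounds inclusive. */
    static void reverse(int[] a, int lo, int hi) {
        while (lo < hi) {
            int t = a[lo];
            a[lo++] = a[hi];
            a[hi--] = t;
        }
    }

    /** Merges the sorted ranges a[lo, mid) and a[mid, hi) through buf. */
    static void merge(int[] a, int[] buf, int lo, int mid, int hi) {
        int i = lo, j = mid, k = lo;
        while (i < mid && j < hi) buf[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
        while (i < mid) buf[k++] = a[i++];
        while (j < hi) buf[k++] = a[j++];
        System.arraycopy(buf, lo, a, lo, hi - lo);
    }

    /** Sorts a ascending by merging adjacent runs pairwise, pass after pass. */
    static void sort(int[] a) {
        List<int[]> runs = findRuns(a);
        int[] buf = new int[a.length];
        while (runs.size() > 1) {
            List<int[]> next = new ArrayList<>();
            for (int r = 0; r + 1 < runs.size(); r += 2) {
                int[] left = runs.get(r), right = runs.get(r + 1);
                merge(a, buf, left[0], right[0], right[1]);
                next.add(new int[] {left[0], right[1]});
            }
            // An odd run out is carried over to the next pass unchanged.
            if (runs.size() % 2 == 1) next.add(runs.get(runs.size() - 1));
            runs = next;
        }
    }

    public static void main(String[] args) {
        int[] data = {5, 4, 3, 7, 8, 9, 2, 2, 6, 1};
        sort(data);
        System.out.println(Arrays.toString(data)); // [1, 2, 2, 3, 4, 5, 6, 7, 8, 9]
    }
}
```

On input that is already sorted, `findRuns` returns a single run and no merge pass happens at all, which is where the "adaptive" behaviour comes from.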

This is not a merge sort. Merge sort uses tons of recursion and is much shorter. I would be surprised if this ran faster than bubble sort.

This is asking too much of Stack Overflow. Try it yourself first, then come back to us with what you have tried. In the meantime, read up on your sorts!

Yes, this is not a merge sort. Read the title: it is an adaptive merge sort. If you don't know adaptive merge sort, then google it. Adaptive merge sort is much faster than merge sort.

I'm sorry, but it is still asking too much of Stack Overflow.

I'm sorry too if I am asking too much. I have actually understood the wordcount example well. It consists of 3 classes: 1 is the map, 2 is the reduce, and 3 is the driver, and in the map the text is converted into tokens. But I don't have an example that explains sorting well, such as how many classes a sort needs and how to deal with integers. I have researched this and learned that in MapReduce the sorting is done in the map, so if I want to specify my own sort I need a secondary sort. But I don't understand how to do a secondary sort.

After reading up on adaptive merge sort, its drawback is higher memory usage. That cancels out any big-O difference, and either way its worst case is the same as merge sort's. I think I would stick with regular merge sort unless you know that what you are doing is definitely a perfect case for adaptive merge sort. Merge sort is easier to read, so for that reason alone it is much better in 99% of cases.
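
On the question of how integers get sorted in MapReduce: the framework itself sorts the map output by key before it reaches the reducers, so a plain sort job can emit each number as the key and let the reducer write the keys back out. With more than one reducer, a globally ordered result also needs a partitioner that routes key ranges to reducers in order (Hadoop ships `TotalOrderPartitioner` for this). The sketch below only simulates that data flow in plain Java, using no Hadoop classes; the class name, method names, and split points are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java simulation of a MapReduce total-order sort; no Hadoop API is used. */
final class MapReduceSortSimulation {

    /** Stand-in for a range partitioner: splitPoints must be ascending. */
    static int partition(int key, int[] splitPoints) {
        int p = 0;
        while (p < splitPoints.length && key >= splitPoints[p]) p++;
        return p;
    }

    static List<Integer> sort(List<String> lines, int[] splitPoints) {
        int reducers = splitPoints.length + 1;
        // One sorted map per reducer stands in for the shuffle, which
        // delivers keys to each reducer in sorted order (key -> count).
        List<TreeMap<Integer, Integer>> shuffled = new ArrayList<>();
        for (int r = 0; r < reducers; r++) shuffled.add(new TreeMap<>());

        // "Map" phase: parse each input line and emit the number as the key.
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) continue;
            int key = Integer.parseInt(trimmed);
            shuffled.get(partition(key, splitPoints)).merge(key, 1, Integer::sum);
        }

        // "Reduce" phase: each reducer writes its keys once per occurrence.
        // Concatenating reducer outputs in partition order is globally sorted.
        List<Integer> output = new ArrayList<>();
        for (TreeMap<Integer, Integer> reducerInput : shuffled) {
            for (Map.Entry<Integer, Integer> e : reducerInput.entrySet()) {
                for (int c = 0; c < e.getValue(); c++) output.add(e.getKey());
            }
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("42", "7", "19", "7", "100", "3");
        // Three simulated reducers: keys < 10, keys in [10, 50), keys >= 50.
        System.out.println(sort(input, new int[] {10, 50})); // [3, 7, 7, 19, 42, 100]
    }
}
```

In a real job the mapper, reducer, and driver would be the three classes from the wordcount example, with the parsed number (for instance as an `IntWritable`) as the map output key. A custom sort algorithm such as the adaptive merge sort is usually not needed there, because the ordering comes from the shuffle.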