Hadoop: How to run k-means clustering from Mahout?

Tags: hadoop, cluster-analysis, mahout, k-means

Hi, I'm trying to run the example from Chapter 7 (k-means clustering) of Mahout in Action. Can someone tell me how to run this example on a Hadoop cluster (single-node CDH-4.2.1) with Mahout (0.7)?

Here are the steps I followed:

  • Copied the code (from ) into the Eclipse IDE on my local machine

  • Added these JARs to my Eclipse project:

  • hadoop-common-2.0.0-cdh4.2.1.jar

    hadoop-hdfs-2.0.0-cdh4.2.1.jar

    hadoop-mapreduce-client-core-2.0.0-cdh4.2.1.jar

    mahout-core-0.7-cdh4.3.0.jar

    mahout-core-0.7-cdh4.3.0-job.jar

    mahout-math-0.7-cdh4.3.0.jar

  • Built a JAR for the project and copied it onto my Hadoop cluster

  • Executed this command:

  • user@INFPH01463U:~$ hadoop jar /home/user/apurv/Kmean.jar tryout.SimpleKMeansClustering

    This gave me the following error:

    Exception in thread "main" java.lang.NoClassDefFoundError: FileSystem
            at java.lang.Class.getDeclaredMethods0(Native Method)
            at java.lang.Class.privateGetDeclaredMethods(Class.java:2427)
            at java.lang.Class.getMethod0(Class.java:2670)
            at java.lang.Class.getMethod(Class.java:1603)
            at org.apache.hadoop.util.RunJar.main(RunJar.java:202)
    Caused by: java.lang.ClassNotFoundException: FileSystem
            at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
            at java.security.AccessController.doPrivileged(Native Method)
            at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
            at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
            at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
            ... 5 more
    
    Can someone help me figure out what I'm missing, or point out what is wrong with the way I'm running it?

    Secondly, I'd like to know how to run k-means clustering on a CSV file.


    Thanks in advance :)

    The given code is misleading. The code

    Cluster cluster = new Cluster(vec, i, new EuclideanDistanceMeasure());
        writer.append(new Text(cluster.getIdentifier()), cluster);
    }
    writer.close();
    
    KMeansDriver.run(conf, new Path("testdata/points"), new Path("testdata/clusters"),
      new Path("output"), new EuclideanDistanceMeasure(), 0.001, 10,
      true, false);
    
    SequenceFile.Reader reader = new SequenceFile.Reader(fs,
        new Path("output/" + Cluster.CLUSTERED_POINTS_DIR
                 + "/part-m-00000"), conf);
    
    should be replaced by

    Kluster cluster = new Kluster(vec, i, new EuclideanDistanceMeasure());
        writer.append(new Text(cluster.getIdentifier()), cluster);
    }
    writer.close();
    
    KMeansDriver.run(conf, new Path("testdata/points"), new Path("testdata/clusters"),
      new Path("output"), new EuclideanDistanceMeasure(), 0.001, 10,
      true, false);
    
    SequenceFile.Reader reader = new SequenceFile.Reader(fs,
        new Path("output/" + Kluster.CLUSTERED_POINTS_DIR
                 + "/part-m-00000"), conf);
    
    Cluster is an interface, while Kluster is a class. See for more information.
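    As a side note on reading the results: once KMeansDriver has run, the SequenceFile.Reader opened above can be iterated to print each point together with its cluster id. A minimal sketch, assuming Mahout 0.7 (where the clustered points are written as WeightedVectorWritable values) and the reader variable from the code above:

    ```java
    // Sketch only: assumes the "reader" opened above on the clusteredPoints
    // output, and Mahout 0.7's WeightedVectorWritable as the value type.
    IntWritable key = new IntWritable();
    WeightedVectorWritable value = new WeightedVectorWritable();
    while (reader.next(key, value)) {
        // key holds the id of the cluster this point was assigned to
        System.out.println(value.toString() + " belongs to cluster " + key.toString());
    }
    reader.close();
    ```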

    To run k-means on a CSV file, you first have to create a SequenceFile to pass as an argument to KMeansDriver. The following code reads each line of the CSV file "points.csv", converts it into a vector, and writes it to the sequence file "points.seq":

    try (
        BufferedReader reader = new BufferedReader(new FileReader("testdata2/points.csv"));
        SequenceFile.Writer writer = new SequenceFile.Writer(fs, conf, new Path("testdata2/points.seq"), LongWritable.class, VectorWritable.class)
    ) {
        String line;
        long counter = 0;
        while ((line = reader.readLine()) != null) {
            String[] c = line.split(",");
            if (c.length > 1) {
                double[] d = new double[c.length];
                for (int i = 0; i < c.length; i++)
                    d[i] = Double.parseDouble(c[i]);
                Vector vec = new RandomAccessSparseVector(c.length);
                vec.assign(d);

                VectorWritable writable = new VectorWritable();
                writable.set(vec);
                writer.append(new LongWritable(counter++), writable);
            }
        }
        writer.close();
    }

    Hope it helps.

    Comment: Can you run the examples that ship with Hadoop and Mahout? Perhaps the "hadoop" command you are using is broken and does not set up the classpath correctly.
    Reply: I can run MR code on the Hadoop cluster, and even the Mahout synthetic control data example.
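    On the classpath point: the NoClassDefFoundError above usually means the Mahout jars were not visible when the job was launched. A sketch of two common ways to rule this out (the jar paths and class name here are illustrative assumptions, not taken from the original post, and -libjars only takes effect if the driver goes through ToolRunner):

    ```shell
    # Illustrative paths: adjust to wherever your CDH install keeps the Mahout jars.
    export HADOOP_CLASSPATH=/usr/lib/mahout/mahout-core-0.7-cdh4.3.0.jar:$HADOOP_CLASSPATH

    # Ship the Mahout job jar to the task nodes as well
    # (requires the main class to parse options via ToolRunner/GenericOptionsParser).
    hadoop jar /home/user/apurv/Kmean.jar tryout.SimpleKMeansClustering \
      -libjars /usr/lib/mahout/mahout-core-0.7-cdh4.3.0-job.jar
    ```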