How do I run k-means clustering from Mahout on Hadoop?
Hi, I am trying to run the example from Chapter 7 (k-means clustering) of "Mahout in Action". Can someone tell me how to run this example on a Hadoop cluster (single-node CDH-4.2.1) with Mahout 0.7?
Here are the steps I followed:
Copied the code into the Eclipse IDE on my local machine.
Added these JARs to my Eclipse project:
hadoop-common-2.0.0-cdh4.2.1.jar
hadoop-hdfs-2.0.0-cdh4.2.1.jar
hadoop-mapreduce-client-core-2.0.0-cdh4.2
Exception in thread "main" java.lang.NoClassDefFoundError: FileSystem
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2427)
at java.lang.Class.getMethod0(Class.java:2670)
at java.lang.Class.getMethod(Class.java:1603)
at org.apache.hadoop.util.RunJar.main(RunJar.java:202)
Caused by: java.lang.ClassNotFoundException: FileSystem
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
... 5 more
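Can anyone help me figure out what I am missing, or is the way I am running it wrong?
Secondly, I would like to know how to run k-means clustering on a CSV file.
Thanks in advance :)
The given code is misleading. The code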
Cluster cluster = new Cluster(vec, i, new EuclideanDistanceMeasure());
writer.append(new Text(cluster.getIdentifier()), cluster);
}
writer.close();
KMeansDriver.run(conf, new Path("testdata/points"), new Path("testdata/clusters"),
new Path("output"), new EuclideanDistanceMeasure(), 0.001, 10,
true, false);
SequenceFile.Reader reader = new SequenceFile.Reader(fs,
new Path("output/" + Cluster.CLUSTERED_POINTS_DIR
+ "/part-m-00000"), conf);
should be replaced by
Kluster cluster = new Kluster(vec, i, new EuclideanDistanceMeasure());
writer.append(new Text(cluster.getIdentifier()), cluster);
}
writer.close();
KMeansDriver.run(conf, new Path("testdata/points"), new Path("testdata/clusters"),
new Path("output"), new EuclideanDistanceMeasure(), 0.001, 10,
true, false);
SequenceFile.Reader reader = new SequenceFile.Reader(fs,
new Path("output/" + Kluster.CLUSTERED_POINTS_DIR
+ "/part-m-00000"), conf);
In Mahout 0.7, Cluster is an interface, whereas Kluster is a class, so only Kluster can be instantiated with new.
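To run k-means on a CSV file, you first have to create a SequenceFile, which is then passed as an argument to KMeansDriver. The following code reads each line of the CSV file "points.csv", converts it into a vector, and writes it to the SequenceFile "points.seq".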
import java.io.BufferedReader;
import java.io.FileReader;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.mahout.math.RandomAccessSparseVector;
import org.apache.mahout.math.Vector;
import org.apache.mahout.math.VectorWritable;

// fs (FileSystem) and conf (Configuration) are assumed to have been created earlier.
try (
    BufferedReader reader = new BufferedReader(new FileReader("testdata2/points.csv"));
    SequenceFile.Writer writer = new SequenceFile.Writer(fs, conf, new Path("testdata2/points.seq"), LongWritable.class, VectorWritable.class)
) {
    String line;
    long counter = 0;
    while ((line = reader.readLine()) != null) {
        String[] c = line.split(",");
        // Skip lines that do not contain at least two comma-separated values.
        if (c.length > 1) {
            double[] d = new double[c.length];
            for (int i = 0; i < c.length; i++)
                d[i] = Double.parseDouble(c[i]);
            Vector vec = new RandomAccessSparseVector(c.length);
            vec.assign(d);
            VectorWritable writable = new VectorWritable();
            writable.set(vec);
            // The key is a running line counter; the value is the point itself.
            writer.append(new LongWritable(counter++), writable);
        }
    }
    // No explicit close() needed: try-with-resources closes reader and writer.
}
Hope this helps.
Can you run the examples that ship with Hadoop and Mahout? Perhaps the "hadoop" command you are using is broken and is not setting up the classpath correctly.
I can run MR code on the Hadoop cluster, and I can even run the Mahout synthetic control data example.
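The stack trace in the question names a bare FileSystem with no package, whereas Hadoop's class is org.apache.hadoop.fs.FileSystem, so it may be worth checking which class names actually resolve on the classpath the job runs with. The following is only a minimal sketch of such a check: the class name ClasspathCheck and the helper isOnClasspath are hypothetical names made up for this illustration, and the two default class names are just the ones relevant to this thread.

```java
public class ClasspathCheck {

    /** Returns true if the named class can be located by this class's class loader. */
    public static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Either the class itself or something it depends on could not be loaded.
            return false;
        }
    }

    public static void main(String[] args) {
        // Names to probe; pass your own as arguments, otherwise these defaults are used.
        String[] names = args.length > 0 ? args : new String[] {
            "org.apache.hadoop.fs.FileSystem",
            "org.apache.mahout.clustering.kmeans.KMeansDriver"
        };
        for (String name : names) {
            System.out.println(name + " -> " + (isOnClasspath(name) ? "found" : "MISSING"));
        }
    }
}
```

The result is only meaningful if the check is launched the same way as the failing job, so that it sees the same classpath. Any name reported as MISSING points at a JAR that is not visible at runtime, even if Eclipse can see it at compile time.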