Hadoop Mahout: transposing a matrix

I'm new to this. I'm trying to transpose a matrix with the Mahout transpose command-line tool.

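Each line of my data source file looks like this:

1;456;789;012;....

The key is the first element of each line ("1" in this example), and each line is one vector of the matrix.
I tried changing the delimiter to "," or to a space, but that didn't work.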
To transpose the matrix, I first converted the HDFS data file into a sequence file with the following command:
mahout seqdirectory -c utf-8 -i /test/myfile -p /test/myfile_seq

Then I tried to convert the sequence file into vectors with the following command:
mahout seq2sparse -i /test/myfile_seq/chunk-0 -o /test/myfile_vector

Then, to transpose the matrix, I used the following command:
sudo -u hdfs mahout transpose --input  /test/myfile_vector//tfidf-vectors/part-r-00000 --numRows 5 --numCols 24

I have a few questions:
Please post Mahout-related questions to the Mahout user@ mailing list, where Mahout committers can give you faster and more definitive answers.
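Mahout's TransposeJob expects a matrix as input (rows keyed by integer row index); it will not process a set of individual vectors like the ones you have.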
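To see why the rows need integer indices, here is a minimal in-memory sketch of what transposing a sparse, row-indexed matrix means: every entry at (row i, column j) moves to (row j, column i), so the column positions of the input become the row keys of the output, and the input row keys become column positions. TransposeSketch and its map-of-maps representation are hypothetical, plain-Java illustrations; they are not Mahout's TransposeJob or its data types.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical in-memory illustration of a matrix transpose; not Mahout code.
 * Input: row index -> (column index -> value), a sparse row-indexed matrix.
 * Output: the same entries regrouped as column index -> (row index -> value).
 */
final class TransposeSketch {
    private TransposeSketch() {}

    static TreeMap<Integer, TreeMap<Integer, Double>> transpose(
            Map<Integer, Map<Integer, Double>> rows) {
        TreeMap<Integer, TreeMap<Integer, Double>> result = new TreeMap<>();
        for (Map.Entry<Integer, Map<Integer, Double>> row : rows.entrySet()) {
            int i = row.getKey();
            for (Map.Entry<Integer, Double> cell : row.getValue().entrySet()) {
                int j = cell.getKey();
                // Entry (i, j) of the input becomes entry (j, i) of the output:
                // the input's column index is the output's row key.
                result.computeIfAbsent(j, k -> new TreeMap<>()).put(i, cell.getValue());
            }
        }
        return result;
    }
}
```

A row key such as the string "1" from the input file has no natural place as a column position in the output, which is why the named vectors have to be turned into an integer-indexed matrix first.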
The input format doesn't matter: you can have a CSV file and parse each line.
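A minimal sketch of parsing one line into a key and its values, assuming (as in the question) that the first field is the key and that the delimiter is passed in as a regular expression, so ";", "," or whitespace all work. KeyedLine is a hypothetical class standing in for an adapted CSVIterator; it does not produce Mahout NamedVector objects.

```java
/**
 * Hypothetical stand-in for an adapted CSVIterator: turns a line such as
 * "1;456;789;012" into a key ("1") and the remaining fields as doubles.
 */
final class KeyedLine {
    final String key;
    final double[] values;

    KeyedLine(String key, double[] values) {
        this.key = key;
        this.values = values;
    }

    /**
     * Splits a line with the given delimiter regex (e.g. ";", ",", "\\s+").
     * The first field is the key; every other field must be a number.
     * String.split drops trailing empty fields, so a trailing ";" is harmless.
     */
    static KeyedLine parse(String line, String delimiterRegex) {
        String[] fields = line.trim().split(delimiterRegex);
        String key = fields[0].trim();
        if (key.isEmpty()) {
            throw new IllegalArgumentException("line has no key: " + line);
        }
        double[] values = new double[fields.length - 1];
        for (int i = 1; i < fields.length; i++) {
            // Throws NumberFormatException on empty or non-numeric fields.
            values[i - 1] = Double.parseDouble(fields[i].trim());
        }
        return new KeyedLine(key, values);
    }
}
```

For example, KeyedLine.parse("1;456;789;012", ";") yields the key "1" and the values 456.0, 789.0 and 12.0; the leading zero of 012 disappears because the field is parsed as a number.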
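Here is the sequence of steps for what you are trying to accomplish:
1. Convert the input CSV file into named vectors, where the vector ID is, in your case, the key. Look at the code of Mahout's CSVIterator and adapt it to handle named vectors and to parse each line of your input.
2. Run Mahout's RowIdJob on the NamedVectors to create a matrix of all the vectors. Each row of the matrix is one line of your input. RowIdJob produces two outputs, matrix and docIndex:
   - matrix: an m x n matrix made by concatenating all the vectors
   - docIndex: a mapping from documentId to documentName (in your case, it maps each documentId to your key)
3. Feed the matrix output of the previous step as input to TransposeJob. You need to specify the number of rows and columns on the CLI.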
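The RowIdJob step can be pictured with a small in-memory analogue, assuming the named vectors have already been parsed into a map from key to values. RowIdSketch is hypothetical plain Java, not Mahout's RowIdJob: it numbers the rows 0..m-1 in the map's iteration order, records the original key of each row in docIndex, and tracks the widest vector as the column count.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical in-memory analogue of the RowIdJob step; not Mahout code.
 * Named vectors become matrix rows 0..m-1, and docIndex remembers which
 * original key each row index came from.
 */
final class RowIdSketch {
    final List<double[]> matrix = new ArrayList<>();              // row index -> vector
    final Map<Integer, String> docIndex = new LinkedHashMap<>();  // row index -> original key
    int numCols = 0;

    RowIdSketch(Map<String, double[]> namedVectors) {
        int rowId = 0;
        for (Map.Entry<String, double[]> e : namedVectors.entrySet()) {
            matrix.add(e.getValue());
            docIndex.put(rowId, e.getKey());
            rowId++;
            // The widest vector determines the matrix's column count.
            numCols = Math.max(numCols, e.getValue().length);
        }
    }

    int numRows() {
        return matrix.size();
    }
}
```

Passing a LinkedHashMap keeps the rows in file order. numRows() and numCols then give the row and column counts of the input matrix, i.e. the numbers that step 3 asks you to supply on the command line.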
If you have any further questions, please send them to Mahout user@