Java Q作业未成功mahout ssvd
我正试图在mahout中的一些tfidf向量上运行ssvd。当我在Java代码中运行它时,如下所示(使用mahout 0.6 JAR),它运行良好:Java Q作业未成功mahout ssvd,java,mahout,decomposition,singular,Java,Mahout,Decomposition,Singular,我正试图在mahout中的一些tfidf向量上运行ssvd。当我在Java代码中运行它时,如下所示(使用mahout 0.6 JAR),它运行良好: public static void main(String[] args){ runSSVDOnSparseVectors(vectorOutputPath + "/tfidf-vectors/part-r-00000", ssvdOutputPath, 1, 0, 30000, 1); } private static vo
public static void main(String[] args){
runSSVDOnSparseVectors(vectorOutputPath
+ "/tfidf-vectors/part-r-00000", ssvdOutputPath, 1, 0, 30000, 1);
}
private static void runSSVDOnSparseVectors(String inputPath, String outputPath,
int rank, int oversampling, int blocks,
int reduceTasks) throws IOException {
Configuration conf = new Configuration();
// get number of reduce tasks from config?
SSVDSolver solver = new SSVDSolver(conf, new Path[] { new Path(
inputPath) }, new Path(outputPath), blocks, rank, oversampling,
reduceTasks);
solver.setcUHalfSigma(true);
solver.setcVHalfSigma(true);
solver.run();
}
我决定将其转换为bash脚本,只使用cli命令,但当我这样做时,我得到了以下错误(在版本0.5和0.7上尝试了这一点,但都不起作用。我可以尝试0.6,但我认为这不是版本问题):
我在集群上以分布式模式运行这个。我读过Q作业失败可能与块大小有关,但我的大于p+k。我还意识到我使用的是一个小得可笑的输入(4个向量),但正如我所说,它在java代码中工作。我很困惑,为什么它可以在java中工作,而不能在CLI中工作。我很确定我对这个函数有相同的参数。我总是可以将java代码打包到一个jar中,并将其放入bash脚本中,但这将是非常困难的
作业日志显示:
2012-07-23 15:00:55,413 INFO org.apache.hadoop.util.ProcessTree: setsid exited with exit code 0
2012-07-23 15:00:55,417 INFO org.apache.hadoop.mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@6ce53220
2012-07-23 15:00:55,638 INFO org.apache.hadoop.io.compress.zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
2012-07-23 15:00:55,697 ERROR org.apache.mahout.common.IOUtils: new m can't be less than n
java.lang.IllegalArgumentException: new m can't be less than n
at org.apache.mahout.math.hadoop.stochasticsvd.qr.GivensThinSolver.adjust(GivensThinSolver.java:109)
at org.apache.mahout.math.hadoop.stochasticsvd.qr.QRFirstStep.cleanup(QRFirstStep.java:233)
at org.apache.mahout.math.hadoop.stochasticsvd.qr.QRFirstStep.close(QRFirstStep.java:89)
at org.apache.mahout.common.IOUtils.close(IOUtils.java:128)
at org.apache.mahout.math.hadoop.stochasticsvd.QJob$QMapper.cleanup(QJob.java:158)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org. apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
at org.apache.hadoop.mapred.Child.main(Child.java:264)
2012-07-23 15:00:55,731 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2012-07-23 15:00:55,733 WARN org.apache.hadoop.mapred.Child: Error running child
java.lang.IllegalArgumentException: new m can't be less than n
at org.apache.mahout.math.hadoop.stochasticsvd.qr.GivensThinSolver.adjust(GivensThinSolver.java:109)
at org.apache.mahout.math.hadoop.stochasticsvd.qr.QRFirstStep.cleanup(QRFirstStep.java:233)
at org.apache.mahout.math.hadoop.stochasticsvd.qr.QRFirstStep.close(QRFirstStep.java:89)
at org.apache.mahout.common.IOUtils.close(IOUtils.java:128)
at org.apache.mahout.math.hadoop.stochasticsvd.QJob$QMapper.cleanup(QJob.java:158)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
at org.apache.hadoop.mapred.Child.main(Child.java:264)
2012-07-23 15:00:55,736 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task
提前感谢您的帮助。我实际上认为这是因为tfidf向量中有一些序列文件是空的,因为我使用了太多的还原程序。这对我来说似乎是一个bug。这是不够的信息。此跟踪只是表示作业失败的客户端。您需要发布工作人员的错误。该任务的日志文件没有任何输出。这就是你的意思吗?肯定会有来自任何Hadoop作业的日志,至少有自己的输出。我以前错了。我将日志中的信息添加到问题中。谢谢
2012-07-23 15:00:55,413 INFO org.apache.hadoop.util.ProcessTree: setsid exited with exit code 0
2012-07-23 15:00:55,417 INFO org.apache.hadoop.mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@6ce53220
2012-07-23 15:00:55,638 INFO org.apache.hadoop.io.compress.zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
2012-07-23 15:00:55,697 ERROR org.apache.mahout.common.IOUtils: new m can't be less than n
java.lang.IllegalArgumentException: new m can't be less than n
at org.apache.mahout.math.hadoop.stochasticsvd.qr.GivensThinSolver.adjust(GivensThinSolver.java:109)
at org.apache.mahout.math.hadoop.stochasticsvd.qr.QRFirstStep.cleanup(QRFirstStep.java:233)
at org.apache.mahout.math.hadoop.stochasticsvd.qr.QRFirstStep.close(QRFirstStep.java:89)
at org.apache.mahout.common.IOUtils.close(IOUtils.java:128)
at org.apache.mahout.math.hadoop.stochasticsvd.QJob$QMapper.cleanup(QJob.java:158)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org. apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
at org.apache.hadoop.mapred.Child.main(Child.java:264)
2012-07-23 15:00:55,731 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2012-07-23 15:00:55,733 WARN org.apache.hadoop.mapred.Child: Error running child
java.lang.IllegalArgumentException: new m can't be less than n
at org.apache.mahout.math.hadoop.stochasticsvd.qr.GivensThinSolver.adjust(GivensThinSolver.java:109)
at org.apache.mahout.math.hadoop.stochasticsvd.qr.QRFirstStep.cleanup(QRFirstStep.java:233)
at org.apache.mahout.math.hadoop.stochasticsvd.qr.QRFirstStep.close(QRFirstStep.java:89)
at org.apache.mahout.common.IOUtils.close(IOUtils.java:128)
at org.apache.mahout.math.hadoop.stochasticsvd.QJob$QMapper.cleanup(QJob.java:158)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
at org.apache.hadoop.mapred.Child.main(Child.java:264)
2012-07-23 15:00:55,736 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task