Java: How do I vectorize a text file in Mahout?

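I have a text file containing labels and tweets.

I need to convert each line into a vector. If I use the seq2sparse command, the whole document is converted into a single vector, but I need one vector per line, not one for the whole document. For example: key: positive, value: vector value (tweet). How can we achieve this in Mahout?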


/* This is what I have done */

    StringTokenizer str = new StringTokenizer(line, ",");
    String label = str.nextToken();
    while (str.hasMoreTokens()) {
        tweetline = str.nextToken();
        System.out.println("Tweetline" + tweetline);
        StringTokenizer words = new StringTokenizer(tweetline, " ");
        while (words.hasMoreTokens()) {
            featureList.add(words.nextToken());
        }
    }
    Vector unclassifiedInstanceVector = new RandomAccessSparseVector(tweetline.split(" ").length);
    FeatureVectorEncoder vectorEncoder = new AdaptiveWordValueEncoder(label);
    vectorEncoder.setProbes(1);
    System.out.println("Feature List: " + featureList);
    for (Object feature : featureList) {
        vectorEncoder.addToVector((String) feature, unclassifiedInstanceVector);
    }
    context.write(new Text("/" + label), new VectorWritable(unclassifiedInstanceVector));

Thanks in advance.

You can use SequenceFile.Writer to write the vectors to an HDFS path from your application:

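           // Needs the Hadoop (Configuration, FileSystem, Path, Text, SequenceFile),
           // HBase (HBaseConfiguration) and Mahout (VectorWritable) imports.
           Configuration conf = HBaseConfiguration.create();
           FileSystem FS = FileSystem.get(conf);
           String newPath = "/foo/mahouttest/part-r-00000";
           Path newPathFile = new Path(newPath);
           Text key = new Text();
           VectorWritable value = new VectorWritable();
           SequenceFile.Writer writer = SequenceFile.createWriter(FS, conf, newPathFile,
                key.getClass(), value.getClass());
                 .....
           // Repeat the next three statements once per input line.
           key.set("c/" + label);
           value.set(unclassifiedInstanceVector);
           writer.append(key, value);
           // After the last record has been appended:
           writer.close();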
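
To make the "one vector per line" idea concrete, here is a minimal sketch that uses only the Java standard library. It does not use Mahout. The class LineVectorizer, its methods, and the cardinality of 1000 are hypothetical names and values chosen for illustration, and String.hashCode() is only a stand-in for whatever hashing Mahout's encoders apply internally. Each "label,tweet" line becomes one key (the label) and one sparse vector (hashed word index mapped to count), which is the shape of the key/value pair the question asks for.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: turns each "label,tweet" line into its own sparse vector.
final class LineVectorizer {

    /** One labelled sparse vector: hashed feature index -> accumulated weight. */
    static final class LabeledVector {
        final String label;
        final Map<Integer, Double> weights;

        LabeledVector(String label, Map<Integer, Double> weights) {
            this.label = label;
            this.weights = weights;
        }
    }

    /** Vectorizes a single line; cardinality is the fixed vector size. */
    static LabeledVector vectorize(String line, int cardinality) {
        // Split on the first comma only, so commas inside the tweet are kept.
        int comma = line.indexOf(',');
        if (comma < 0) {
            throw new IllegalArgumentException("expected \"label,tweet\": " + line);
        }
        String label = line.substring(0, comma).trim();
        String tweet = line.substring(comma + 1).trim();

        Map<Integer, Double> weights = new TreeMap<>();
        for (String word : tweet.split("\\s+")) {
            if (word.isEmpty()) {
                continue;
            }
            // Hashing trick: map the word to a slot in [0, cardinality).
            int index = Math.floorMod(word.hashCode(), cardinality);
            weights.merge(index, 1.0, Double::sum);
        }
        return new LabeledVector(label, weights);
    }

    /** Vectorizes every line independently, one vector per line. */
    static List<LabeledVector> vectorizeAll(List<String> lines, int cardinality) {
        List<LabeledVector> result = new ArrayList<>();
        for (String line : lines) {
            result.add(vectorize(line, cardinality));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "positive,great phone great battery",
                "negative,screen broke after a week");
        for (LabeledVector v : vectorizeAll(lines, 1000)) {
            System.out.println("/" + v.label + " -> " + v.weights);
        }
    }
}
```

Two choices in this sketch differ from the code in the question. The vector size is a fixed cardinality shared by all lines, not the number of words in the current tweet, so hashed indices from different lines refer to the same feature space. The feature map is also created fresh for every line, so words from earlier lines do not leak into later vectors. In a real job, each LabeledVector would be converted to a Mahout vector and appended to the SequenceFile as in the answer above.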