Python: Converting a Spark DataFrame to LabeledPoint objects for Naive Bayes

I am trying to convert a DataFrame into LabeledPoint objects so that I can use it with a Naive Bayes classifier. Here is my code:

from pyspark.ml.feature import Tokenizer, Word2Vec

# These are the two DataFrames
train = to_spark_df("train.csv")
test = to_spark_df("test.csv")

# These are the labels of the six classes
labels = [i for i in train.columns if i not in ["id", "comment_text"]]

# Split each comment into a list of words
tokenizer = Tokenizer(inputCol="comment_text", outputCol="words")
wordsData = tokenizer.transform(train)

# Fit Word2Vec on the tokenized comments
word2vec = Word2Vec(inputCol="words", outputCol="rawFeatures")
model = word2vec.fit(wordsData)
result = model.transform(wordsData)  # This is the feature vector extracted with Word2Vec
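At this point, I would like to create LabeledPoint objects with "labels" as the first field, containing the classes of the dataset, and "result" as the second field, containing the features. I tried using map, but I could not get it to work. Can anyone help me?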
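
One possible way to do the mapping is sketched below. A LabeledPoint holds a single numeric label, so this sketch assumes one RDD is built per label column rather than one for all six classes at once. The function names `to_label_and_features` and `to_labeled_points` are hypothetical, and the feature column is assumed to be `rawFeatures` as in the code above. Note also that MLlib's NaiveBayes generally expects non-negative feature values, while Word2Vec vectors can contain negative numbers, so count-based features (for example from HashingTF or CountVectorizer) may be a better fit for the classifier itself.

```python
def to_label_and_features(label_value, feature_values):
    """Coerce a raw label and a sequence of feature values to plain floats.

    Labels read from a CSV may arrive as strings, so they are cast explicitly.
    """
    label = float(label_value)
    features = [float(x) for x in feature_values]
    return label, features


def to_labeled_points(df, label_col, features_col="rawFeatures"):
    """Map a Spark DataFrame to an RDD of LabeledPoint for ONE label column.

    Assumption: features_col holds pyspark.ml vectors (as produced by Word2Vec).
    """
    # Imported lazily so the pure-Python helper above works without Spark.
    from pyspark.mllib.linalg import Vectors
    from pyspark.mllib.regression import LabeledPoint

    def convert(row):
        label, features = to_label_and_features(
            row[label_col], row[features_col].toArray()
        )
        # Build an mllib dense vector, which is what LabeledPoint expects.
        return LabeledPoint(label, Vectors.dense(features))

    return df.rdd.map(convert)


# Hypothetical usage with the variables from the question
# (requires a running Spark session, so it is left commented out):
# points = to_labeled_points(result, labels[0])
```

Repeating the call for each entry in `labels` would give one training set per class, which could then be used to fit one binary classifier per class.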