Python: Converting a Spark DataFrame to LabeledPoint for Naive Bayes
I'm trying to convert a DataFrame into LabeledPoint objects so that I can use it with a Naive Bayes classifier. Here is my code:
```python
from pyspark.ml.feature import Tokenizer, Word2Vec

# The two DataFrames (to_spark_df is my own helper that loads a CSV into Spark)
train = to_spark_df("train.csv")
test = to_spark_df("test.csv")
# The six class-label columns: every column except "id" and "comment_text"
labels = [c for c in train.columns if c not in ["id", "comment_text"]]
# Split each comment into words
tokenizer = Tokenizer(inputCol="comment_text", outputCol="words")
wordsData = tokenizer.transform(train)
# Fit Word2Vec and map each comment to a feature vector
word2vec = Word2Vec(inputCol="words", outputCol="rawFeatures")
model = word2vec.fit(wordsData)
result = model.transform(wordsData)  # "rawFeatures" holds the Word2Vec feature vector
```
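At this point, I want to create a LabeledPoint object whose first field is the class label (from `labels`, which holds the dataset's classes) and whose second field is the feature vector (from `result`). I tried doing it with a map, but I couldn't get it to work. Can anyone help?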