Python: could not convert string to float in the NaiveBayes Spark example


I am following the Spark 1.6 tutorial.

I copied the same code, as follows:

from pyspark.mllib.classification import NaiveBayes, NaiveBayesModel
from pyspark.mllib.linalg import Vectors
from pyspark.mllib.regression import LabeledPoint
from pyspark import SparkContext, SparkConf


def parseLine(line):
    parts = line.split(',')
    label = float(parts[0])
    features = Vectors.dense([float(x) for x in parts[1].split(' ')])
    return LabeledPoint(label, features)

conf= SparkConf()
conf.setAppName("NaiveBaye")
conf.set('spark.driver.memory','6g')
conf.set('spark.executor.memory','6g')
conf.set('spark.cores.max',156)

sc = SparkContext(conf= conf)

data = sc.textFile('sample_naive_bayes_data.txt').map(parseLine)

# Split data approximately into training (60%) and test (40%)
training, test = data.randomSplit([0.6, 0.4], seed=0)

# Train a naive Bayes model.
model = NaiveBayes.train(training, 1.0)

# Make prediction and test accuracy.
predictionAndLabel = test.map(lambda p: (model.predict(p.features), p.label))
# Index into the (prediction, label) pair; tuple-unpacking lambdas are Python 2 only
accuracy = 1.0 * predictionAndLabel.filter(lambda pl: pl[0] == pl[1]).count() / test.count()

# Save and load model
model.save(sc, "model")
sameModel = NaiveBayesModel.load(sc, "model")
sample_naive_bayes_data.txt contains the following:

0, 1.0 0.0 0.0
0, 2.0 0.0 0.0
1, 0.0 1.0 0.0
1, 0.0 2.0 0.0
2, 0.0 0.0 1.0
2, 0.0 0.0 2.0
This is a very basic tutorial, but it still does not work.

It gives me the following error, "could not convert string to float", on this line:

features = Vectors.dense([float(x) for x in parts[1].split(' ')])
Can someone explain why this happens and how to fix it?

Edit 1: I am trying to make some changes to use string values:

label = str(parts[0])
features = Vectors.dense([str(x) for x in parts[1].split(' ')])
Using this dataset:

positive, happy food food
positive, dog food food
negative, food happy food
negative, food dog food
neutral, food food happy
neutral, food food dog
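The values are the same, but with strings instead of floats. In the previous example, the accuracy was 1.0.

Now, if I try to run this code, I get the following error: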

ValueError: could not convert string to float: happy

on this line:

model = NaiveBayes.train(training, 1.0)
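
LabeledPoint and Vectors.dense expect numeric values, which is why the string tokens end up in a float conversion once the RDD is evaluated during training. A minimal sketch of one way around this, assuming the strings simply stand in for the numbers of the first dataset; the two dictionaries and encode_line are hypothetical names, not part of the tutorial:

```python
# Assumed mappings that undo the substitution described in the question
label_to_float = {"positive": 0.0, "negative": 1.0, "neutral": 2.0}
word_to_float = {"food": 0.0, "happy": 1.0, "dog": 2.0}


def encode_line(line):
    """Turn 'label, w1 w2 w3' into (numeric label, list of numeric features)."""
    parts = line.split(',')
    label = label_to_float[parts[0].strip()]
    # split() with no argument ignores the space after the comma
    features = [word_to_float[w] for w in parts[1].split()]
    return label, features


encoded = encode_line("positive, happy food food")
print(encoded)
```

The returned pair could then be wrapped as LabeledPoint(label, Vectors.dense(features)) inside parseLine.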

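You get the error because of the split. The whitespace in sample_naive_bayes_data.txt does not match the separator passed to the split method: after splitting on ',', parts[1] starts with a space, so split(' ') returns an empty string as its first element, and float('') fails.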
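
The mismatch can be reproduced in plain Python without Spark. A minimal sketch using the first line of the sample file; the variable names are illustrative only:

```python
# First line of sample_naive_bayes_data.txt
line = "0, 1.0 0.0 0.0"
parts = line.split(',')

# parts[1] keeps the space that follows the comma
print(repr(parts[1]))

# Splitting on a single space yields an empty string for the leading space,
# and float('') raises ValueError
tokens_single_space = parts[1].split(' ')
print(tokens_single_space)

# split() with no argument splits on runs of whitespace and drops empty strings
tokens_whitespace = parts[1].split()
print(tokens_whitespace)
features = [float(x) for x in tokens_whitespace]
```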

Replace

features = Vectors.dense([float(x) for x in parts[1].split(' ')])

with

features = Vectors.dense([float(x) for x in parts[1].split()])

and it should work.

If you print x before that line, it may help you spot the problem: you can then see what x is when it fails.
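
To see which value of x fails, the conversion can be wrapped so that the offending token is reported. A minimal sketch in plain Python; parse_fields is a hypothetical helper that mirrors parseLine but leaves out the Spark types so it can run on its own:

```python
def parse_fields(line):
    """Split one 'label, f1 f2 f3' line into (label, [features]) as floats."""
    parts = line.split(',')
    label = float(parts[0])
    features = []
    for x in parts[1].split(' '):  # same split as the failing code
        try:
            features.append(float(x))
        except ValueError:
            # %r makes empty strings and stray whitespace visible
            raise ValueError("bad token %r in line %r" % (x, line))
    return label, features


try:
    parse_fields("0, 1.0 0.0 0.0")
except ValueError as err:
    message = str(err)
    print(message)
```

The reported token is an empty string, which points at the leading space rather than at the numbers themselves.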