Python: could not convert string to float in the NaiveBayes Spark example
I am following the Spark 1.6 tutorial and copied the same code, as follows:
from pyspark.mllib.classification import NaiveBayes, NaiveBayesModel
from pyspark.mllib.linalg import Vectors
from pyspark.mllib.regression import LabeledPoint
from pyspark import SparkContext, SparkConf
def parseLine(line):
    parts = line.split(',')
    label = float(parts[0])
    features = Vectors.dense([float(x) for x in parts[1].split(' ')])
    return LabeledPoint(label, features)
conf= SparkConf()
conf.setAppName("NaiveBaye")
conf.set('spark.driver.memory','6g')
conf.set('spark.executor.memory','6g')
conf.set('spark.cores.max',156)
sc = SparkContext(conf= conf)
data = sc.textFile('sample_naive_bayes_data.txt').map(parseLine)
# Split data approximately into training (60%) and test (40%)
training, test = data.randomSplit([0.6, 0.4], seed=0)
# Train a naive Bayes model.
model = NaiveBayes.train(training, 1.0)
# Make prediction and test accuracy.
predictionAndLabel = test.map(lambda p: (model.predict(p.features), p.label))
accuracy = 1.0 * predictionAndLabel.filter(lambda (x, v): x == v).count() / test.count()
# Save and load model
model.save(sc, "model")
sameModel = NaiveBayesModel.load(sc, "model")
sample_naive_bayes_data.txt contains the following:
0, 1.0 0.0 0.0
0, 2.0 0.0 0.0
1, 0.0 1.0 0.0
1, 0.0 2.0 0.0
2, 0.0 0.0 1.0
2, 0.0 0.0 2.0
This is a very basic tutorial, but it still does not work.
It gives me the following error, "could not convert string to float", on this line:
features = Vectors.dense([float(x) for x in parts[1].split(' ')])
Can someone explain why this happens and how to fix it?
Edit 1
I am now trying to change the code to use string values:
label = str(parts[0])
features = Vectors.dense([str(x) for x in parts[1].split(' ')])
With this dataset:
positive, happy food food
positive, dog food food
negative, food happy food
negative, food dog food
neutral, food food happy
neutral, food food dog
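The values have the same structure as before, but they are strings instead of floats.
In the previous example the accuracy was 1.0.
Now, if I try to run this code, I get the following error: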
ValueError: could not convert string to float: happy
on this line:
model = NaiveBayes.train(training, 1.0)
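You are getting the error because of the split. The whitespace in sample_naive_bayes_data.txt does not match the separator passed to the split method. With the data as shown, each line has a space after the comma, so parts[1] starts with a space, split(' ') returns an empty string as its first element, and float('') raises the ValueError. Replace
features = Vectors.dense([float(x) for x in parts[1].split(' ')])
with
features = Vectors.dense([float(x) for x in parts[1].split()])
and it should work. Printing x before that line may help you spot the problem, because you can then see the value of x at the point where the conversion fails.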
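To see the difference between the two split calls without starting Spark, here is a minimal plain-Python sketch. It assumes the input line looks exactly like the first line of the sample file shown in the question. The helper parse_features is hypothetical (it is not part of the tutorial or of Spark) and only illustrates the "print x" suggestion by naming the token that fails.

```python
# One line from the sample file in the question (note the space after the comma).
line = "0, 1.0 0.0 0.0"
parts = line.split(',')

# Splitting on a single space keeps the empty string before the first number.
tokens_single_space = parts[1].split(' ')
print(tokens_single_space)

# Splitting with no argument splits on runs of whitespace and drops empty strings.
tokens_whitespace = parts[1].split()
print(tokens_whitespace)


def parse_features(text):
    """Hypothetical helper: convert tokens to floats and name the token that fails."""
    values = []
    for x in text.split():
        try:
            values.append(float(x))
        except ValueError:
            raise ValueError("bad token %r in %r" % (x, text))
    return values


features = parse_features(parts[1])
print(features)
```

The first list contains a leading empty string, and that is the element float() cannot convert. The second list contains only the three numeric tokens.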