Python: "AttributeError: 'Py4JError' object has no attribute 'message'" when building a decision tree model (Python, Scala, Apache Spark, PySpark)

I am following Chapter 4 of O'Reilly's "Advanced Analytics with Spark". The book uses Scala, and I am having trouble converting the code to Python.

Scala code:

import org.apache.spark.mllib.linalg._
import org.apache.spark.mllib.regression._
val rawData = sc.textFile("hdfs:///user/ds/covtype.data")
val data = rawData.map { line =>
    val values = line.split(',').map(_.toDouble)
    val featureVector = Vectors.dense(values.init)
    val label = values.last - 1
    LabeledPoint(label, featureVector)
}
val Array(trainData, cvData, testData) =
  data.randomSplit(Array(0.8, 0.1, 0.1))
trainData.cache()
cvData.cache()
testData.cache()


import org.apache.spark.mllib.evaluation._
import org.apache.spark.mllib.tree._
import org.apache.spark.mllib.tree.model._
import org.apache.spark.rdd._

def getMetrics(model: DecisionTreeModel, data: RDD[LabeledPoint]):
    MulticlassMetrics = {
 val predictionsAndLabels = data.map(example =>
    (model.predict(example.features), example.label)
 )
 new MulticlassMetrics(predictionsAndLabels)
}
val model = DecisionTree.trainClassifier(
 trainData, 7, Map[Int,Int](), "gini", 4, 100)

val metrics = getMetrics(model, cvData) 
metrics.confusionMatrix
My Python code:

from pyspark.sql.functions import col, split
import pyspark.mllib.linalg as linal
import pyspark.mllib.regression as regre
import pyspark.mllib.evaluation as eva
import pyspark.mllib.tree as tree
import pyspark.rdd as rd

raw_data = sc.textFile("covtype.data")

def fstDecisionTree(line):
    values = list(map(float,line.split(",")))
    featureVector = linal.Vectors.dense(values[:-1])
    label = values[-1]-1
    ret=regre.LabeledPoint(label, featureVector)
    return regre.LabeledPoint(label, featureVector) 

data = raw_data.map(fstDecisionTree)
trainData,cvData,testData=data.randomSplit([0.8,0.1,0.1])
trainData.cache()
cvData.cache()
testData.cache()

def help_lam(model):
    def _help_lam(dataline):
        print(dataline)
        a=dataline.collect()
        return (model.predict(a[1]),a[0])
    return _help_lam

def getMetrics(model, data):
    print(type(model),type(data))
    predictionsAndLabels= data.map(help_lam(model))
    return eva.MulticlassMetrics(predictionsAndLabels)

n_targets=7
max_depth=4
max_bin_count=100
model = tree.DecisionTree.trainClassifier(trainData, n_targets, {}, "gini", max_depth, max_bin_count)

metrics=getMetrics(model,cvData)
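When I run this, I get the following error inside _help_lam(dataline), the function nested in help_lam(model), when I try to pass the model implicitly into the map iteration: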

AttributeError: 'Py4JError' object has no attribute 'message'

I think the problem is in the model.predict function. Its documentation says:

Note: In Python, predict cannot currently be used within an RDD transformation or action. Call predict directly on the RDD instead.
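A likely reason for this restriction (an assumption here; the quoted note does not spell it out) is that the Python model object is a thin wrapper around a JVM object that lives on the driver, so a closure that references it cannot be serialized and shipped to the workers. The sketch below imitates that situation in plain Python. FakeModel and its lock-based _handle are hypothetical stand-ins, not PySpark classes, and the standard pickle module is used here only as an analogy for the serializer Spark applies to closures.

```python
import pickle
import threading


class FakeModel:
    """Hypothetical stand-in for a model that wraps a driver-side handle."""

    def __init__(self):
        # A lock is used only because, like a JVM handle, it cannot be pickled.
        self._handle = threading.Lock()

    def predict(self, features):
        return 0.0


def can_ship_to_workers(obj):
    """Return True if obj survives pickling, the step a worker-side closure needs."""
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False


model = FakeModel()
features = [1.0, 0.0]

print(can_ship_to_workers(features))  # plain data can be pickled
print(can_ship_to_workers(model))     # the object holding the handle cannot
```

Under this reading, a function passed to data.map that captures the model fails at serialization time, while the data itself can be distributed without trouble.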

What you can do is pass the RDD of feature vectors directly, like this:

>>> rdd = sc.parallelize([[1.0], [0.0]])
>>> model.predict(rdd).collect()
[1.0, 0.0]
Edit:

Your updated getMetrics could be:

def getMetrics(model, data):
    labels = data.map(lambda d: d.label)
    features = data.map(lambda d: d.features)
    predictions = model.predict(features)
    predictionsAndLabels = predictions.zip(labels)
    return eva.MulticlassMetrics(predictionsAndLabels)
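
To see what the zip step feeds into MulticlassMetrics, the pairing can be mimicked on ordinary Python lists. This is only an illustrative sketch: confusion_matrix is a hypothetical helper, not part of PySpark, and the sample predictions and labels are made up. It puts actual labels in rows and predicted labels in columns, which, as far as I know, matches the layout Spark documents for confusionMatrix.

```python
def confusion_matrix(predictions_and_labels, n_classes):
    """Count (prediction, label) pairs; rows are actual labels, columns are predictions."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for prediction, label in predictions_and_labels:
        matrix[int(label)][int(prediction)] += 1
    return matrix


# Made-up values standing in for model.predict(features) and the true labels.
predictions = [0.0, 1.0, 1.0, 2.0, 0.0]
labels = [0.0, 1.0, 2.0, 2.0, 1.0]

# Same order as predictions.zip(labels): prediction first, label second.
predictions_and_labels = list(zip(predictions, labels))
matrix = confusion_matrix(predictions_and_labels, n_classes=3)

# Accuracy is the diagonal (correct predictions) divided by the total count.
accuracy = sum(matrix[i][i] for i in range(3)) / len(labels)
print(matrix)
print(accuracy)
```

The order of each pair matters: MulticlassMetrics expects (prediction, label), which is why the predictions RDD is the one that calls zip on the labels RDD and not the other way around.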