Apache Spark PySpark: retrieve the metric (AUC ROC) from each sub-model in CrossValidator

When using CrossValidator, how can I get the individual AUC ROC score of each fold/sub-model?

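The documentation indicates that collectSubModels=True should keep all of the models, not just the best one or the average. However, after inspecting model.subModels I can't find a way to print their individual scores.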

The example below works. The only thing missing is something like model.subModels.aucScore.

The desired result is a score for each fold, e.g. [fold1: 0.85, fold2: 0.07, fold3: 0.55].
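For context, CrossValidatorModel.avgMetrics holds one value per parameter map: the mean of that map's metric over all folds. With the empty grid used here there is only one parameter map, so the printed [0.8333333333333333] is a single average over the three folds, and the individual fold values are not kept. The plain-Python sketch below uses a hypothetical function name, avg_metrics, to illustrate that relationship for a scores[fold][param_index] matrix. Applied to the example fold scores above, it gives roughly 0.49.

```python
def avg_metrics(fold_scores):
    """Mean metric per parameter map, mirroring what CrossValidator reports in avgMetrics.

    fold_scores is indexed [fold][param_index] (the same layout as subModels).
    avg_metrics is a hypothetical helper, not part of the PySpark API.
    """
    num_folds = len(fold_scores)
    # zip(*...) turns rows (folds) into columns (parameter maps).
    return [sum(column) / num_folds for column in zip(*fold_scores)]


# The example per-fold AUCs from the question, for a single parameter map.
print(avg_metrics([[0.85], [0.07], [0.55]]))
```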

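from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import RandomForestClassifier
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder
from pyspark.ml.evaluation import BinaryClassificationEvaluator

# Reuse the active SparkSession (the pyspark shell already defines `spark`)
spark = SparkSession.builder.getOrCreate()

# Create a small test DataFrame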
training = spark.createDataFrame([
    (1,0,1),
    (1,0,0),
    (0,1,1),
    (0,1,0)], ["label", "feature1", "feature2"])

# Assemble the feature columns into a single vector column for modelling
assembler = VectorAssembler(inputCols=['feature1','feature2'],outputCol="features")
prepped = assembler.transform(training).select('label','features')

# Configure the estimator, the evaluator and the CrossValidator
rf = RandomForestClassifier(labelCol="label", featuresCol="features")
params = ParamGridBuilder().build()  # empty grid: a single default parameter map
evaluator = BinaryClassificationEvaluator()  # metricName defaults to areaUnderROC
folds = 3

cv = CrossValidator(estimator=rf,
                    estimatorParamMaps=params,
                    evaluator=evaluator,
                    numFolds=folds,
                    collectSubModels=True)  # keep the model trained on every fold

# Fit the model
model = cv.fit(prepped)

# Print the model, the average metric per parameter map and the sub-models
print(model)
print()
print(model.avgMetrics)
print()
print(model.subModels)

>>>>>Output:
>>>>>CrossValidatorModel_3a5c95c6d8d2
>>>>>()
>>>>>[0.8333333333333333]
>>>>>()
>>>>>[[RandomForestClassificationModel (uid=RandomForestClassifier_95da3a68af93) with 20 trees],
>>>>> [RandomForestClassificationModel (uid=RandomForestClassifier_95da3a68af93) with 20 trees],
>>>>> [RandomForestClassificationModel (uid=RandomForestClassifier_95da3a68af93) with 20 trees]]
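One possible workaround, offered as a sketch under stated assumptions rather than a built-in feature: with collectSubModels=True, subModels is a list indexed as subModels[fold][param_index] (three folds times one parameter map in the output above). If the rows belonging to each fold are known, every sub-model can be re-scored with the same evaluator on its own validation fold. The sketch assumes Spark 3.1 or later, where CrossValidator accepts a foldCol parameter and uses the rows with foldCol == i as the validation set of fold i. The column name "fold", the seed, and the helpers add_fold_column, fit_cv_with_folds, per_fold_metrics and format_fold_scores are hypothetical names introduced for illustration.

```python
def add_fold_column(df, num_folds, seed=42, fold_col="fold"):
    """Attach an integer fold id in [0, num_folds) to every row (hypothetical helper)."""
    # Imported here so the pure-Python helpers below work without Spark installed.
    from pyspark.sql import functions as F

    # rand() is uniform in [0, 1), so the cast yields 0 .. num_folds - 1.
    return df.withColumn(fold_col, (F.rand(seed) * num_folds).cast("int"))


def fit_cv_with_folds(estimator, param_maps, evaluator, data, num_folds, fold_col="fold"):
    """Fit a CrossValidator that uses the explicit fold column (assumes Spark >= 3.1)."""
    from pyspark.ml.tuning import CrossValidator

    cv = CrossValidator(
        estimator=estimator,
        estimatorParamMaps=param_maps,
        evaluator=evaluator,
        numFolds=num_folds,
        collectSubModels=True,  # keep subModels[fold][param_index]
        foldCol=fold_col,       # rows with fold_col == i form validation fold i
    )
    return cv.fit(data)


def per_fold_metrics(cv_model, data, evaluator, fold_col="fold"):
    """Re-score every collected sub-model on its own validation fold.

    Returns scores[fold][param_index]. Requires collectSubModels=True and the
    same fold column that was passed to CrossValidator as foldCol.
    """
    scores = []
    for fold, fold_models in enumerate(cv_model.subModels):
        # Validation rows of this fold, matching how foldCol splits the data.
        validation = data.filter(data[fold_col] == fold)
        scores.append([evaluator.evaluate(m.transform(validation)) for m in fold_models])
    return scores


def format_fold_scores(scores, param_index=0):
    """Label one parameter map's scores as {'fold1': ..., 'fold2': ...}."""
    return {f"fold{i + 1}": row[param_index] for i, row in enumerate(scores)}
```

With the question's variables, the call sequence would be roughly: data = add_fold_column(prepped, folds).cache(), then model = fit_cv_with_folds(rf, params, evaluator, data, folds), then print(format_fold_scores(per_fold_metrics(model, data, evaluator))). Caching is a precaution so the random fold ids are computed once and reused for both fitting and re-scoring. Averaging each column of the per-fold scores should match model.avgMetrics. On the four-row toy data, a fold can easily end up empty or contain a single class, so per-fold AUCs there may be meaningless or the evaluation may fail. The approach is meant for realistically sized data.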