Machine learning: PySpark random forest regressor predicting multiple classes


machine-learning pyspark


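I have a random forest regressor PySpark ML model. The response variable has 9 classes.

When I predict on the test data, I get probabilities, but I need to get the classes.

Code used: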

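from pyspark.ml import Pipeline
from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.regression import RandomForestRegressor

# featureIndexer, train and test are defined elsewhere in the original code.
rf = RandomForestRegressor(featuresCol="scaled_features")
pipeline = Pipeline(stages=[featureIndexer, rf])

# Train model.  This also runs the indexer.
model = pipeline.fit(train)

# Make predictions.
predictions = model.transform(test)

# Evaluate with root-mean-squared error (a regression metric).
evaluator = RegressionEvaluator(labelCol="label", predictionCol="prediction", metricName="rmse")
rmse = evaluator.evaluate(predictions)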

You are dealing with a classification problem, so you should use RandomForestClassifier as the ML algorithm.

For evaluation, you should use MulticlassClassificationEvaluator.

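You sound confused; regressors (like the RF here) do not return probabilities, only numeric values. If your problem is a classification one, you should use the corresponding classifier, not a regressor.

Thanks for the clarification, but my target variable has 9 classes. I need to use a regressor instead of a classifier. However, my test-set predictions contain only 2 classes; the model does not predict the other classes.

I'm afraid you still sound confused. You are simply in a multi-class classification setting (with 9 classes); this is classification, not regression. By definition, you cannot get probability values (let alone classes) from a regression model.

Thank you. Yes, my input has classes [0-9], which is what confused me. With a regression fit, I expected the predictions to fall in the range 0-9. When I looked at the predictions they were values like 0.1 and 1.3, i.e. only 0.xxx and 1.xxx, so I assumed they were probabilities. But you have clarified that these are not probability values; I had been treating my 10 target classes as a continuous variable. With the RandomForest regressor I only get predictions for 2 classes. How can I make the model predict all the classes?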
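To see why a regressor's outputs look like 0.xxx and 1.xxx rather than class labels, it helps to look at a single tree leaf. The sketch below is a simplified pure-Python illustration, not Spark's internals: the leaf contents are hypothetical, a regression tree is assumed to predict the mean label of the rows in its leaf (as with variance impurity), and a classification tree is assumed to predict the most frequent class in its leaf.

```python
from collections import Counter
from statistics import mean

# Hypothetical class codes of the training rows that fall into one tree leaf.
leaf_labels = [0, 0, 0, 1, 1, 4, 7]

# Regression tree: the leaf predicts the mean of its labels,
# a real number that need not be any of the classes.
regression_output = mean(leaf_labels)  # about 1.857

# Classification tree: the leaf records how often each class occurs,
# and the predicted class is the most frequent one.
counts = Counter(leaf_labels)
class_probabilities = {c: n / len(leaf_labels) for c, n in counts.items()}
classification_output = max(class_probabilities, key=class_probabilities.get)

print(regression_output)
print(classification_output)  # 0
```

Rounding the regression output here would give 2, a class that no training row in this leaf has, which is why switching to a classifier, rather than post-processing the regressor's numbers, addresses the problem.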