Apache Spark: why does this pyspark.ml RandomForestRegressor fail because the SparkContext was shut down?
I am trying to train a random forest regressor on a DataFrame named train, as follows:
rf = pyspark.ml.regression.RandomForestRegressor(featuresCol=self.featuresCol, labelCol=self.labelCol)
param_grid = ParamGridBuilder() \
    .addGrid(rf.numTrees, [5, 10, 20]) \
    .addGrid(rf.maxDepth, [5, 10, 15]) \
    .build()
crossval = CrossValidator(estimator=rf,
                          estimatorParamMaps=param_grid,
                          evaluator=RegressionEvaluator(),
                          numFolds=3)
self.model = crossval.fit(train)
Here are the row count, partition count, a sample row, and the schema of the DataFrame:
Training on 26398 examples with 8 partitions
{'features': SparseVector(10479, {5: 1.0, 360: 1.0, 361: 0.2444, 362: -0.9697, 363: 1.0, 10476: -0.0685}),
'label': 989}
root
|-- features: vector (nullable = true)
|-- label: long (nullable = true)
The final error message after attempting to fit the model:
org.apache.spark.SparkException: Job 44 cancelled because SparkContext was shut down
What could be causing this failure? The cluster configuration is:
Master
- m4.xlarge
- 8 vCPU
- 16 GiB memory
Workers
- r4.xlarge
- 4 vCPU
- 30.5 GiB memory