Java Spark tells me the features column is wrong
Tags: java, apache-spark, apache-spark-mllib, apache-spark-ml
What is causing this error? I am a bit lost, and nothing I have found has helped.
Stack trace:
Exception in thread "main" java.lang.IllegalArgumentException: requirement failed: Column features must be of type struct<type:tinyint,size:int,indices:array<int>,values:array<double>> but was actually struct<type:tinyint,size:int,indices:array<int>,values:array<double>>.
at scala.Predef$.require(Predef.scala:224)
at org.apache.spark.ml.util.SchemaUtils$.checkColumnType(SchemaUtils.scala:43)
at org.apache.spark.ml.PredictorParams$class.validateAndTransformSchema(Predictor.scala:51)
at org.apache.spark.ml.classification.Classifier.org$apache$spark$ml$classification$ClassifierParams$$super$validateAndTransformSchema(Classifier.scala:58)
at org.apache.spark.ml.classification.ClassifierParams$class.validateAndTransformSchema(Classifier.scala:42)
at org.apache.spark.ml.classification.ProbabilisticClassifier.org$apache$spark$ml$classification$ProbabilisticClassifierParams$$super$validateAndTransformSchema(ProbabilisticClassifier.scala:53)
at org.apache.spark.ml.classification.ProbabilisticClassifierParams$class.validateAndTransformSchema(ProbabilisticClassifier.scala:37)
at org.apache.spark.ml.classification.ProbabilisticClassifier.validateAndTransformSchema(ProbabilisticClassifier.scala:53)
at org.apache.spark.ml.Predictor.transformSchema(Predictor.scala:144)
at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:74)
at org.apache.spark.ml.Predictor.fit(Predictor.scala:100)
at classifier.Clasafie.trainModel_MPC(Clasafie.java:46)
at classifier.Clasafie.MPC_Classifier(Clasafie.java:75)
at classifier.Clasafie.main(Clasafie.java:30)
Code:
public static MultilayerPerceptronClassificationModel trainModel_MPC(SparkSession session, JavaRDD<LabeledPoint> data)
{
    int[] layers = {784, 800};
    MultilayerPerceptronClassifier model = new MultilayerPerceptronClassifier().setLayers(layers)
            .setSeed((long) 42).setBlockSize(128).setMaxIter(1000);
    Dataset<Row> dataset = session.createDataFrame(data.rdd(), LabeledPoint.class);
    return model.fit(dataset);
}
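I think the problem is that the LabeledPoint class is not being taken from the correct package.
Check the fully qualified package name and use the class from the ml package rather than the one from mllib.
I assume you are using:
org.apache.spark.mllib.regression.LabeledPoint
Use this one instead (introduced in Spark v2.0.0):
org.apache.spark.ml.feature.LabeledPoint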
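To make the suggested fix concrete, here is a minimal sketch of the question's method with the import switched to the ml package. It assumes a Spark 2.x or later dependency on the classpath. The class name Clasafie is taken from the stack trace, and the toDataFrame helper is a hypothetical addition for illustration, not part of the original code. As far as I can tell, the two struct types in the error message print identically because the mllib and ml vector types share the same underlying SQL representation, so the mismatch is in the vector class rather than in the struct layout.

```java
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.ml.classification.MultilayerPerceptronClassificationModel;
import org.apache.spark.ml.classification.MultilayerPerceptronClassifier;
// The key change: LabeledPoint from ml.feature, not mllib.regression.
import org.apache.spark.ml.feature.LabeledPoint;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

class Clasafie {

    // Hypothetical helper (not in the original question): builds the DataFrame
    // from ml LabeledPoint beans, so the "features" column carries the ml
    // vector type that MultilayerPerceptronClassifier checks for.
    static Dataset<Row> toDataFrame(SparkSession session, JavaRDD<LabeledPoint> data) {
        return session.createDataFrame(data.rdd(), LabeledPoint.class);
    }

    // Same training logic as in the question; layer sizes and parameters are
    // copied unchanged from the original code.
    public static MultilayerPerceptronClassificationModel trainModel_MPC(
            SparkSession session, JavaRDD<LabeledPoint> data) {
        int[] layers = {784, 800};
        MultilayerPerceptronClassifier model = new MultilayerPerceptronClassifier()
                .setLayers(layers)
                .setSeed(42L)
                .setBlockSize(128)
                .setMaxIter(1000);
        return model.fit(toDataFrame(session, data));
    }
}
```

The code that builds the JavaRDD has to create its points with org.apache.spark.ml.feature.LabeledPoint and org.apache.spark.ml.linalg.Vectors as well. Otherwise the mllib vector type ends up in the features column again.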