Apache Spark: converting a typed JavaRDD to a JavaRDD of Rows
I am trying to convert a typed RDD to an RDD of Rows and then create a DataFrame from it. When I execute the code, it throws an exception. My code, the classes, the schema, and the stack trace are shown below.
Answer: The problem is that no conversion happens here. When you create a Row it accepts arbitrary objects and stores them as-is, so it is not equivalent to DataFrame creation from a bean class:

spark.createDataFrame(rdd, Counter.class);

or to Dataset creation with a bean encoder:

Encoder<Counter> encoder = Encoders.bean(Counter.class);
spark.createDataset(rdd, encoder);

So RowFactory::create is not applicable here. If you want to pass an RDD<Row>, every value must already be in a form that can be used directly with a DataFrame with the given schema. That means you have to map each Counter explicitly to a Row of the following shape:

Row(vid, bytes, List(Row(id1, count1), ..., Row(idN, countN)))

and your code should be equivalent to:

JavaRDD<Row> rows = counters.map((Function<Counter, Row>) cnt -> {
    return RowFactory.create(
        cnt.vid, cnt.bytes,
        cnt.blist.stream().map(b -> RowFactory.create(b.id, b.count)).toArray()
    );
});
Dataset<Row> df = sqlContext.createDataFrame(rows, getSchema());
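To see why the original mapping fails, note that RowFactory.create stores its arguments untouched: applying it to a whole Counter object yields a one-column Row whose single value is the Counter itself, not a Row matching the (vid, bytes, blist) schema. A minimal standalone sketch (the Counter here is a stripped-down stand-in for the class in the question):

```java
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;

public class RowAsIsDemo {
    // Stripped-down stand-in for the Counter class from the question
    public static class Counter {
        long vid = 42L;
    }

    // What RowFactory::create does when mapped over the RDD:
    // it wraps the whole object as a single column, with no conversion.
    public static Row wrap(Object o) {
        return RowFactory.create(o);
    }

    public static void main(String[] args) {
        Row row = wrap(new Counter());
        System.out.println(row.size());                    // 1
        System.out.println(row.get(0) instanceof Counter); // true
        // A (vid, bytes, blist) schema expects three columns, so the
        // mismatch surfaces later as a ClassCastException.
    }
}
```

This is why the exception only appears at `show()` time: the Row is built lazily and the cast to the schema's column types happens when Spark first materializes the data.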
JavaRDD<Counter> rdd = sc.parallelize(counters);
JavaRDD<Row> rowRDD = rdd.map((Function<Counter, Row>) RowFactory::create);
//I am using some schema here based on the class Counter
DataFrame df = sqlContext.createDataFrame(rowRDD, getSchema());
df.show(); //throws Exception
class Counter {
long vid;
byte[] bytes;
List<B> blist;
}
class B {
String id;
long count;
}
private StructType getSchema() {
List<StructField> fields = new ArrayList<>();
fields.add(DataTypes.createStructField("vid", DataTypes.LongType, false));
fields.add(DataTypes.createStructField("bytes",DataTypes.createArrayType(DataTypes.ByteType), false));
List<StructField> bFields = new ArrayList<>();
bFields.add(DataTypes.createStructField("id", DataTypes.StringType, false));
bFields.add(DataTypes.createStructField("count", DataTypes.LongType, false));
StructType bclasSchema = DataTypes.createStructType(bFields);
fields.add(DataTypes.createStructField("blist", DataTypes.createArrayType(bclasSchema, false), false));
StructType schema = DataTypes.createStructType(fields);
return schema;
}
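For reference, the schema built above can be inspected on its own, without a SparkSession, since DataTypes calls are pure constructors. A quick check of what getSchema() actually declares (same construction as the question's method):

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

public class SchemaDemo {
    // Same construction as getSchema() in the question
    public static StructType buildSchema() {
        List<StructField> fields = new ArrayList<>();
        fields.add(DataTypes.createStructField("vid", DataTypes.LongType, false));
        fields.add(DataTypes.createStructField("bytes",
                DataTypes.createArrayType(DataTypes.ByteType), false));
        List<StructField> bFields = new ArrayList<>();
        bFields.add(DataTypes.createStructField("id", DataTypes.StringType, false));
        bFields.add(DataTypes.createStructField("count", DataTypes.LongType, false));
        fields.add(DataTypes.createStructField("blist",
                DataTypes.createArrayType(DataTypes.createStructType(bFields), false), false));
        return DataTypes.createStructType(fields);
    }

    public static void main(String[] args) {
        // Prints the declared structure: any Row passed to createDataFrame
        // must supply these three columns, with blist as nested Rows.
        System.out.println(buildSchema().simpleString());
    }
}
```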
java.lang.ClassCastException: test.spark.SampleTest$A cannot be cast to java.lang.Long
at scala.runtime.BoxesRunTime.unboxToLong(BoxesRunTime.java:110)
at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getLong(rows.scala:42)
at org.apache.spark.sql.catalyst.expressions.GenericInternalRow.getLong(rows.scala:221)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$LongConverter$.toScalaImpl(CatalystTypeConverters.scala:367)
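Putting the fix together: the per-record mapping from the answer can be exercised on its own, without a cluster, since RowFactory only builds plain Row objects. A sketch with local stand-ins for Counter and B (the map function body is the same one passed to counters.map in the answer):

```java
import java.util.Arrays;
import java.util.List;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;

public class MappingDemo {
    public static class B {
        String id; long count;
        B(String id, long count) { this.id = id; this.count = count; }
    }

    public static class Counter {
        long vid; byte[] bytes; List<B> blist;
        Counter(long vid, byte[] bytes, List<B> blist) {
            this.vid = vid; this.bytes = bytes; this.blist = blist;
        }
    }

    // The same mapping the answer passes to counters.map(...):
    // each B becomes a nested Row, so the result matches the schema's
    // array<struct<id,count>> column instead of holding raw B objects.
    public static Row toRow(Counter cnt) {
        return RowFactory.create(
                cnt.vid,
                cnt.bytes,
                cnt.blist.stream()
                         .map(b -> RowFactory.create(b.id, b.count))
                         .toArray());
    }

    public static void main(String[] args) {
        Counter cnt = new Counter(7L, new byte[]{1, 2},
                Arrays.asList(new B("a", 3L), new B("b", 4L)));
        Row row = toRow(cnt);
        System.out.println(row.getLong(0));      // 7
        Row first = (Row) ((Object[]) row.get(2))[0];
        System.out.println(first.getString(0));  // a
    }
}
```

The ClassCastException in the trace (an object cast to java.lang.Long) is exactly what this avoids: without the explicit mapping, Catalyst finds a user class where the schema promised a primitive column.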