Apache Spark: converting a typed JavaRDD to a Row JavaRDD (apache-spark, dataframe, rdd)



I am trying to convert a typed RDD to a Row RDD and then create a DataFrame from it. When I execute the code, it throws an exception.

Code: (the full listing appears further down, after the answer)


The problem is that no conversion takes place here. When you create a Row, it accepts arbitrary objects and stores them as-is. It is therefore not equivalent to DataFrame creation from a bean class:

spark.createDataFrame(rdd, Counter.class);

or to Dataset creation:

Encoder<Counter> encoder = Encoders.bean(Counter.class);
spark.createDataset(rdd, encoder);

when used with a bean class.

So RowFactory::create is simply not applicable here. If you want to pass an RDD<Row>, every value must already be represented in a form that can be used directly with the DataFrame schema. That means you have to explicitly map each Counter to a Row of the following shape:

Row(vid, bytes, List(Row(id1, count1), ..., Row(idN, countN)))

and your code should be equivalent to:

JavaRDD<Row> rows = counters.map((Function<Counter, Row>) cnt ->
    RowFactory.create(
        cnt.vid,
        cnt.bytes,
        cnt.blist.stream()
            .map(b -> RowFactory.create(b.id, b.count))
            .toArray()
    ));
Dataset<Row> df = sqlContext.createDataFrame(rows, getSchema());
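The nested-mapping step above can be sketched without Spark at all. In this hypothetical stand-in, `Object[]` plays the role of Spark's Row and the class names (`NestedRowDemo`, `B`, `toNestedRows`) are made up for illustration; the point is only the stream-map-toArray pattern that turns a list of beans into an array of nested "rows":

```java
import java.util.List;

public class NestedRowDemo {
    // Simplified stand-in for the question's B bean (id + count).
    static class B {
        final String id;
        final long count;
        B(String id, long count) { this.id = id; this.count = count; }
    }

    // Mimics the shape the answer describes: each B becomes a nested
    // "row" (here just an Object[]), and the list becomes an array of them.
    static Object[] toNestedRows(List<B> blist) {
        return blist.stream()
                .map(b -> new Object[]{b.id, b.count})
                .toArray();
    }

    public static void main(String[] args) {
        Object[] nested = toNestedRows(List.of(new B("a", 1L), new B("b", 2L)));
        Object[] first = (Object[]) nested[0];
        System.out.println(first[0] + ":" + first[1]); // a:1
    }
}
```

With RowFactory in place of the `Object[]` construction, the same pattern produces the `Row(id, count)` elements the schema's nested StructType expects.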
For reference, the original code from the question:

JavaRDD<Counter> rdd = sc.parallelize(counters);
JavaRDD<Row> rowRDD = rdd.map((Function<Counter, Row>) RowFactory::create);

// I am using a schema here based on the class Counter
DataFrame df = sqlContext.createDataFrame(rowRDD, getSchema());
df.show(); // throws the exception below
class Counter {
  long vid;
  byte[] bytes;
  List<B> blist;
}
class B {
  String id;
  long count;
}
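If the `createDataFrame(rdd, Counter.class)` / `Encoders.bean` route from the answer is taken instead, the bare-field classes above are not enough: bean encoding relies on JavaBean conventions (a public no-arg constructor plus getters and setters). A sketch of what that would look like, with the question's names kept and `B` nested here only so the sketch fits in one file:

```java
import java.util.List;

// JavaBean-style version of the question's classes, as Encoders.bean
// and createDataFrame(rdd, Class) require: public no-arg constructor
// plus getters and setters for every field.
public class Counter {
    private long vid;
    private byte[] bytes;
    private List<B> blist;

    public Counter() {}

    public long getVid() { return vid; }
    public void setVid(long vid) { this.vid = vid; }
    public byte[] getBytes() { return bytes; }
    public void setBytes(byte[] bytes) { this.bytes = bytes; }
    public List<B> getBlist() { return blist; }
    public void setBlist(List<B> blist) { this.blist = blist; }

    public static class B {
        private String id;
        private long count;

        public B() {}

        public String getId() { return id; }
        public void setId(String id) { this.id = id; }
        public long getCount() { return count; }
        public void setCount(long count) { this.count = count; }
    }
}
```

With beans shaped like this, Spark can derive the schema itself, so the hand-written getSchema() below becomes unnecessary for that route.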
private StructType getSchema() {
  List<StructField> fields = new ArrayList<>();
  fields.add(DataTypes.createStructField("vid", DataTypes.LongType, false));
  fields.add(DataTypes.createStructField("bytes", DataTypes.createArrayType(DataTypes.ByteType), false));

  List<StructField> bFields = new ArrayList<>();
  bFields.add(DataTypes.createStructField("id", DataTypes.StringType, false));
  bFields.add(DataTypes.createStructField("count", DataTypes.LongType, false));

  StructType bclasSchema = DataTypes.createStructType(bFields);

  fields.add(DataTypes.createStructField("blist", DataTypes.createArrayType(bclasSchema, false), false));
  return DataTypes.createStructType(fields);
}
java.lang.ClassCastException: test.spark.SampleTest$A cannot be cast to java.lang.Long

    at scala.runtime.BoxesRunTime.unboxToLong(BoxesRunTime.java:110)
    at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getLong(rows.scala:42)
    at org.apache.spark.sql.catalyst.expressions.GenericInternalRow.getLong(rows.scala:221)
    at org.apache.spark.sql.catalyst.CatalystTypeConverters$LongConverter$.toScalaImpl(CatalystTypeConverters.scala:367)
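The stack trace makes sense in light of the answer: `rdd.map(RowFactory::create)` builds a one-slot Row holding the whole bean, while the schema promises a LongType in the first slot, so Catalyst's `unboxToLong` performs a `(Long)` cast on the bean and fails. The failure mode can be reproduced in plain Java with a hypothetical `Counter` stand-in, no Spark involved:

```java
public class UnboxDemo {
    // Stand-in for the question's bean class.
    static class Counter { long vid = 42L; }

    public static void main(String[] args) {
        // A Row built via RowFactory::create stores the bean itself in the
        // first slot; the schema, however, promises a LongType there.
        Object slot = new Counter();
        try {
            long vid = (Long) slot; // same cast Catalyst performs when unboxing
            System.out.println(vid);
        } catch (ClassCastException e) {
            System.out.println("ClassCastException, as in the trace above");
        }
    }
}
```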