Apache Spark SQL protocol buffers support


I have been trying to write against Java RDDs and Datasets and have Spark infer the schema from protocol buffers (v2.5.x). However, Spark fails on the protocol buffer field members. Given a proto:

message FooProto {  
    required string name = 1;
    required string value = 2;
}
and attempting to build a JavaBean:

@Builder
@Data
@NoArgsConstructor
@AllArgsConstructor
public class TestProto implements Serializable {

    private FooProto foobar;

}

and then using it from Spark's Java API:

FooProto proto = FooProto.newBuilder()
                         .setName("foo")
                         .setValue("value")
                         .build();
TestProto testProto = TestProto.builder()
                               .foobar(proto)
                               .build();

JavaRDD<TestProto> rdd = sparkContext.parallelize(Arrays.asList(testProto));
Dataset<TestProto> dataSet = sqlc.createDataset(rdd.rdd(), Encoders.bean(TestProto.class));
dataSet.show();

I understand from the javadocs that nested JavaBeans are supported for schema inference. My questions are:

  • Does native support already exist in the Spark framework, and is there an obvious mistake above that I am missing which would fix this?
  • Is there a library that adds native support for protocol buffers to Spark SQL Datasets?

  • The workaround is to create a serializable POJO for each protobuf.
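
    A minimal sketch of that workaround (class and method names here are hypothetical, not from the original post): mirror each protobuf message with a plain serializable JavaBean that `Encoders.bean` can inspect, and map the proto instances into it before building the Dataset.

    ```java
    import java.io.Serializable;

    // Hypothetical hand-written mirror of FooProto: plain fields and
    // getters/setters only, so Spark's bean encoder can infer the schema
    // without hitting the circular Descriptors reference inside the
    // generated protobuf class.
    class FooPojo implements Serializable {
        private String name;
        private String value;

        public FooPojo() {}  // no-arg constructor required by Encoders.bean

        public FooPojo(String name, String value) {
            this.name = name;
            this.value = value;
        }

        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public String getValue() { return value; }
        public void setValue(String value) { this.value = value; }

        // Assumed mapper from the generated proto (protobuf-java generates
        // getName()/getValue() for the fields above):
        // static FooPojo fromProto(FooProto p) {
        //     return new FooPojo(p.getName(), p.getValue());
        // }
    }
    ```

    The Dataset is then built from the POJO instead of the proto, e.g. `sqlc.createDataset(rdd.map(FooPojo::fromProto).rdd(), Encoders.bean(FooPojo.class))`. Alternatively, `Encoders.kryo(TestProto.class)` sidesteps bean inference entirely, at the cost of an opaque binary column rather than a structured schema.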

    Did you ever find a solution? For reference, the full stack trace of the failure is:
    java.lang.UnsupportedOperationException: Cannot have circular references in bean class, but got the circular reference of class class com.google.protobuf.Descriptors$Descriptor
        at org.apache.spark.sql.catalyst.JavaTypeInference$.org$apache$spark$sql$catalyst$JavaTypeInference$$inferDataType(JavaTypeInference.scala:123)
        at org.apache.spark.sql.catalyst.JavaTypeInference$$anonfun$1.apply(JavaTypeInference.scala:133)
        at org.apache.spark.sql.catalyst.JavaTypeInference$$anonfun$1.apply(JavaTypeInference.scala:131)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
        at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
        at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
        at org.apache.spark.sql.catalyst.JavaTypeInference$.org$apache$spark$sql$catalyst$JavaTypeInference$$inferDataType(JavaTypeInference.scala:131)
        at org.apache.spark.sql.catalyst.JavaTypeInference$$anonfun$1.apply(JavaTypeInference.scala:133)
        at org.apache.spark.sql.catalyst.JavaTypeInference$$anonfun$1.apply(JavaTypeInference.scala:131)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
        at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
        at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
        at org.apache.spark.sql.catalyst.JavaTypeInference$.org$apache$spark$sql$catalyst$JavaTypeInference$$inferDataType(JavaTypeInference.scala:131)
        at org.apache.spark.sql.catalyst.JavaTypeInference$.org$apache$spark$sql$catalyst$JavaTypeInference$$inferDataType(JavaTypeInference.scala:117)
        at org.apache.spark.sql.catalyst.JavaTypeInference$$anonfun$1.apply(JavaTypeInference.scala:133)
        at org.apache.spark.sql.catalyst.JavaTypeInference$$anonfun$1.apply(JavaTypeInference.scala:131)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
        at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
        at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
        at org.apache.spark.sql.catalyst.JavaTypeInference$.org$apache$spark$sql$catalyst$JavaTypeInference$$inferDataType(JavaTypeInference.scala:131)
        at org.apache.spark.sql.catalyst.JavaTypeInference$$anonfun$1.apply(JavaTypeInference.scala:133)
        at org.apache.spark.sql.catalyst.JavaTypeInference$$anonfun$1.apply(JavaTypeInference.scala:131)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
        at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
        at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
        at org.apache.spark.sql.catalyst.JavaTypeInference$.org$apache$spark$sql$catalyst$JavaTypeInference$$inferDataType(JavaTypeInference.scala:131)
        at org.apache.spark.sql.catalyst.JavaTypeInference$.inferDataType(JavaTypeInference.scala:55)
        at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.javaBean(ExpressionEncoder.scala:86)
        at org.apache.spark.sql.Encoders$.bean(Encoders.scala:142)
        at org.apache.spark.sql.Encoders.bean(Encoders.scala)