
How to convert a complex Java object to a Spark DataFrame


I'm using Spark with Java, and below is my code:

JavaRDD<MyComplexEntity> myObjectJavaRDD = resultJavaRDD.flatMap(result -> result.getMyObjects());
DataFrame df = sqlContext.createDataFrame(myObjectJavaRDD, MyComplexEntity.class);
df.saveAsParquetFile("s3a://mybucket/test.parquet");

The problem is that I fail at step 2, when creating the DataFrame from myObjectJavaRDD. How can I convert a list of complex Java objects into a DataFrame? Thanks.
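Since the failure is at createDataFrame, note that createDataFrame(rdd, Class) infers the schema via JavaBean reflection, so the class (and any nested element type) needs a public no-arg constructor plus getters/setters. Below is a minimal sketch of bean-shaped classes; all field names here are assumptions, and support for nested bean collections is limited in older Spark versions, which is a common cause of exactly this failure:

```java
import java.io.Serializable;
import java.util.ArrayList;

public class BeanSketch {
    // Hypothetical nested element type, shaped as a JavaBean.
    public static class Identifier implements Serializable {
        private Integer id;
        private String uuid;
        public Identifier() {}                    // no-arg constructor for reflection
        public Integer getId() { return id; }
        public void setId(Integer id) { this.id = id; }
        public String getUuid() { return uuid; }
        public void setUuid(String uuid) { this.uuid = uuid; }
    }

    // Hypothetical outer entity; the nested list is exposed via getter/setter too.
    public static class MyComplexEntity implements Serializable {
        private String notes;
        private ArrayList<Identifier> secodaryId;
        public MyComplexEntity() {}
        public String getNotes() { return notes; }
        public void setNotes(String notes) { this.notes = notes; }
        public ArrayList<Identifier> getSecodaryId() { return secodaryId; }
        public void setSecodaryId(ArrayList<Identifier> v) { this.secodaryId = v; }
    }

    public static void main(String[] args) {
        MyComplexEntity e = new MyComplexEntity();
        e.setNotes("Hello");
        ArrayList<Identifier> ids = new ArrayList<>();
        Identifier i = new Identifier();
        i.setId(1);
        i.setUuid("gsgsg");
        ids.add(i);
        e.setSecodaryId(ids);
        System.out.println(e.getNotes() + " " + e.getSecodaryId().get(0).getUuid());
        // With a bean-shaped class you would then call (Spark API, not runnable here):
        // DataFrame df = sqlContext.createDataFrame(myObjectJavaRDD, MyComplexEntity.class);
    }
}
```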

By any chance, could you convert it to Scala?

Scala supports this scenario with
case classes.

For your case, the challenge is that you have a
Seq/Array
nested inside the
case class, as in =>
private java.util.ArrayList secodaryId

So you can do it as follows:

// inner case class Identifier
case class Identifier(Id : Integer , uuid : String)
val innerVal = Seq(Identifier(1,"gsgsg"),Identifier(2,"dvggwgwg"))

// Outer case class MyComplexEntity
case class MyComplexEntity(notes : String, identifierArray : Seq[Identifier])
val outerVal = MyComplexEntity("Hello", innerVal)
Note that outerVal, a MyComplexEntity, contains a list of Identifier objects, as below:

outerVal: MyComplexEntity = MyComplexEntity(Hello,List(Identifier(1,gsgsg), Identifier(2,dvggwgwg)))

Now, using Datasets:

import spark.implicits._
// Convert Our Input Data in Same Structure as your MyComplexEntity
// Only Trick is To 'Reflect' A Seq[(Int,String)] => Seq[Identifier]
// Hence we have to do 2 Mapping once for Outer Case class (MyComplexEntity) And Once For Inner Seq of Identifier
// If We Just Take this Input Data and Convert To DataSet ( without any Schema Inference)
// This is How It looks 

val inputData = Seq(("Some DAY",Seq((210,"wert67"),(310,"bill123"))),
                    ("I WILL BE", Seq((420,"henry678"),(1000,"baba123"))),
                    ("Saturday Night",Seq((1000,"Roger123"),(2000,"God345")))
                    )
                    
val unMappedDs = inputData.toDS

which gives us =>

// See how it is inferred
// unMappedDs: org.apache.spark.sql.Dataset[(String, Seq[(Int, String)])] = [_1: string, _2: array<struct<_1:int,_2:string>>]
Mapping each outer tuple to MyComplexEntity and each inner tuple to Identifier:

val resultDs = inputData.map(x => MyComplexEntity(x._1, x._2.map(y => Identifier(y._1, y._2)))).toDS

gives us a structure like =>

resultDs: org.apache.spark.sql.Dataset[MyComplexEntity] = [notes: string, identifierArray: array<struct<Id:int,uuid:string>>]

and the data looks like this:

+--------------+--------------------------------+
|notes         |identifierArray                 |
+--------------+--------------------------------+
|Some DAY      |[[210,wert67], [310,bill123]]   |
|I WILL BE     |[[420,henry678], [1000,baba123]]|
|Saturday Night|[[1000,Roger123], [2000,God345]]|
+--------------+--------------------------------+
It's easy with Scala.
Thanks.

Thank you very much for the detailed answer, @SanBan. However, I can't choose which route to take; I'm building on top of legacy code written in Java. Is there any way I can achieve this in Java, or is there no way as far as you know? Thanks again.

Just use the same thing in Java syntax:
map(x => MyComplexEntity(x._1, x._2.map(y => Identifier(y._1, y._2))))
If you like the answer, please upvote :-)
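For anyone needing this in Java, here is a rough, self-contained sketch of the same two-level mapping. The class shapes and the helper name toEntity are assumptions for illustration, and the Spark step is shown only as a comment:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class MappingSketch {
    // Plain holders standing in for the Scala case classes (assumed shapes).
    public static class Identifier {
        public final Integer id;
        public final String uuid;
        public Identifier(Integer id, String uuid) { this.id = id; this.uuid = uuid; }
    }

    public static class MyComplexEntity {
        public final String notes;
        public final List<Identifier> identifierArray;
        public MyComplexEntity(String notes, List<Identifier> ids) {
            this.notes = notes;
            this.identifierArray = ids;
        }
    }

    // Java equivalent of the Scala:
    //   map(x => MyComplexEntity(x._1, x._2.map(y => Identifier(y._1, y._2))))
    public static MyComplexEntity toEntity(String notes, List<Map.Entry<Integer, String>> pairs) {
        List<Identifier> ids = pairs.stream()
                .map(p -> new Identifier(p.getKey(), p.getValue()))
                .collect(Collectors.toList());
        return new MyComplexEntity(notes, ids);
    }

    public static void main(String[] args) {
        MyComplexEntity e = toEntity("Some DAY",
                List.of(Map.entry(210, "wert67"), Map.entry(310, "bill123")));
        System.out.println(e.notes + " has " + e.identifierArray.size() + " identifiers");
        // In Spark you would apply the same function inside rdd.map(...) and then,
        // with bean-style classes, create the Dataset/DataFrame from the result.
    }
}
```

The inner mapping over the pairs is the "2 mapping" trick from the answer above, just expressed with Java streams instead of Scala's map.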