
Java: How do I merge arrays with the same id using Spark DataFrames?


My table looks like this:

+-------+--------------------+
|id     |                  c1|
+-------+--------------------+
|      1|ArrayBuffer(a,b)    |
|      1|ArrayBuffer(c  )    |
|      2|ArrayBuffer(d  )    |
|      2|ArrayBuffer(e,f)    |
|      2|ArrayBuffer(g  )    |
|      3|ArrayBuffer(h  )    |
+-------+--------------------+
and I want the output to look like this:

+-------+--------------------+
|id     |                  c1|
+-------+--------------------+
|      1|ArrayBuffer(a,b,c)  |
|      2|ArrayBuffer(d,e,f,g)|
|      3|ArrayBuffer(h  )    |
+-------+--------------------+
Here's what I had in mind:

String sqlQuery = "SELECT table.id, join(table.c1) FROM table GROUP BY table.id";

import org.apache.spark.sql.api.java.UDF1;
import org.apache.spark.sql.types.DataTypes;
import scala.collection.mutable.ArrayBuffer;

// Registered as "join" so it can be called from the SQL string above;
// the declared return type must be an array type, not StringType.
sqlContext.udf().register("join",
        new UDF1<ArrayBuffer<String>, ArrayBuffer<String>>() {
            @Override
            public ArrayBuffer<String> call(ArrayBuffer<String> idArray) {

                // how do I join them?

                return idArray;
            }
        }, DataTypes.createArrayType(DataTypes.StringType));
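One snag with this plan: a UDF registered this way is applied one row at a time, so it can't merge arrays across rows in a GROUP BY; combining rows is the job of an aggregate function. As a query-style alternative, here is a minimal sketch using the built-in explode and collect_list functions (assuming Spark 1.6+, where collect_list works without Hive, and assuming df is the table above loaded as a DataFrame):

import static org.apache.spark.sql.functions.*;

import org.apache.spark.sql.DataFrame;

// Flatten each array into one row per element, then gather the
// elements back into a single array per id.
DataFrame merged = df
        .withColumn("item", explode(col("c1")))
        .groupBy("id")
        .agg(collect_list("item").alias("c1"));

merged.show() should then print the three merged rows from the expected output above.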

Hmm... does the operation have to happen on a DataFrame? If you drop down to the RDD, this can be done with a simple reduceByKey:
df.rdd.map { case Row(id: Int, c1: ArrayBuffer[String]) => (id, c1) }.reduceByKey(_ ++ _)
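Since the thread is Java-tagged, roughly the same idea in Java might look like the sketch below (an untested translation, assuming Spark with Java 8 lambdas and that df exposes the table above; the array column comes back as a java.util.List via Row.getList):

import java.util.ArrayList;
import java.util.List;

import org.apache.spark.api.java.JavaPairRDD;
import scala.Tuple2;

// Key each row by id, then concatenate the lists that share an id.
JavaPairRDD<Integer, List<String>> merged = df.javaRDD()
        .mapToPair(row -> new Tuple2<>(row.getInt(0), row.<String>getList(1)))
        .reduceByKey((a, b) -> {
            List<String> out = new ArrayList<>(a); // copy rather than mutate Spark's buffers
            out.addAll(b);
            return out;
        });

merged.collect() should then yield (1, [a, b, c]), (2, [d, e, f, g]) and (3, [h]).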
That's what I ended up doing in the end; I was just hoping for a clean way to query it.