Apache Spark: parse an array column and compare related fields

apache-spark pyspark

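I have a DataFrame like the one below, where the uinfo column is an array. I need the following: when the first array element is id.name, take the second array element (for example user1) together with the id1 value and build a new DataFrame from them.

If the id1 value is null, use the id2 value instead.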

+---------------------------+-------+------+------+
| uinfo                     | count | id1  | id2  |
+---------------------------+-------+------+------+
| [id.name, user1, example] |     1 | aijk | null |
| [id.name, user2]          |     3 | null | bcdk |
| [id.value, overflow]      |     6 | 123k | null |
| [id.name, user3]          |     7 | klmn | null |
+---------------------------+-------+------+------+

So the final DataFrame should look like this:

+-------+----------+
| uinfo | customid |
+-------+----------+
| user1 | aijk     |
| user2 | bcdk     |
| user3 | klmn     |
+-------+----------+

This should do what you need:

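import org.apache.spark.sql.functions.{coalesce, when}
import spark.implicits._ // assumes a SparkSession named `spark` is in scope

df
  .select(
    // second array element when the first is "id.name", otherwise null
    when($"uinfo"(0) === "id.name", $"uinfo"(1)).as("uinfo"),
    // id1 when it is not null, otherwise id2
    coalesce($"id1", $"id2").as("customid")
  )
  // drop the rows whose first element was not "id.name"
  .where($"uinfo".isNotNull)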

Please find the solution below:

df
  .withColumn("customid", when(col("uinfo")(0) === "id.name" && col("id1").isNotNull, col("id1")).otherwise(col("id2")))
  .withColumn("uinfo", when(col("uinfo")(0) === "id.name", col("uinfo")(1)))
  .filter(col("uinfo").isNotNull)
  .drop("id1", "id2", "count")
  .show()

Please check the answer.

How do I handle this when I have more id fields to check for null? If id1 is null then use id2, if id2 is null then use id3, and if id3 is null then use id4.
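One possible way to handle the longer fallback chain is to pass every id column to coalesce, which returns the first non-null value in argument order. The sketch below is only an illustration under assumptions: the columns id3 and id4, the object and function names, and the sample rows are hypothetical and not taken from the question's real data.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{coalesce, col, when}

object CoalesceManyIds {
  // Hypothetical helper: keeps rows whose first uinfo element is "id.name" and
  // picks the first non-null value among idCols, in the order given.
  def extractCustomIds(df: DataFrame, idCols: Seq[String]): DataFrame =
    df.select(
        when(col("uinfo")(0) === "id.name", col("uinfo")(1)).as("uinfo"),
        coalesce(idCols.map(col): _*).as("customid")
      )
      .where(col("uinfo").isNotNull)

  // Row shape used for the made-up sample data below.
  type Row6 = (Seq[String], Int, Option[String], Option[String], Option[String], Option[String])

  def sampleData(spark: SparkSession): DataFrame = {
    import spark.implicits._
    val rows: Seq[Row6] = Seq(
      (Seq("id.name", "user1", "example"), 1, Some("aijk"), None, None, None),
      (Seq("id.name", "user2"), 3, None, Some("bcdk"), None, None),
      (Seq("id.value", "overflow"), 6, Some("123k"), None, None, None),
      // only id4 is set here, to exercise the full fallback chain
      (Seq("id.name", "user3"), 7, None, None, None, Some("klmn"))
    )
    rows.toDF("uinfo", "count", "id1", "id2", "id3", "id4")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("coalesce-ids").getOrCreate()
    extractCustomIds(sampleData(spark), Seq("id1", "id2", "id3", "id4")).show()
    spark.stop()
  }
}
```

Because the column names are passed as a sequence, adding an id5 later would only mean extending that list; the order of the list is the fallback priority.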