Scala: How do I convert the table below into the required format?
I have loaded the following table as a DataFrame:
Id Name customCount Custom1 Custom1value custom2 custom2Value custom3 custom3Value
1 qwerty 2 Height 171 Age 76 Null Null
2 asdfg 2 Weight 78 Height 166 Null Null
3 zxcvb 3 Age 28 SkinColor white Height 67
4 tyuio 1 Height 177 Null Null Null Null
5 asdfgh 2 SkinColor brown Age 34 Null Null
I need to change this table into the following format:
Id Name customCount Height Weight Age SkinColor
1 qwerty 2 171 Null 76 Null
2 asdfg 2 166 78 Null Null
3 zxcvb 3 67 Null 28 white
4 tyuio 1 177 Null Null Null
5 asdfgh 2 Null Null 34 brown
I tried this with two custom field columns:
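import org.apache.spark.sql.functions.{col, when}
import spark.implicits._ // encoder for the map below; already in scope in spark-shell

val rawDf = spark.read.option("Header", false).options(Map("sep" -> "|")).csv("/sample/data.csv")
rawDf.createOrReplaceTempView("Table")
// Collect the distinct custom field names from both name columns.
val dataframe = spark.sql("select distinct * from (select `_c3` from Table union select `_c5` from Table)")
val dfWithDistinctColumns = dataframe.toDF("colNames")
val list = dfWithDistinctColumns.select("colNames").map(x => x.getString(0)).collect().toList
val rawDfWithSchema = rawDf.toDF("Id", "Name", "customCount", "h1", "v1", "h2", "v2")
// Add one column per field name, taking the value from whichever pair carries that name.
val expectedDf = list.foldLeft(rawDfWithSchema)((df1, c) =>
  df1.withColumn(c, when(col("h1") === c, col("v1")).when(col("h2") === c, col("v2")).otherwise(null))
).drop("h1", "h2", "v1", "v2")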
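But when I try this with three custom fields, I cannot do the union across multiple columns.
Can you suggest some ideas or solutions?
You can do a pivot, but you first need to clean up the format of the DataFrame: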
val df2 = df.select(
    $"Id", $"Name", $"customCount",
    explode(array(
        array($"Custom1", $"Custom1value"),
        array($"custom2", $"custom2Value"),
        array($"custom3", $"custom3Value")
    )).alias("custom")
).select(
    $"Id", $"Name", $"customCount",
    $"custom"(0).alias("key"),
    $"custom"(1).alias("value")
).groupBy(
    "Id", "Name", "customCount"
).pivot("key").agg(first("value")).drop("null").orderBy("Id")
df2.show
+---+------+-----------+----+------+---------+------+
|Id |Name  |customCount|Age |Height|SkinColor|Weight|
+---+------+-----------+----+------+---------+------+
|1  |qwerty|2          |76  |171   |null     |null  |
|2  |asdfg |2          |null|166   |null     |78    |
|3  |zxcvb |3          |28  |67    |white    |null  |
|4  |tyuio |1          |null|177   |null     |null  |
|5  |asdfgh|2          |34  |null  |brown    |null  |
+---+------+-----------+----+------+---------+------+
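For anyone who wants to run the explode-and-pivot idea end to end, here is a minimal self-contained sketch. It assumes a local SparkSession and rebuilds the sample table by hand with real nulls; the object name `PivotCustomFields` and the helpers `sampleData` and `reshape` are made up for illustration. One variation on the approach: the pivot values are passed explicitly (assuming the field names are known up front), which fixes the column order to the one the question asks for and means no "null" column is created for the empty slots.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{array, col, explode, first}

object PivotCustomFields {
  // (name column, value column) pairs, named as in the question's header.
  val pairs: Seq[(String, String)] = Seq(
    "Custom1" -> "Custom1value", "custom2" -> "custom2Value", "custom3" -> "custom3Value")

  // Assumption: the wanted output columns are known in advance.
  val keys: Seq[String] = Seq("Height", "Weight", "Age", "SkinColor")

  // Sample rows from the question, with real nulls in the unused slots.
  def sampleData(spark: SparkSession): DataFrame = {
    import spark.implicits._
    val rows: Seq[(Int, String, Int, String, String, String, String, String, String)] = Seq(
      (1, "qwerty", 2, "Height", "171", "Age", "76", null, null),
      (2, "asdfg", 2, "Weight", "78", "Height", "166", null, null),
      (3, "zxcvb", 3, "Age", "28", "SkinColor", "white", "Height", "67"),
      (4, "tyuio", 1, "Height", "177", null, null, null, null),
      (5, "asdfgh", 2, "SkinColor", "brown", "Age", "34", null, null))
    rows.toDF("Id", "Name", "customCount",
      "Custom1", "Custom1value", "custom2", "custom2Value", "custom3", "custom3Value")
  }

  // Hypothetical helper: one output column per entry in `keys`.
  def reshape(df: DataFrame, pairs: Seq[(String, String)], keys: Seq[String]): DataFrame = {
    // One two-element array [name, value] per custom pair.
    val pairArrays = pairs.map { case (n, v) => array(col(n), col(v)) }
    df.select(col("Id"), col("Name"), col("customCount"),
        explode(array(pairArrays: _*)).alias("custom")) // one row per pair
      .select(col("Id"), col("Name"), col("customCount"),
        col("custom")(0).alias("key"), col("custom")(1).alias("value"))
      .groupBy("Id", "Name", "customCount")
      .pivot("key", keys) // only the listed values become columns
      .agg(first("value"))
      .orderBy("Id")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("pivot-sketch").getOrCreate()
    reshape(sampleData(spark), pairs, keys).show(false)
    spark.stop()
  }
}
```

If the field names are not known in advance, calling `pivot("key")` without the value list, as in the answer above, lets Spark discover them at the cost of an extra pass over the data.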
What have you tried yourself?
I did it for the two-custom-field case and it works fine.
Take a look at pivot and see whether it suits your case.
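As a side note on the two-field attempt: the `when` chain from the question can be stretched to any number of name/value pairs without a pivot. The sketch below is one possible way, assuming the wanted output columns are known up front and do not clash with existing column names; the object name `CoalesceCustomFields` and the helpers `sampleData` and `widen` are hypothetical. For each target column it builds one `when` per pair and keeps the first non-null result with `coalesce`.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{coalesce, col, when}

object CoalesceCustomFields {
  // (name column, value column) pairs, named as in the question's header.
  val pairs: Seq[(String, String)] = Seq(
    "Custom1" -> "Custom1value", "custom2" -> "custom2Value", "custom3" -> "custom3Value")

  // Assumption: the wanted output columns are known in advance.
  val keys: Seq[String] = Seq("Height", "Weight", "Age", "SkinColor")

  // Sample rows from the question, with real nulls in the unused slots.
  def sampleData(spark: SparkSession): DataFrame = {
    import spark.implicits._
    val rows: Seq[(Int, String, Int, String, String, String, String, String, String)] = Seq(
      (1, "qwerty", 2, "Height", "171", "Age", "76", null, null),
      (2, "asdfg", 2, "Weight", "78", "Height", "166", null, null),
      (3, "zxcvb", 3, "Age", "28", "SkinColor", "white", "Height", "67"),
      (4, "tyuio", 1, "Height", "177", null, null, null, null),
      (5, "asdfgh", 2, "SkinColor", "brown", "Age", "34", null, null))
    rows.toDF("Id", "Name", "customCount",
      "Custom1", "Custom1value", "custom2", "custom2Value", "custom3", "custom3Value")
  }

  // Hypothetical helper: adds one column per key, then drops the pair columns.
  def widen(df: DataFrame, pairs: Seq[(String, String)], keys: Seq[String]): DataFrame = {
    val withKeys = keys.foldLeft(df) { (acc, key) =>
      // when(...) without otherwise(...) yields null if the name does not match.
      val candidates = pairs.map { case (n, v) => when(col(n) === key, col(v)) }
      acc.withColumn(key, coalesce(candidates: _*))
    }
    withKeys.drop(pairs.flatMap { case (n, v) => Seq(n, v) }: _*)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("coalesce-sketch").getOrCreate()
    widen(sampleData(spark), pairs, keys).orderBy("Id").show(false)
    spark.stop()
  }
}
```

This stays a narrow transformation with no shuffle, but the expression count grows with the number of keys times the number of pairs, so the pivot is probably the better fit when there are many distinct field names.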