
Scala: How can I convert the table below into the required format?

Tags: scala, apache-spark, apache-spark-sql

I have loaded the following table as a DataFrame:

Id  Name    customCount  Custom1    Custom1value  custom2    custom2Value  custom3  custom3Value
1   qwerty  2            Height     171           Age        76            Null     Null
2   asdfg   2            Weight     78            Height     166           Null     Null
3   zxcvb   3            Age        28            SkinColor  white         Height   67
4   tyuio   1            Height     177           Null       Null          Null     Null
5   asdfgh  2            SkinColor  brown         Age        34            Null     Null
I need to change this table into the following format:

Id  Name    customCount  Height  Weight  Age   SkinColor
1   qwerty  2            171     Null    76    Null
2   asdfg   2            166     78      Null  Null
3   zxcvb   3            67      Null    28    white
4   tyuio   1            177     Null    Null  Null
5   asdfgh  2            Null    Null    34    brown
This is what I tried for two custom-field columns:

import org.apache.spark.sql.functions.{col, when}
import spark.implicits._

val rawDf = spark.read.option("header", false).options(Map("sep" -> "|")).csv("/sample/data.csv")
rawDf.createOrReplaceTempView("Table")
// Collect the distinct custom-field names from the two name columns (_c3 and _c5).
val dataframe = spark.sql("select distinct * from (select `_c3` from Table union select `_c5` from Table)")
val dfWithDistinctColumns = dataframe.toDF("colNames")
val list = dfWithDistinctColumns.select("colNames").map(x => x.getString(0)).collect().toList
val rawDfWithSchema = rawDf.toDF("Id", "Name", "customCount", "h1", "v1", "h2", "v2")
// For each custom-field name, fill a new column from whichever (h, v) pair matches it.
val expectedDf = list.foldLeft(rawDfWithSchema)((df1, c) =>
  df1.withColumn(c, when(col("h1") === c, col("v1")).when(col("h2") === c, col("v2")).otherwise(null))
).drop("h1", "h2", "v1", "v2")
But when I tried the same thing with three custom fields, I could not get the union across the additional column to work.
Could you suggest an approach or a solution?
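For reference, a minimal sketch of how the approach above might be extended to the third custom column, assuming the file actually contains all nine columns (so the third name/value pair lands in _c7/_c8) and reusing the rawDf, Table, and imports defined above; the names names3, keys, withSchema3, and expected3 are made up for this sketch:

// Hypothetical extension of the attempt above to three custom columns:
// union in the third name column (_c7) and add an (h3, v3) pair.
val names3 = spark.sql(
  "select distinct * from (select `_c3` from Table union select `_c5` from Table union select `_c7` from Table)"
).toDF("colNames")
val keys = names3.select("colNames").map(_.getString(0)).collect().toList
// (you may also want to filter out null / "Null" entries from keys here)

val withSchema3 = rawDf.toDF("Id", "Name", "customCount", "h1", "v1", "h2", "v2", "h3", "v3")
val expected3 = keys.foldLeft(withSchema3)((acc, c) =>
  acc.withColumn(c,
    when(col("h1") === c, col("v1"))
      .when(col("h2") === c, col("v2"))
      .when(col("h3") === c, col("v3"))
      .otherwise(null))
).drop("h1", "v1", "h2", "v2", "h3", "v3")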

You can do a pivot, but you first need to clean up the format of the DataFrame:

import org.apache.spark.sql.functions.{array, explode, first}

// Unpivot the (name, value) pairs into key/value rows, then pivot back on the key.
val df2 = df.select(
  $"Id", $"Name", $"customCount",
  explode(array(
    array($"Custom1", $"Custom1value"),
    array($"custom2", $"custom2Value"),
    array($"custom3", $"custom3Value")
  )).alias("custom")
).select(
  $"Id", $"Name", $"customCount",
  $"custom"(0).alias("key"),
  $"custom"(1).alias("value")
).groupBy(
  "Id", "Name", "customCount"
).pivot("key").agg(first("value")).drop("null").orderBy("Id")
df2.show
+---+------+-----------+----+------+---------+------+
| Id|  Name|customCount| Age|Height|SkinColor|Weight|
+---+------+-----------+----+------+---------+------+
|  1|qwerty|          2|  76|   171|     null|  null|
|  2| asdfg|          2|null|   166|     null|    78|
|  3| zxcvb|          3|  28|    67|    white|  null|
|  4| tyuio|          1|null|   177|     null|  null|
|  5|asdfgh|          2|  34|  null|    brown|  null|
+---+------+-----------+----+------+---------+------+
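As a side note (not part of the original answer), the same unpivot step can also be written with Spark SQL's built-in stack() generator instead of explode(array(...)). A minimal sketch, assuming a DataFrame df with the column names from the question; unpivoted and result are hypothetical names:

import org.apache.spark.sql.functions.first

// Alternative unpivot using stack(): emits one (key, value) row per custom pair.
val unpivoted = df.selectExpr(
  "Id", "Name", "customCount",
  "stack(3, Custom1, Custom1value, custom2, custom2Value, custom3, custom3Value) as (key, value)"
)
val result = unpivoted
  .groupBy("Id", "Name", "customCount")
  .pivot("key")
  .agg(first("value"))
  .drop("null")   // drop the column produced by empty custom slots, as in the answer above
  .orderBy("Id")

The result should match df2 above; stack() just avoids building nested arrays by hand.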

What have you tried yourself? I got it working for the two-custom-field case. Have a look at pivot and see whether that gets you there?