Apache Spark: applying a function to DataFrame columns in Scala


I have a large dataset with many columns (150), and I want to apply a function (a UDF) to every column except the first one, which holds an id field. I am able to apply the function dynamically, but then I need to get the final dataset, with the id, back into a DataFrame. The Spark job will run in cluster mode. Here is what I tried:

val df = sc.parallelize(
  Seq(("id1", "B", "c","d"), ("id2", "e", "d","k"),("id3", "e", "m","n"))).toDF("id", "dat1", "dat2","dat3")
df.show

+---+----+----+----+
| id|dat1|dat2|dat3|
+---+----+----+----+
|id1|   B|   c|   d|
|id2|   e|   d|   k|
|id3|   e|   m|   n|
+---+----+----+----+

df.select(df.columns.slice(1,df.columns.size).map(c => upper(col(c)).alias(c)): _*).show

+----+----+----+
|dat1|dat2|dat3|
+----+----+----+
|   B|   C|   D|
|   E|   D|   K|
|   E|   M|   N|
+----+----+----+
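As a side note (an observation about the code above, not from the original post): `df.columns.slice(1, df.columns.size)` selects every column name after the first, which is exactly what `Array.tail` does. A minimal pure-Scala sketch of that equivalence, using a hypothetical column list mirroring the example schema:

```scala
// Hypothetical column list mirroring the example DataFrame's schema
val cols = Array("id", "dat1", "dat2", "dat3")

// slice(1, length) and tail both drop the first element
val sliced = cols.slice(1, cols.length).toSeq
val tailed = cols.tail.toSeq

assert(sliced == tailed) // both are Seq("dat1", "dat2", "dat3")
```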
Expected output:

+---+----+----+----+
| id|dat1|dat2|dat3|
+---+----+----+----+
|id1|   B|   C|   D|
|id2|   E|   D|   K|
|id3|   E|   M|   N|
+---+----+----+----+

Just prepend the id column to the other (transformed) columns:


df.select(
    col("id") +: df.columns.tail.map(c => upper(col(c)).alias(c)): _*
).show
+---+----+----+----+
| id|dat1|dat2|dat3|
+---+----+----+----+
|id1|   B|   C|   D|
|id2|   E|   D|   K|
|id3|   E|   M|   N|
+---+----+----+----+
Simple and clean, thanks!
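A note on the syntax: `+:` prepends the untouched id column to the transformed columns, and `: _*` expands the resulting sequence into the varargs that `select` expects. The same pattern should work with a custom UDF in place of the built-in `upper`. A hedged sketch of that, where the local session setup and the `shout` UDF are illustrative assumptions, not from the original post:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

object UdfAllButId {
  def main(args: Array[String]): Unit = {
    // Assumed local session for the sketch; in the question's setting the
    // job runs in cluster mode and the session comes from spark-submit.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("udf-all-but-id")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(("id1", "B", "c", "d"), ("id2", "e", "d", "k"), ("id3", "e", "m", "n"))
      .toDF("id", "dat1", "dat2", "dat3")

    // Hypothetical UDF standing in for whatever function you want to apply
    val shout = udf((s: String) => s.toUpperCase)

    // Prepend the id column, then expand the Seq[Column] into select's varargs
    val result = df.select(col("id") +: df.columns.tail.map(c => shout(col(c)).alias(c)): _*)
    result.show()

    spark.stop()
  }
}
```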