
Apache Spark: dynamically create a struct/json per group


I have a Spark dataframe like this:

+-----+---+---+---+------+
|group|  a|  b|  c|config|
+-----+---+---+---+------+
|    a|  1|  2|  3|   [a]|
|    b|  2|  3|  4|[a, b]|
+-----+---+---+---+------+
val df = Seq(("a", 1, 2, 3, Seq("a")), ("b", 2, 3, 4, Seq("a", "b"))).toDF("group", "a", "b", "c", "config")
How can I add an additional column, i.e.

df.withColumn("select_by_config", <<>>).show

Is there a better way to solve this on Spark 2.2?

If you use a recent build, Spark 2.4.0 RC 1 or later, a combination of higher-order functions should do the trick. Create a map of the columns:

import org.apache.spark.sql.functions.{array, col, expr, lit, map_from_arrays, map_from_entries}

val cols = Seq("a", "b", "c")

// Build a name -> value map over all candidate columns
val dfm = df.withColumn(
  "cmap",
  map_from_arrays(array(cols map lit: _*), array(cols map col: _*))
)

and transform the config:

dfm.withColumn(
  "config_mapped",
  map_from_entries(expr("transform(config, k -> struct(k, cmap[k]))"))
).show

// +-----+---+---+---+------+--------------------+----------------+
// |group|  a|  b|  c|config|                cmap|   config_mapped|
// +-----+---+---+---+------+--------------------+----------------+
// |    a|  1|  2|  3|   [a]|[a -> 1, b -> 2, ...|        [a -> 1]|
// |    b|  2|  3|  4|[a, b]|[a -> 2, b -> 3, ...|[a -> 2, b -> 3]|
// +-----+---+---+---+------+--------------------+----------------+
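If the goal is a JSON string per group rather than a map column, a small follow-up sketch (assuming Spark 2.4+, where to_json also accepts MapType; select_by_config is the column name from the question):

import org.apache.spark.sql.functions.to_json

dfm.withColumn(
    "config_mapped",
    map_from_entries(expr("transform(config, k -> struct(k, cmap[k]))"))
  )
  // Serialize the per-group map to a JSON string
  .withColumn("select_by_config", to_json(col("config_mapped")))
  .select("group", "select_by_config")
  .show(false)

// Expected output, roughly:
// +-----+----------------+
// |group|select_by_config|
// +-----+----------------+
// |a    |{"a":1}         |
// |b    |{"a":2,"b":3}   |
// +-----+----------------+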
On Spark 2.2, where higher-order functions are not available, you can instead map over a typed Dataset and build the per-row map yourself:

val df = Seq((1, "a", 1, 2, 3, Seq("a")), (2, "b", 2, 3, 4, Seq("a", "b")))
  .toDF("id", "group", "a", "b", "c", "config")
df.show

import spark.implicits._

final case class Foo(id: Int, c1: Int, specific: Map[String, Int])

df.map { r =>
  val config = r.getAs[Seq[String]]("config")
  // Keep only the columns named in this row's config
  val others = config.map(elem => (elem, r.getAs[Int](elem))).toMap
  Foo(r.getAs[Int]("id"), r.getAs[Int]("c"), others)
}.show
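If you would rather stay in the untyped DataFrame API on 2.2, a UDF can stand in for the higher-order function. This is a minimal sketch under that assumption; selectByConfig is a name introduced here for illustration, not a built-in:

import org.apache.spark.sql.functions.{col, lit, map, udf}

// Hypothetical helper: pick out of a name -> value map the entries
// named in this row's config
val selectByConfig = udf { (config: Seq[String], values: Map[String, Int]) =>
  config.map(k => k -> values(k)).toMap
}

val cols = Seq("a", "b", "c")
// map() (available since Spark 2.0) builds the name -> value map
// from alternating key/value columns
val cmap = map(cols.flatMap(c => Seq(lit(c), col(c))): _*)

df.withColumn("select_by_config", selectByConfig(col("config"), cmap)).show(false)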