
Adding values to an existing nested JSON in a Spark DataFrame column

json, apache-spark, apache-spark-sql

Using Spark 2.3.2.

I'm trying to take the values of certain columns of a DataFrame and insert them into an existing JSON structure. Suppose I have this DataFrame:

val testDF = Seq(("""{"foo": "bar", "meta":{"app1":{"p": "2", "o": "100"}, "app2":{"p": "5", "o": "200"}}}""", "10", "1337")).toDF("key", "p", "o")
// used as the key in the nested json structure
val app = "appX"
Basically, I want to go from this column:

{
  "foo": "bar",
  "meta": {
    "app1": {
      "p": "2",
      "o": "100"
    },
    "app2": {
      "p": "5",
      "o": "200"
    }
  }
}
to this:

{
  "meta": {
    "app1": {
      "p": "2",
      "o": "100"
    },
    "app2": {
      "p": "5",
      "o": "200"
    },
    "appX": {
      "p": "10",
      "o": "1337"
    }
  }
}
based on the DataFrame's p and o columns.

I tried:

def process(inputDF: DataFrame, appName: String): DataFrame = {
  val res = inputDF
    .withColumn(appName, to_json(expr("(p, o)")))
    .withColumn("meta", struct(get_json_object('key, "$.meta")))
    .selectExpr(s"""struct(meta.*, ${appName} as ${appName}) as myStruct""")
    .select(to_json('myStruct).as("newMeta"))
  res.show(false)
  res
}
val resultDF = process(testDF, app)
val resultString = resultDF.select("newMeta").collectAsList().get(0).getString(0)
treatEscapes(resultString) must be("""{"meta":{"app1":{"p":"2","o":"100"},"app2":{"p":"5","o":"200"},"appX":{"p":"10","o":"1337"}}}""")
But the assertion fails, because I

  • can't get the contents of appX
    to the same level as the other two apps,
  • don't know how to handle the quoting correctly, and
  • don't know how to rename "col1" to "meta".
The test fails with:

Expected :"{"[meta":{"app1":{"p":"2","o":"100"},"app2":{"p":"5","o":"200"},"appX":{"p":"10","o":"1337"}}]}"
Actual   :"{"[col1":"{"app1":{"p":"2","o":"100"},"app2":{"p":"5","o":"200"}}","appX":"{"p":"10","o":"1337"}"]}"
  • Extract the
    meta
    content.
  • Convert the p and o
    columns to a
    map
    datatype: map(lit(appX), struct($"p", $"o"))
  • Then use the
    map_concat
    function to merge the maps.
  • Check the code below.

    scala> testDF.show(false)
    +---------------------------------------------------------------------------------+---+----+
    |key                                                                              |p  |o   |
    +---------------------------------------------------------------------------------+---+----+
    |{"foo": "bar", "meta":{"app1":{"p":"2", "o":"100"}, "app2":{"p":"5", "o":"200"}}}|10 |1337|
    +---------------------------------------------------------------------------------+---+----+
    
    Create a
    schema
    to convert the
    string
    to
    json:

    scala> val schema = new StructType().add("foo",StringType).add("meta",MapType(StringType,new StructType().add("p",StringType).add("o",StringType)))
    
    Print the schema:

    scala> schema.printTreeString
    root
     |-- foo: string (nullable = true)
     |-- meta: map (nullable = true)
     |    |-- key: string
     |    |-- value: struct (valueContainsNull = true)
     |    |    |-- p: string (nullable = true)
     |    |    |-- o: string (nullable = true)
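
    Putting the schema and the map steps together, here is a minimal end-to-end sketch (an assumption on my part, since the intermediate step is not shown above): it needs Spark >= 2.4.0 for the built-in map_concat on columns, and a local SparkSession for illustration.

    ```scala
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._
    import org.apache.spark.sql.types._

    val spark = SparkSession.builder.master("local[1]").appName("nested-json").getOrCreate()
    import spark.implicits._

    val testDF = Seq(("""{"foo": "bar", "meta":{"app1":{"p": "2", "o": "100"}, "app2":{"p": "5", "o": "200"}}}""", "10", "1337")).toDF("key", "p", "o")

    // meta is modeled as a map so new app entries can be concatenated in
    val schema = new StructType()
      .add("foo", StringType)
      .add("meta", MapType(StringType, new StructType().add("p", StringType).add("o", StringType)))

    val out = testDF
      .withColumn("key", from_json($"key", schema))   // parse the JSON string into a struct
      .select(to_json(struct(struct(
        $"key.foo",
        // merge the existing meta map with a one-entry map for appX
        map_concat($"key.meta", map(lit("appX"), struct($"p", $"o"))).as("meta")
      ).as("key"))).as("json_data"))

    out.show(false)
    ```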
    
    Final output:

    +-----------------------------------------------------------------------------------------------------------------+
    |json_data                                                                                                        |
    +-----------------------------------------------------------------------------------------------------------------+
    |{"key":{"foo":"bar","meta":{"app1":{"p":"2","o":"100"},"app2":{"p":"5","o":"200"},"appX":{"p":"10","o":"1337"}}}}|
    +-----------------------------------------------------------------------------------------------------------------+
    
    
    The built-in map_concat function requires Spark version >=
    2.4.0.

    Since you are on 2.3.2, a
    UDF
    & case class help:

    Define a case class to hold the
    p
    and o
    column values:

    scala> case class PO(p:String,o:String)
    
    Define a UDF to concat the maps:

    scala> val map_concat = udf((mp:Map[String,PO],mpa:Map[String,PO]) => mp ++ mpa)
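
    The UDF relies on plain Scala's ++ on maps, which is right-biased concatenation. A quick illustration outside Spark (hypothetical app names, same PO case class):

    ```scala
    case class PO(p: String, o: String)

    val existing = Map("app1" -> PO("2", "100"), "app2" -> PO("5", "200"))
    val added    = Map("appX" -> PO("10", "1337"))

    // ++ keeps all keys; on a duplicate key the right-hand map's value wins,
    // so re-running with an existing app name would overwrite its p/o values.
    val merged = existing ++ added
    ```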
    
    Final output:

    +-------------------------------------------+---+----+---------------------------------------------------------------------------------------------------------+
    |key                                        |p  |o   |newMap                                                                                                   |
    +-------------------------------------------+---+----+---------------------------------------------------------------------------------------------------------+
    |[bar,Map(app1 -> [2,100], app2 -> [5,200])]|10 |1337|{"foo":"bar","meta":{"app1":{"p":"2","o":"100"},"app2":{"p":"5","o":"200"},"appX":{"p":"10","o":"1337"}}}|
    +-------------------------------------------+---+----+---------------------------------------------------------------------------------------------------------+
    

    Since the version is 2.3.2: the line added to the question,
    .withColumn("meta", struct(get_json_object('key, "$.meta"))),
    is wrong; it does not flatten the
    meta
    column value. (struct over an unnamed expression produces a column named col1, and get_json_object returns a plain string, which is why the quotes come back escaped.)
    scala> df
    .withColumn("key",from_json($"key",schema))
    .withColumn(
        "key",
        to_json(
            struct(
                $"key.foo",
                map_concat(
                    $"key.meta",
                    map(
                        lit(app),
                        struct($"p",$"o")
                    )
                ).as("meta")
            )
        )
    )
    .show(false)
    
    
    +-------------------------------------------+---+----+---------------------------------------------------------------------------------------------------------+
    |key                                        |p  |o   |newMap                                                                                                   |
    +-------------------------------------------+---+----+---------------------------------------------------------------------------------------------------------+
    |[bar,Map(app1 -> [2,100], app2 -> [5,200])]|10 |1337|{"foo":"bar","meta":{"app1":{"p":"2","o":"100"},"app2":{"p":"5","o":"200"},"appX":{"p":"10","o":"1337"}}}|
    +-------------------------------------------+---+----+---------------------------------------------------------------------------------------------------------+