Apache Spark: flattening a nested schema from DynamoDB JSON

Tags: apache-spark, apache-spark-sql

I'm working with DynamoDB JSON similar to this:

{ 
  "name" : {"S" : "John"},
  "birthday": {
    "M" : {
       "month" : {"N": 1},
       "year" : {"N": 2000},
       "day" : {"N": 2} 
    }
  }
}
When I read this in Spark with

val df = spark.read.json("s3://path")
I get a convoluted schema:

name: struct (S: string),
birthday: struct (
  M: struct (
    month: struct (N: int),
    year: struct (N: int),
    day: struct (N: int)
  )
)
Instead, I would like to transform the schema into:

name: string
birthday: struct (
  month: int,
  year: int,
  day: int
)
Is there a way to do this?

In reality my schema is much larger than this example, with many deeply nested structs. I would also like to know whether there is a dynamic way to "normalize" the schema.

.selectExpr("name", "birthday.M as birthday")
Or you can even flatten it completely to the root:

.selectExpr("name", "birthday.M.*")

I was able to achieve this using the
named_struct
function:

df.selectExpr("""
named_struct (
  'name', name.S,
  'birthday', named_struct(
    'month', birthday.M.month.N as decimal,
    'year', birthday.M.year.N as decimal,
    'day', birthday.M.day.N as decimal,
  )
) as items
""")

This works well for me.
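For reference, the same reshaping can also be expressed with the DataFrame API instead of a SQL string. This is only a sketch assuming the example schema above; the names normalized and the "int" casts are my own choices:

import org.apache.spark.sql.functions.{col, struct}

// Unwrap the "S"/"N" DynamoDB attribute wrappers and rebuild birthday
// as a plain struct of integers.
val normalized = df.select(
  col("name.S").as("name"),
  struct(
    col("birthday.M.month.N").cast("int").as("month"),
    col("birthday.M.year.N").cast("int").as("year"),
    col("birthday.M.day.N").cast("int").as("day")
  ).as("birthday")
)

normalized.printSchema()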

Thanks for the suggestion, Dave. But doesn't that only pull the birthday field up, without fixing the "N" wrappers inside it?

Ah, I hadn't seen that part of the schema. If you have a huge number of fields, you could do something with .schema(); if it's just these known fields, you can alias each one.

Which Spark version?

@Srinivas Spark 2.4.3
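On the dynamic approach hinted at in the comments (working from .schema()): one option is to walk df.schema recursively and strip the single-field DynamoDB type wrappers automatically. This is a rough sketch, not from the original thread; the helper name unwrap and the set of wrapper keys are assumptions:

import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{col, struct}
import org.apache.spark.sql.types.{StructField, StructType}

// DynamoDB attribute-type wrappers to strip (assumed set; extend as needed).
val wrapperKeys = Set("S", "N", "M", "BOOL", "L")

// Build a column expression for each field, skipping single-field wrapper
// structs and rebuilding ordinary structs recursively.
def unwrap(field: StructField, path: String): Column = field.dataType match {
  case st: StructType if st.fields.length == 1 && wrapperKeys.contains(st.fields.head.name) =>
    // Descend through the wrapper (e.g. name.S, birthday.M) but keep the outer name.
    unwrap(st.fields.head.copy(name = field.name), s"$path.${st.fields.head.name}")
  case st: StructType =>
    // Rebuild nested structs, unwrapping each child field as well.
    struct(st.fields.map(f => unwrap(f, s"$path.${f.name}")): _*).as(field.name)
  case _ =>
    col(path).as(field.name)
}

val flattened = df.select(df.schema.fields.map(f => unwrap(f, f.name)): _*)
flattened.printSchema()

Applied to the example above, this should leave name as a string and birthday as a struct of month/year/day with the wrappers removed; a cast could be added in the base case if the N values are read back as strings.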