Apache Spark: saving a DataFrame as XML in Spark SQL
// Build the nested structure from the inside out: Value -> SetofValues -> Station.
val final_df = sqlContext.sql("""
  select _xmlns, `md:Date`, `md:Creator`,
         struct(_ngr, _region, SetofValues) as Station
  from (
    select _xmlns, `md:Date`, `md:Creator`, _ngr, _region,
           struct(_dataType, _period, Value) as SetofValues
    from (
      select _xmlns, `md:Date`, `md:Creator`, _ngr, _region, _dataType, _period,
             struct(_VALUE, _time) as Value
      from df_h a
      left outer join df_ds b on a.batchId = b.batchId
      left outer join df_dsv c on b.batchId = c.batchId
      left outer join df_nv d on c.batchId = d.batchId
    )
  )
""")
// Write a single XML file with one <NewTag> element per row (the "xml" format comes from the spark-xml package).
final_df.repartition(1).write.format("xml").option("rowTag", "NewTag").save(output_path)
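When I save the DataFrame as XML with the command above, the resulting file is not grouped the way I expect. The schema of final_df is: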
root
|-- _xmlns: string (nullable = true)
|-- md:Date: string (nullable = true)
|-- md:Creator: string (nullable = true)
|-- Station: struct (nullable = false)
| |-- _ngr: string (nullable = true)
| |-- _region: string (nullable = true)
| |-- SetofValues: struct (nullable = false)
| | |-- _dataType: string (nullable = true)
| | |-- _period: string (nullable = true)
| | |-- Value: struct (nullable = false)
| | | |-- _VALUE: double (nullable = true)
| | | |-- _time: string (nullable = true)
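The saved file repeats the NewTag record, each copy carrying the same md:Date (2016-10-30) and md:Creator (USER_1); the rest of that output is not preserved here.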
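Late to the party, but in case anyone is wondering: your schema holds exactly one Value per SetofValues per Station per root, so it cannot produce the output you expect, for example: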
<NewTag xmlns="testing">
<md:Date>2016-10-30</md:Date>
<md:Creator>USER_1</md:Creator>
<Station ngr="123456" region="North East">
<SetofValues dataType="Total" period="15 min">
<Value time="05:30:00">3.509</Value>
<Value time="05:45:00">2.6</Value>
<Value time="06:00:00">1.111</Value>
</SetofValues>
</Station>
</NewTag>
To get that output, you need to reduce by key and turn "Value" into an array.
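Right now your DataFrame has one row per Value:
Root Station Set Value
Root Station Set Value
Root Station Set Value
Root Station Set Value
After reducing by the three keys (root, station, set), it looks like this:
Root Station Set [Value, Value, Value, ...]
Your data itself is not in the right shape, which is why it is written out this way. Run final_df.show() and take a look. Transform the data properly, group it as needed, and then save it.
@Abhishekan Can you help with converting the rows into an array?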
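For the rows-to-array step asked about above, here is a minimal sketch using the DataFrame API, not a definitive implementation. It assumes Spark 2.x or later (where collect_list accepts struct columns) and a hypothetical flatDf that holds the innermost join result from the question, with one row per reading and the flat columns _xmlns, md:Date, md:Creator, _ngr, _region, _dataType, _period, _VALUE and _time. The name nestValues is made up for illustration.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, collect_list, struct}

// Hypothetical helper: flatDf is assumed to be the flat join result,
// one row per (root, station, set, value) combination.
def nestValues(flatDf: DataFrame): DataFrame = {
  flatDf
    // Group by the three keys: root, station and set of values.
    .groupBy(
      col("_xmlns"), col("`md:Date`"), col("`md:Creator`"), // root-level key
      col("_ngr"), col("_region"),                          // station-level key
      col("_dataType"), col("_period"))                     // set-level key
    // Collect every (_VALUE, _time) pair of a group into one array column.
    .agg(collect_list(struct(col("_VALUE"), col("_time"))).as("Value"))
    // Rebuild the nesting from the question, with Value now an array of structs.
    .select(
      col("_xmlns"), col("`md:Date`"), col("`md:Creator`"),
      struct(
        col("_ngr"), col("_region"),
        struct(col("_dataType"), col("_period"), col("Value")).as("SetofValues")
      ).as("Station"))
}
```

The result can then be written with the same command as in the question, for example nestValues(flatDf).repartition(1).write.format("xml").option("rowTag", "NewTag").save(output_path). Note that collect_list does not guarantee the order of the collected elements, so the Value entries may not come out sorted by time.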