Apache Spark: converting a String column to MapType in a Spark DataFrame


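I have a DataFrame like the one shown below. I want to convert the last column, TRANDATA, from String to MapType. The output should look like what I show in the second table.

I have already written a UDF, but it takes a String and converts it to a Map. I'm struggling to get the same output when the input is a sql.Row :(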

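For Spark 2.4+, you can split the string into key-value pairs with split, then use transform to separate the keys and the values into two array columns, and finally use map_from_arrays to build the map: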

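import org.apache.spark.sql.functions._
import spark.implicits._  // enables the 'TRANDATA symbol syntax

df.withColumn("entry", split('TRANDATA, ","))                           // array of "key=value" strings
  .withColumn("key", expr("transform(entry, x -> split(x, '=')[0])"))   // array of keys
  .withColumn("value", expr("transform(entry, x -> split(x, '=')[1])")) // array of values
  .withColumn("map", map_from_arrays('key, 'value))
  .drop("entry", "key", "value", "TRANDATA")
  .show(false)
Output: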

+---------+--------+----------------------------------------------------------------------------------------+
|MESSAGEID|CATEGORY|map                                                                                     |
+---------+--------+----------------------------------------------------------------------------------------+
|03010    |A       |[threadID -> 123sada, ProcType -> InfraLogging, TxnID -> 4mjx8wfogf]                    |
|03011    |A       |[threadID -> xmjxe2j0jz, ProcType -> InfraLogging, TxnID -> 4mjxe2j0tf]                 |
|09941    |D       |[compTxnID -> xmawdew0tf, to -> ABCD, threadID -> 4mjxe2j0jz, ProcType -> InfraLogging] |
|00994    |D       |[compTxnID -> xmjxe2j0tf, to -> XYZA, threadID -> 34jxasde0jz, ProcType -> InfraLogging]|
+---------+--------+----------------------------------------------------------------------------------------+
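
To see what each intermediate column holds, the three steps above can be mimicked on plain Scala collections, without Spark. This is only an illustrative sketch: the object and method names (MapFromArraysSketch, entries, keys, values, toMap) are made up for this example and are not part of the Spark API.

```scala
object MapFromArraysSketch {
  // Step 1: split the raw string into "key=value" entries,
  // analogous to split('TRANDATA, ",").
  def entries(raw: String): Array[String] = raw.split(",")

  // Step 2: extract keys and values as two parallel arrays,
  // analogous to the two transform(...) expressions.
  def keys(es: Array[String]): Array[String] = es.map(_.split("=")(0))
  def values(es: Array[String]): Array[String] = es.map(_.split("=")(1))

  // Step 3: pair the two arrays up into a map,
  // analogous to map_from_arrays('key, 'value).
  def toMap(raw: String): Map[String, String] = {
    val es = entries(raw)
    keys(es).zip(values(es)).toMap
  }

  def main(args: Array[String]): Unit =
    println(toMap("threadID=123sada,ProcType=InfraLogging,TxnID=4mjx8wfogf"))
}
```

Like the Spark version, this sketch assumes every entry contains an "=" with a non-empty value after it; an entry without one makes the index (1) lookup fail.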

Thanks a lot, werner. Spark 2.4+ has many more options, which is great. My production servers have been upgraded to 2.4.x, so the code above should work fine.
// Parses "k1=v1,k2=v2" into an immutable Map.
// Assumes every entry contains an '='; a malformed entry throws a MatchError.
def stringToMap(value: String): Map[String, String] =
  value.split(",").map { entry =>
    val Array(k, v) = entry.split("=", 2)  // split on the first '=' only
    k -> v
  }.toMap
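
If TRANDATA can be null or contain malformed entries, a more defensive parser may be useful. The following is a minimal sketch under that assumption, not the original poster's code: the name SafeStringToMap.parse is hypothetical, and skipping bad entries (rather than failing) is a design choice made for illustration.

```scala
object SafeStringToMap {
  // Returns an empty map for null or empty input and silently skips
  // entries that do not contain '='. Only the first '=' in an entry
  // is treated as the key/value separator.
  def parse(value: String): Map[String, String] =
    Option(value).toSeq            // null becomes an empty sequence
      .flatMap(_.split(","))       // "k=v" entries
      .map(_.trim)
      .filter(_.nonEmpty)
      .map(_.split("=", 2))
      .collect { case Array(k, v) => k -> v }
      .toMap
}
```

A function like this could then be wrapped with Spark's udf helper and applied to the String column, which avoids handling sql.Row inside the UDF altogether.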


+--------------+--------+-----------------------------------------------------------------------+
|MESSAGEID     |CATEGORY|TRANDATA                                                               |
+--------------+--------+-----------------------------------------------------------------------+
|03010         |A       |threadID=123sada,ProcType=InfraLogging,TxnID=4mjx8wfogf                |
|03011         |A       |threadID=xmjxe2j0jz,ProcType=InfraLogging,TxnID=4mjxe2j0tf             |
|09941         |D       |compTxnID=xmawdew0tf,to=ABCD,threadID=4mjxe2j0jz,ProcType=InfraLogging |
|00994         |D       |compTxnID=xmjxe2j0tf,to=XYZA,threadID=34jxasde0jz,ProcType=InfraLogging|
+--------------+--------+-----------------------------------------------------------------------+
+--------------+--------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|MESSAGEID     |CATEGORY|TRANDATA                                                                                                                                                                                                                                                                                       |
+--------------+--------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|03010         |A       |Map(threadID -> 123sada,ProcType -> InfraLogging,TxnID -> 4mjx8wfogf)