Remove special characters from an array in PySpark

I have a PySpark DataFrame that contains a column of string arrays.

Example df:

number | id
---------------
12     | [12, .AZ, .UI]
------------------------
14     | [CL, .RT, OP.]

I want to remove the "." character.

I tried using regexp_replace:

df = df.select("id", F.regexp_replace(F.col("id"), ".").alias("id"))

But I think regexp_replace is a good solution for strings, not for arrays.

How can I remove this character from the array?

Thanks

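In Spark 2.4 or later, you can use the transform higher-order function to apply replace to each element of the array.

Working example: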

import pyspark.sql.functions as F
df.withColumn("id",F.expr("transform(id,x-> replace(x,'.',''))")).show()

+------+------------+
|number|          id|
+------+------------+
|    12|[12, AZ, UI]|
|    14|[CL, RT, OP]|
+------+------------+