Changing array values in PySpark

I have a PySpark DataFrame:

Example df:

number  |  matricule<array>   | name<array>  |    
----------------------------------------------
AA      |  []                 |  [7]         |    
----------------------------------------------
AA      |  [9]                |  []         |     
----------------------------------------------
AA      |  [""]                |  [2]         |    
----------------------------------------------
AA      |  [2]                |  [""]      |  
But I get an error:

AnalysisException: u"cannot resolve, `matricule` = '[]')' due to data type mismatch: differing types.
Expected result:

number  |  matricule<array>   | name<array>  |    
----------------------------------------------
AA      |  []                 |  [7]         |    
----------------------------------------------
AA      |  [9]                |  []          |     
----------------------------------------------
AA      |  []                |  [2]          |    
----------------------------------------------
AA      |  [2]                |  []          |  
Can someone help me, please? Thanks.

DataFrame:

+------+---------+----+
|Number|Matricule|Name|
+------+---------+----+
|    AA|     [""]| [7]|
|    AA|      [9]|  []|
|    AA|     [""]| [2]|
|    AA|      [2]|[""]|
+------+---------+----+
Filter the "" elements out of both columns:

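from pyspark.sql import functions as F
# Keep only the array elements that differ from the literal string "" in each column
df.withColumn("Matricule", F.expr("""filter(Matricule, x -> x != '""')"""))\
  .withColumn("Name", F.expr("""filter(Name, x -> x != '""')""")).show()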


+------+---------+----+
|Number|Matricule|Name|
+------+---------+----+
|    AA|       []| [7]|
|    AA|      [9]|  []|
|    AA|       []| [2]|
|    AA|      [2]|  []|
+------+---------+----+
As mentioned in the comments, you can also use array_remove:

# array_remove drops every element equal to the given value (here the literal string "")
df.withColumn("Matricule", F.array_remove("Matricule", '""'))\
  .withColumn("Name", F.array_remove("Name", '""')).show()

+------+---------+----+
|Number|Matricule|Name|
+------+---------+----+
|    AA|       []| [7]|
|    AA|      [9]|  []|
|    AA|       []| [2]|
|    AA|      [2]|  []|
+------+---------+----+

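Do you want to convert the empty strings to null, or remove them from the array entirely?

@blackishop Remove them and keep an empty array [].

If you are using Spark 2.4+, you can use array_remove like this:
from pyspark.sql.functions import array_remove, col
df = df.withColumn("matricule_2", array_remove(col("matricule"), ""))
...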