Python dictionary lookup in PySpark
Struggling with the following in PySpark. I have a Python dictionary that looks like this:
COUNTRY_MAP = {
"AND": "AD", "ARE": "AE", "AFG": "AF", "ATG": "AG", "AIA": "AI", ... };
Now I want to build a value out of 3 columns, say value1, value2 and value3. The catch is that value3 needs to be converted from its 3-letter code to the 2-letter code using the lookup above, and if it doesn't exist, "NONE" should be used instead, i.e.
from pyspark.sql import functions as sf
combined = sf.trim(sf.concat(sf.col("value1"), sf.lit(":"), sf.col("value2"), sf.lit(":"),
                             sf.coalesce(sf.col("value3"), sf.lit("NONE"))))
tmp = (df.withColumn('COMBINED_FIELD', combined)
...<other stuff>
)
This gives me values like "abc:4545:AND", "def:7789:ARE" and "ghi:1122:NONE". What I need is: "abc:4545:AD", "def:7789:AE" and "ghi:1122:NONE".
Being new to PySpark, I'm really struggling to get this working. Any ideas?

You can convert the dictionary into a map-type column and use the value3 column as the key to get the value:
import pyspark.sql.functions as F
COUNTRY_MAP = {"AND": "AD", "ARE": "AE", "AFG": "AF", "ATG": "AG", "AIA": "AI"}
result = df.withColumn(
'combined_field',
F.trim(
F.concat_ws(':',
'value1', 'value2',
F.coalesce(
F.create_map(*sum([[F.lit(k), F.lit(v)] for (k,v) in COUNTRY_MAP.items()], []))[F.col('value3')],
F.lit('NONE')
)
)
)
)
result.show()
+------+------+------+--------------+
|value1|value2|value3|combined_field|
+------+------+------+--------------+
| abc| 4545| AND| abc:4545:AD|
| def| 7789| ARE| def:7789:AE|
| ghi| 1122| NONE| ghi:1122:NONE|
+------+------+------+--------------+
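The least obvious part of the answer is how the dictionary is flattened into alternating key/value elements before being wrapped in `F.lit(...)` and passed to `F.create_map`. A minimal pure-Python sketch of just that flattening step (the `itertools.chain` variant shown here is equivalent to the `sum(..., [])` trick used above, and avoids repeated list copying):

```python
from itertools import chain

COUNTRY_MAP = {"AND": "AD", "ARE": "AE", "AFG": "AF"}

# sum(..., []) concatenates the [key, value] pairs into one flat list
flat_sum = sum([[k, v] for k, v in COUNTRY_MAP.items()], [])

# itertools.chain produces the same flat sequence
flat_chain = list(chain(*COUNTRY_MAP.items()))

assert flat_sum == flat_chain == ["AND", "AD", "ARE", "AE", "AFG", "AF"]
```

In the answer, each element of this flat list is wrapped in `F.lit(...)` and the whole sequence is splatted into `F.create_map(*...)`, which expects keys and values as alternating arguments.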