Python PySpark: concatenate sorted columns with similar names
I'm trying to concatenate string columns. The code below concatenates them, but it does not sort the columns first. I'd appreciate it if someone could help me sort the string columns and then concatenate them.
DataFrame:
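import pyspark.sql.functions as f
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Note the column order: main3 comes before main2
df = spark.createDataFrame([
    ("kd", "gr", "hd", "ae", "nw"),
    ("zj", "sd", "mw", "op", "le"),
    ("ct", "wm", "kr", "vs", "qz")],
    ("main", "main1", "main3", "main2", "main4")
)
df.show()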
+----+-----+-----+-----+-----+
|main|main1|main3|main2|main4|
+----+-----+-----+-----+-----+
| kd| gr| hd| ae| nw|
| zj| sd| mw| op| le|
| ct| wm| kr| vs| qz|
+----+-----+-----+-----+-----+
Expected result:
+----+-----+-----+-----+-----+--------------+
|main|main1|main3|main2|main4| result|
+----+-----+-----+-----+-----+--------------+
| kd| gr| hd| ae| nw|kd_gr_ae_hd_nw|
| zj| sd| mw| op| le|zj_sd_op_mw_le|
| ct| wm| kr| vs| qz|ct_wm_vs_kr_qz|
+----+-----+-----+-----+-----+--------------+
My current code:
df = df.withColumn('result', f.concat_ws(
    '_', *[c for c in df.columns if c.startswith("main")]))
df.show()
Output (values are joined in the original, unsorted column order):
+----+-----+-----+-----+-----+--------------+
|main|main1|main3|main2|main4| result|
+----+-----+-----+-----+-----+--------------+
| kd| gr| hd| ae| nw|kd_gr_hd_ae_nw|
| zj| sd| mw| op| le|zj_sd_mw_op_le|
| ct| wm| kr| vs| qz|ct_wm_kr_vs_qz|
+----+-----+-----+-----+-----+--------------+
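Why the order is off: concat_ws joins its arguments in the order they are passed, and df.columns returns the names in schema order (main, main1, main3, main2, main4), so the list comprehension reproduces that order. The plain-Python sketch below models a single row as a dict (an illustration of the ordering only, not of how Spark evaluates concat_ws) and compares joining in schema order with joining in sorted order.

```python
# Illustration only: row 1 of the example DataFrame, keys in schema order
row = {"main": "kd", "main1": "gr", "main3": "hd", "main2": "ae", "main4": "nw"}

# Same filter as the question's list comprehension; dicts keep insertion order
cols = [c for c in row if c.startswith("main")]

# Join in schema order (what the question's code does) vs. sorted name order
unsorted_result = "_".join(row[c] for c in cols)
sorted_result = "_".join(row[c] for c in sorted(cols))

print(unsorted_result)  # kd_gr_hd_ae_nw
print(sorted_result)    # kd_gr_ae_hd_nw
```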
You can sort the column names before concatenating them, as follows:
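# Collect the matching column names, then sort them before concatenating
col_names = [c for c in df.columns if c.startswith("main")]
sorted_names = sorted(col_names)  # ['main', 'main1', 'main2', 'main3', 'main4']
df = df.withColumn('result', f.concat_ws(
    '_', *sorted_names))
df.show()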
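One caveat, offered here as an addition rather than as part of the answer above: sorted() compares strings character by character, so if the DataFrame ever has a column such as main10, it would sort before main2. A minimal sketch of a natural-sort key follows; the helper name natural_key and the main10 column are hypothetical. It splits each name into text and digit runs so the digit runs compare as integers.

```python
import re


def natural_key(name):
    """Hypothetical helper: split 'main12' into ['main', 12, ''] so that
    digit runs compare numerically instead of character by character."""
    # re.split with a capturing group alternates text, digits, text, ...
    # so every position holds the same type across names (no str/int clash)
    return [int(part) if part.isdigit() else part
            for part in re.split(r"(\d+)", name)]


cols = ["main", "main1", "main10", "main2", "main3"]
print(sorted(cols))                   # ['main', 'main1', 'main10', 'main2', 'main3']
print(sorted(cols, key=natural_key))  # ['main', 'main1', 'main2', 'main3', 'main10']
```

The key can be passed to the answer's sort as sorted(col_names, key=natural_key) before unpacking the names into f.concat_ws. For the five columns in the question, plain sorted() already gives the expected order, so this only matters once the numeric suffixes reach two digits.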