Python PySpark: concatenating similarly named columns in sorted order

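I am trying to concatenate string columns. The code below does that, but it does not sort the columns first. I would appreciate help with sorting the columns by name and then concatenating them.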

DataFrame:

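from pyspark.sql import SparkSession
import pyspark.sql.functions as f

# Reuse the active session, or create one when running outside a Spark shell.
spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([
    ("kd", "gr", "hd", "ae", "nw"),
    ("zj", "sd", "mw", "op", "le"),
    ("ct", "wm", "kr", "vs", "qz")],
    ("main", "main1", "main3", "main2", "main4")
)
df.show()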

+----+-----+-----+-----+-----+
|main|main1|main3|main2|main4|
+----+-----+-----+-----+-----+
|  kd|   gr|   hd|   ae|   nw|
|  zj|   sd|   mw|   op|   le|
|  ct|   wm|   kr|   vs|   qz|
+----+-----+-----+-----+-----+
Expected result:

+----+-----+-----+-----+-----+--------------+
|main|main1|main3|main2|main4|        result|
+----+-----+-----+-----+-----+--------------+
|  kd|   gr|   hd|   ae|   nw|kd_gr_ae_hd_nw|
|  zj|   sd|   mw|   op|   le|zj_sd_op_mw_le|
|  ct|   wm|   kr|   vs|   qz|ct_wm_vs_kr_qz|
+----+-----+-----+-----+-----+--------------+
Current code:

df = df.withColumn('result', f.concat_ws(
    '_', *[c for c in df.columns if c.startswith("main")]))

df.show()

Output:

+----+-----+-----+-----+-----+--------------+
|main|main1|main3|main2|main4|        result|
+----+-----+-----+-----+-----+--------------+
|  kd|   gr|   hd|   ae|   nw|kd_gr_hd_ae_nw|
|  zj|   sd|   mw|   op|   le|zj_sd_mw_op_le|
|  ct|   wm|   kr|   vs|   qz|ct_wm_kr_vs_qz|
+----+-----+-----+-----+-----+--------------+

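You can sort the column names before concatenating them. df.columns returns the names in DataFrame order (main, main1, main3, main2, main4), while sorted() puts them in lexicographic order (main, main1, main2, main3, main4).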

The solution is as follows:

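# Select the columns whose names start with "main", in DataFrame order.
col_names = [c for c in df.columns if c.startswith("main")]

# Sort the names lexicographically: main, main1, main2, main3, main4.
sorted_names = sorted(col_names)

# Join the values of the sorted columns with "_".
df = df.withColumn('result', f.concat_ws('_', *sorted_names))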
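
One caveat with sorting the names as plain strings: the order is lexicographic, so a name such as main10 would sort before main2. The question only has main through main4, so this does not affect it. If the numeric suffixes could reach two digits, a natural sort key is one possible workaround. The sketch below is an assumption, not part of the original answer: natural_key is a hypothetical helper, and the main10 column is invented for illustration.

```python
import re

def natural_key(name):
    """Split a name into text and number parts so that numeric parts compare as integers.

    re.split with a capturing group alternates text (even indices)
    and digit runs (odd indices), so only odd-indexed parts are converted.
    """
    parts = re.split(r"(\d+)", name)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]

# Hypothetical column names; "main10" is not in the original DataFrame.
col_names = ["main", "main1", "main3", "main2", "main10", "main4"]

plain_sorted = sorted(col_names)                      # lexicographic order
natural_sorted = sorted(col_names, key=natural_key)   # numeric-aware order

print(plain_sorted)
print(natural_sorted)
```

The resulting list can be passed to f.concat_ws('_', *natural_sorted) in the same way as sorted_names above.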