Python PySpark: create a column from the ordered concatenation of columns


I'm having trouble creating a new column on a PySpark DataFrame from the ordered concatenation of two existing columns, i.e.:

+------+------+--------+
| Col1 | Col2 | NewCol |
+------+------+--------+
| ORD  | DFW  | DFWORD |
| CUN  | MCI  | CUNMCI |
| LAX  | JFK  | JFKLAX |
+------+------+--------+
In other words, I want to take Col1 and Col2, sort them alphabetically, and concatenate them.

Any suggestions?

Combine concat_ws, array and sort_array:

from pyspark.sql.functions import concat_ws, array, sort_array

df = spark.createDataFrame(
    [("ORD", "DFW"), ("CUN", "MCI"), ("LAX", "JFK")],
    ("Col1", "Col2"))

df.withColumn("NewCol", concat_ws("", sort_array(array("Col1", "Col2")))).show()
# +----+----+------+
# |Col1|Col2|NewCol|
# +----+----+------+
# | ORD| DFW|DFWORD|
# | CUN| MCI|CUNMCI|
# | LAX| JFK|JFKLAX|
# +----+----+------+
