Python: Convert column values to columns in a PySpark DataFrame

I want to convert the values of one column into multiple columns of a DataFrame in PySpark on Databricks.

For example:

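from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
# Build the sample DataFrame from an RDD of [id, value1, value2] rows
df = spark.sparkContext.parallelize([["dapd", "shop", "retail"],
                                     ["dapd", "shop", "on-line"],
                                     ["dapd", "payment", "credit"],
                                     ["wrfr", "shop", "supermarket"],
                                     ["wrfr", "shop", "brand store"],
                                     ["wrfr", "payment", "cash"]]).toDF(["id", "value1", "value2"])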
I need to convert this to:

id,     shop                       payment
dapd    retail|on-line             credit
wrfr    supermarket|brand store    cash
I don't know how to achieve this in PySpark.

Thanks,

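What you are looking for is a combination of pivot and an aggregate function such as collect_list() or collect_set(). The available aggregate functions are listed in the pyspark.sql.functions documentation. Here is some example code: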

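from pyspark.sql import SparkSession
import pyspark.sql.functions as f

# On Databricks a session named spark already exists; getOrCreate() reuses it
spark = SparkSession.builder.getOrCreate()
df = spark.sparkContext.parallelize([
    ["dapd", "shop", "retail"],
    ["dapd", "shop", "on-line"],
    ["dapd", "payment", "credit"],
    ["wrfr", "shop", "supermarket"],
    ["wrfr", "shop", "brand store"],
    ["wrfr", "payment", "cash"]]
).toDF(["id", "value1", "value2"])
df.show()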
+----+-------+-----------+
|id  |value1 |value2     |
+----+-------+-----------+
|dapd|shop   |retail     |
|dapd|shop   |on-line    |
|dapd|payment|credit     |
|wrfr|shop   |supermarket|
|wrfr|shop   |brand store|
|wrfr|payment|cash       |
+----+-------+-----------+
df.groupBy('id').pivot('value1').agg(f.collect_list('value2')).show(truncate=False)
+----+--------+--------------------------+
|id  |payment |shop                      |
+----+--------+--------------------------+
|dapd|[credit]|[retail, on-line]         |
|wrfr|[cash]  |[supermarket, brand store]|
+----+--------+--------------------------+

You can do it like this:

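import pyspark.sql.functions as func
# Pivot value1 into columns, collecting the matching value2 entries into arrays
newdf = df.groupby('id').pivot('value1').agg(func.collect_list(func.col('value2')))
# Join the first two shop entries with '|' (covers at most two entries per id)
newdf = newdf.withColumn('shop', func.concat_ws('|', func.col('shop')[0], func.col('shop')[1]))
# payment holds a single entry per id in this data, so take the first element
newdf = newdf.withColumn('payment', func.col('payment')[0])
newdf.show(20, False)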
+----+-------+-----------------------+
|id  |payment|shop                   |
+----+-------+-----------------------+
|dapd|credit |retail|on-line         |
|wrfr|cash   |brand store|supermarket|
+----+-------+-----------------------+


I'm having a hard time understanding this. Could you explain it another way?
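
One way to picture what the groupBy / pivot / collect_list chain does is to write the same steps in plain Python. This is only an illustration of the idea, not how Spark implements it; the variable names cells, columns and table are made up for this sketch.

```python
from collections import defaultdict

rows = [
    ("dapd", "shop", "retail"),
    ("dapd", "shop", "on-line"),
    ("dapd", "payment", "credit"),
    ("wrfr", "shop", "supermarket"),
    ("wrfr", "shop", "brand store"),
    ("wrfr", "payment", "cash"),
]

# groupBy('id') + pivot('value1'): one bucket per (id, value1) pair.
# collect_list('value2'): every matching value2 is appended to its bucket.
cells = defaultdict(list)
for id_, value1, value2 in rows:
    cells[(id_, value1)].append(value2)

# Each distinct value1 becomes a column name; each id becomes one output row.
columns = sorted({value1 for _, value1, _ in rows})
table = {}
for (id_, value1), values in cells.items():
    # concat_ws('|', ...): join the collected values into one string
    table.setdefault(id_, {})[value1] = "|".join(values)

for id_, cols in table.items():
    print(id_, [cols.get(c) for c in columns])
```

So the distinct values of value1 (shop, payment) turn into column headers, and each cell holds all value2 entries that share that id and that value1.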