Apache Spark: adding new columns and rows in PySpark
I have a PySpark DataFrame:
cust | prob
-------------------
A | 0.1
B | 0.7
C | 0.4
I want to add another column amount and add rows for each customer. My expected result is:
cust | prob | amount
------------------------
A | 0.1 | 1000
A | 0.1 | 2000
A | 0.1 | 3000
A | 0.1 | 4000
A | 0.1 | 5000
B | 0.7 | 1000
B | 0.7 | 2000
B | 0.7 | 3000
B | 0.7 | 4000
B | 0.7 | 5000
C | 0.4 | 1000
C | 0.4 | 2000
C | 0.4 | 3000
C | 0.4 | 4000
C | 0.4 | 5000
I need help creating this new column and these rows. My real data has many columns, so the original columns of the dataset should be carried over into every new row.

You can explode an array:
import pyspark.sql.functions as F

df2 = df.withColumn(
    'amount',
    F.explode(
        F.array(*[F.lit(i) for i in [1000, 2000, 3000, 4000, 5000]])
    )
)
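Here, F.array(*[F.lit(i) for i in ...]) builds an array literal [1000, ..., 5000] for every row, and F.explode emits one output row per array element while keeping all of the row's other columns, which is why this also works when the real data has many columns.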
Or explode a sequence:
df2 = df.withColumn(
    'amount',
    F.explode(
        F.sequence(F.lit(1000), F.lit(5000), F.lit(1000))
    )
)
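F.sequence(start, stop, step) generates the values 1000 through 5000 in steps of 1000 and is available from Spark 2.4 onward. For completeness, here is a minimal end-to-end sketch (assuming a local SparkSession and the sample data from the question) that reproduces the expected output with the sequence approach:

import pyspark.sql.functions as F
from pyspark.sql import SparkSession

# Assumed local session and sample data matching the question
spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [("A", 0.1), ("B", 0.7), ("C", 0.4)],
    ["cust", "prob"],
)

# Explode the generated sequence 1000..5000 (step 1000) into one row per value
df2 = df.withColumn(
    "amount",
    F.explode(F.sequence(F.lit(1000), F.lit(5000), F.lit(1000))),
)
df2.show()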