Apache Spark: add a new column to a Spark DataFrame by concatenating values derived from an existing column


I want to create a new column in my DataFrame based on the conditions below.

My DataFrame looks like this:

my_string 

2020 test 

2020 prod 

2020 dev

My conditions:

value1 = the substring of my_string after the space

value2 = the first four digits of my_string

If value1 contains the string 'test', then new_col = value2 + "01"

If value1 contains the string 'prod', then new_col = value2 + "kk"

If value1 contains the string 'dev', then new_col = value2 + "ff"

I need a result like this:

my_string   new_col

2020 test   202001

2020 prod   2020kk

2020 dev    2020ff

Can anyone help me?

Use the row_number window function with monotonically_increasing_id.

Update:

Use a when + otherwise expression:

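from pyspark.sql.functions import col, concat, lower, split, when

# df is the question's DataFrame with a single string column, my_string.
# env is the second token of my_string, lower-cased for matching.
env = lower(split(col("my_string"), " ")[1])

# Unmatched rows get the literal string "null", so new_col would be e.g. "2020null".
(df.withColumn("dyn_col", when(env == "prod", "kk")
                          .when(env == "dev", "ff")
                          .when(env == "test", "01")
                          .otherwise("null"))
   .withColumn("new_col", concat(split(col("my_string"), " ")[0], col("dyn_col")))
   .drop("dyn_col")
   .show())
#+---------+-------+
#|my_string|new_col|
#+---------+-------+
#|2020 test| 202001|
#|2020 prod| 2020kk|
#| 2020 dev| 2020ff|
#+---------+-------+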

Comments:

Can you format your input and expected output as a table? That makes it readable, and anyone can then give you a simple solution.

This is not about the row number as in the example I just gave; the value can vary, like any recorded string.

Thank you very much, I will try it.

In Scala:

import org.apache.spark.sql.functions.{col, concat, lower, split, when}

// env is the second token of my_string, lower-cased for matching.
val env = lower(split(col("my_string"), " ")(1))

// Unmatched rows get the literal string "null", as in the Python version.
df.withColumn("dyn_col", when(env === "prod", "kk").
    when(env === "dev", "ff").
    when(env === "test", "01").
    otherwise("null")).
  withColumn("new_col", concat(split(col("my_string"), " ")(0), col("dyn_col"))).
  drop("dyn_col").
  show()

//+---------+-------+
//|my_string|new_col|
//+---------+-------+
//|2020 test| 202001|
//|2020 prod| 2020kk|
//| 2020 dev| 2020ff|
//+---------+-------+
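
To sanity-check the expected values without a Spark session, the same rule can be sketched in plain Python. The names SUFFIXES and build_new_col are hypothetical and not part of the answer above. Unlike the Spark snippets, this sketch returns None for an unmatched keyword instead of appending the literal string "null".

```python
# Suffix for each keyword, taken from the conditions in the question.
# SUFFIXES and build_new_col are illustrative names, not from the original answer.
SUFFIXES = {"test": "01", "prod": "kk", "dev": "ff"}


def build_new_col(my_string):
    """Return the first token plus the suffix for the second token, or None if no rule applies."""
    parts = my_string.strip().split(" ")
    if len(parts) < 2:
        return None  # no keyword after the space
    suffix = SUFFIXES.get(parts[1].lower())
    if suffix is None:
        return None  # assumption: unmatched keywords yield None here
    return parts[0] + suffix


for s in ["2020 test", "2020 prod", "2020 dev"]:
    print(s, "->", build_new_col(s))
# 2020 test -> 202001
# 2020 prod -> 2020kk
# 2020 dev -> 2020ff
```

Keeping the keyword-to-suffix mapping in one dictionary means a new keyword needs only one new entry. On large DataFrames the built-in when/otherwise expression is likely the better choice, because it avoids running Python code per row.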