Scala: How to change the value of a column based on the value of another column in a Spark DataFrame
I am starting with this DataFrame:
DF1
+-----+-----+------+------+
|name |type |item1 |item2 |
+-----+-----+------+------+
|apple|fruit|apple1|apple2|
|beans|vege |beans1|beans2|
|beef |meat |beef1 |beef2 |
|kiwi |fruit|kiwi1 |kiwi2 |
|pork |meat |pork1 |pork2 |
+-----+-----+------+------+
Now I want to populate a column named "prop" based on the value of the "type" column, giving DF2. For example:
If "type"== "fruit" then "prop"="item1"
If "type"== "vege" then "prop"="item1"
If "type"== "meat" then "prop"="item2"
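What is the best way to get this? I was thinking of filtering by each "type", populating the "prop" column, and then concatenating the resulting DataFrames. That does not seem very efficient.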
DF2
+-----+-----+------+------+------+
|name |type |item1 |item2 |prop  |
+-----+-----+------+------+------+
|apple|fruit|apple1|apple2|apple1|
|beans|vege |beans1|beans2|beans1|
|beef |meat |beef1 |beef2 |beef2 |
|kiwi |fruit|kiwi1 |kiwi2 |kiwi1 |
|pork |meat |pork1 |pork2 |pork2 |
+-----+-----+------+------+------+
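For reference, the filter-and-concatenate idea described in the question could be sketched roughly as below. The object name FilterUnionSketch, the helper addPropByUnion, and the local SparkSession settings are assumptions made for illustration and are not part of the original question. The resulting plan has one filtered branch per type, which is the overhead the question is worried about.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object FilterUnionSketch {
  // Hypothetical helper: build one filtered DataFrame per type, then union them.
  // Rows whose type is not listed here are dropped, and row order may change.
  def addPropByUnion(df: DataFrame): DataFrame = {
    val sourceByType = Seq("fruit" -> "item1", "vege" -> "item1", "meat" -> "item2")
    val branches = sourceByType.map { case (typeValue, source) =>
      df.filter(col("type") === typeValue).withColumn("prop", col(source))
    }
    branches.reduce((left, right) => left.union(right))
  }

  def main(args: Array[String]): Unit = {
    // Assumed local session, only for trying the sketch out.
    val spark = SparkSession.builder().master("local[*]").appName("FilterUnionSketch").getOrCreate()
    import spark.implicits._

    val df = Seq(
      ("apple", "fruit", "apple1", "apple2"),
      ("beans", "vege", "beans1", "beans2"),
      ("beef", "meat", "beef1", "beef2"),
      ("kiwi", "fruit", "kiwi1", "kiwi2"),
      ("pork", "meat", "pork1", "pork2")
    ).toDF("name", "type", "item1", "item2")

    addPropByUnion(df).show()
    spark.stop()
  }
}
```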
Use when/otherwise expressions for this case; they are very efficient in Spark.
import org.apache.spark.sql.functions.{col, when}

// sample data
df.show()
//+-----+-----+------+------+
//| name| type| item1| item2|
//+-----+-----+------+------+
//|apple|fruit|apple1|apple2|
//|beans| vege|beans1|beans2|
//| beef| meat| beef1| beef2|
//| kiwi|fruit| kiwi1| kiwi2|
//| pork| meat| pork1| pork2|
//+-----+-----+------+------+

// Option 1: match several types at once with isin
df.withColumn("prop",
    when(col("type").isin("vege", "fruit"), col("item1"))
      .when(col("type") === "meat", col("item2"))
      .otherwise(col("type"))) // any other type falls back to the "type" value
  .show()

// Option 2: the same logic with explicit equality checks combined by ||
df.withColumn("prop",
    when((col("type") === "fruit") || (col("type") === "vege"), col("item1"))
      .when(col("type") === "meat", col("item2"))
      .otherwise(col("type")))
  .show()
//+-----+-----+------+------+------+
//| name| type| item1| item2| prop|
//+-----+-----+------+------+------+
//|apple|fruit|apple1|apple2|apple1|
//|beans| vege|beans1|beans2|beans1|
//| beef| meat| beef1| beef2| beef2|
//| kiwi|fruit| kiwi1| kiwi2| kiwi1|
//| pork| meat| pork1| pork2| pork2|
//+-----+-----+------+------+------+
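If the list of types grows, the chain of when clauses could also be generated from a lookup table instead of being written by hand. The following is a minimal sketch under that assumption; the names PropFromMapping, sourceColumnByType, and propColumn are hypothetical, as is the choice to return null for types that are not in the mapping (the answer above falls back to the "type" value instead).

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.{col, lit, when}

object PropFromMapping {
  // Hypothetical lookup table: "type" value -> name of the column to copy into "prop".
  val sourceColumnByType: Map[String, String] =
    Map("fruit" -> "item1", "vege" -> "item1", "meat" -> "item2")

  // Fold the mapping into nested when/otherwise expressions.
  // The type values are mutually exclusive, so the iteration order does not matter.
  // Types missing from the mapping yield null.
  def propColumn(mapping: Map[String, String]): Column =
    mapping.foldLeft(lit(null).cast("string")) { case (fallback, (typeValue, source)) =>
      when(col("type") === typeValue, col(source)).otherwise(fallback)
    }

  def main(args: Array[String]): Unit = {
    // Assumed local session, only for trying the sketch out.
    val spark = SparkSession.builder().master("local[*]").appName("PropFromMapping").getOrCreate()
    import spark.implicits._

    val df = Seq(
      ("apple", "fruit", "apple1", "apple2"),
      ("beans", "vege", "beans1", "beans2"),
      ("beef", "meat", "beef1", "beef2"),
      ("kiwi", "fruit", "kiwi1", "kiwi2"),
      ("pork", "meat", "pork1", "pork2")
    ).toDF("name", "type", "item1", "item2")

    df.withColumn("prop", propColumn(sourceColumnByType)).show()
    spark.stop()
  }
}
```

This still produces a single column expression evaluated in one pass over the data, so it keeps the main advantage of when/otherwise over filtering and concatenating.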
This can be done by chaining when and otherwise, as follows:
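import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object WhenThen {
  def main(args: Array[String]): Unit = {
    // The original used a project-specific helper (Constant.getSparkSess);
    // a local session is built here instead so the example is self-contained.
    val spark = SparkSession.builder().master("local[*]").appName("WhenThen").getOrCreate()
    import spark.implicits._

    val df = List(
      ("apple", "fruit", "apple1", "apple2"),
      ("beans", "vege", "beans1", "beans2"),
      ("beef", "meat", "beef1", "beef2"),
      ("kiwi", "fruit", "kiwi1", "kiwi2"),
      ("pork", "meat", "pork1", "pork2")
    ).toDF("name", "type", "item1", "item2")

    // Nested when/otherwise: each otherwise branch tries the next condition.
    // Rows that match no condition get an empty string.
    df.withColumn("prop",
      when($"type" === "fruit", $"item1").otherwise(
        when($"type" === "vege", $"item1").otherwise(
          when($"type" === "meat", $"item2").otherwise("")
        )
      )).show()
  }
}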
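The same conditional can also be expressed as a SQL CASE expression passed to expr, which some readers find easier to scan than nested otherwise calls. This is a sketch only: the object name CaseWhenSketch, the helper addProp, and the local SparkSession settings are assumptions for illustration. The column name type is quoted with backticks as a precaution, since it is also a SQL keyword.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.expr

object CaseWhenSketch {
  // Hypothetical helper: adds "prop" using a SQL CASE expression.
  // Types matching no branch get an empty string, as in the nested version above.
  def addProp(df: DataFrame): DataFrame =
    df.withColumn("prop", expr(
      """CASE
        |  WHEN `type` IN ('fruit', 'vege') THEN item1
        |  WHEN `type` = 'meat' THEN item2
        |  ELSE ''
        |END""".stripMargin))

  def main(args: Array[String]): Unit = {
    // Assumed local session, only for trying the sketch out.
    val spark = SparkSession.builder().master("local[*]").appName("CaseWhenSketch").getOrCreate()
    import spark.implicits._

    val df = Seq(
      ("apple", "fruit", "apple1", "apple2"),
      ("beans", "vege", "beans1", "beans2"),
      ("beef", "meat", "beef1", "beef2"),
      ("kiwi", "fruit", "kiwi1", "kiwi2"),
      ("pork", "meat", "pork1", "pork2")
    ).toDF("name", "type", "item1", "item2")

    addProp(df).show()
    spark.stop()
  }
}
```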