How to add new columns containing the next row's values using a Scala DataFrame
I have a DataFrame:
+----------+----------+
| longitude| latitude|
+----------+----------+
|-7.1732833|32.0414966|
|-7.1732844|32.0414406|
|-7.1732833|32.0414966|
|-7.1732833|32.0414966|
|-7.1732833|32.0414966|
|-7.1732833|32.0414966|
Expected result:
+----------+----------+----------------+---------------------+---------------+--------------------+
| longitude|  latitude|origin_longitude|destination_longitude|origin_latitude|destination_latitude|
+----------+----------+----------------+---------------------+---------------+--------------------+
|-7.1732833|32.0414966|      -7.1732833|           -7.1732844|     32.0414966|          32.0414406|
|-7.1732844|32.0414406|      -7.1732844|           -7.1732833|     32.0414406|          32.0414966|
|-7.1732833|32.0414966|      -7.1732833|           -7.1732833|     32.0414966|          32.0414966|
|-7.1732833|32.0414966|      -7.1732833|           -7.1732833|     32.0414966|          32.0414966|
|-7.1732833|32.0414966|      -7.1732833|           -7.1732833|     32.0414966|          32.0414966|
+----------+----------+----------------+---------------------+---------------+--------------------+
How can I do this in Scala? I am new to Scala, please help.
Thank you.

You can use
df.withColumn("origin_longitude", lit(-7.1732833))
and chain as many withColumn calls as you need.
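The chained-withColumn approach can be sketched as follows. This is a minimal, self-contained illustration: the SparkSession setup and the constant values are assumptions for the demo, not part of the original answer.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit

val spark = SparkSession.builder().master("local[*]").appName("lit-chain").getOrCreate()
import spark.implicits._

val df = Seq((-7.1732833, 32.0414966)).toDF("longitude", "latitude")

// Each withColumn call appends one column to the result; calls chain freely.
val withOrigins = df
  .withColumn("origin_longitude", lit(-7.1732833))
  .withColumn("origin_latitude", lit(32.0414966))
```

Note that lit only adds constant columns, so on its own it does not solve the "value from the next row" part of the question; the window-function answer below addresses that.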
You can use the lead window function to fetch the next row's value into a new column. However, lead requires an orderBy, and ordering by latitude/longitude would not preserve your DataFrame's original row order, so I manually added a seq column to preserve it. In real data you should already have a column you can order by.
%scala
val df = Seq(
(1,-7.1732833,32.0414966),
(2,-7.1732844,32.0414406),
(3,-7.1732833,32.0414966),
(4,-7.1732833,32.0414966),
(5,-7.1732833,32.0414966),
(6,-7.1732833,32.0414966)
).toDF("seq","longitude","latitude")
df.show()
+---+----------+----------+
|seq| longitude| latitude|
+---+----------+----------+
| 1|-7.1732833|32.0414966|
| 2|-7.1732844|32.0414406|
| 3|-7.1732833|32.0414966|
| 4|-7.1732833|32.0414966|
| 5|-7.1732833|32.0414966|
| 6|-7.1732833|32.0414966|
+---+----------+----------+
import org.apache.spark.sql.functions.{col, lead}
import org.apache.spark.sql.expressions.Window

val w = Window.orderBy("seq")

df.withColumn("destination_longitude", lead("longitude", 1, 0).over(w))
  .withColumn("destination_latitude", lead("latitude", 1, 0).over(w))
  .select(
    col("longitude").alias("origin_longitude"),
    col("destination_longitude"),
    col("latitude").alias("origin_latitude"),
    col("destination_latitude"))
  .filter(col("destination_longitude") =!= 0.0)
  .show()
+----------------+---------------------+---------------+--------------------+
|origin_longitude|destination_longitude|origin_latitude|destination_latitude|
+----------------+---------------------+---------------+--------------------+
| -7.1732833| -7.1732844| 32.0414966| 32.0414406|
| -7.1732844| -7.1732833| 32.0414406| 32.0414966|
| -7.1732833| -7.1732833| 32.0414966| 32.0414966|
| -7.1732833| -7.1732833| 32.0414966| 32.0414966|
| -7.1732833| -7.1732833| 32.0414966| 32.0414966|
+----------------+---------------------+---------------+--------------------+
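One caveat with the solution above: it uses 0.0 as the lead default and then filters it out, which would silently drop rows if 0.0 were ever a real coordinate. A hedged alternative sketch: omit the default so the last row gets null, then filter on isNotNull. The SparkSession setup and sample rows below are illustrative.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, lead}

val spark = SparkSession.builder().master("local[*]").appName("lead-null").getOrCreate()
import spark.implicits._

val df = Seq(
  (1, -7.1732833, 32.0414966),
  (2, -7.1732844, 32.0414406),
  (3, -7.1732833, 32.0414966)
).toDF("seq", "longitude", "latitude")

val w = Window.orderBy("seq")

// Without a default, lead() yields null on the last row; dropping those
// nulls removes the row that has no destination, with no magic sentinel.
val paired = df
  .withColumn("destination_longitude", lead("longitude", 1).over(w))
  .withColumn("destination_latitude", lead("latitude", 1).over(w))
  .filter(col("destination_longitude").isNotNull)
  .select(
    col("longitude").alias("origin_longitude"),
    col("destination_longitude"),
    col("latitude").alias("origin_latitude"),
    col("destination_latitude"))
```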
Does this answer your question? The problem isn't withColumn, it's how to get the next row's value into the current row. Thank you very much, but when I tried it I got this error: error:(25333) value !== is not a member of org.apache.spark.sql.Column. Any ideas? Thanks for your reply, it works for me.
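On the compile error in that comment: Spark's Column defines its own comparison operators, === for equality and =!= for inequality (older 1.x releases accepted !==, but current versions do not, which is exactly what the error reports). A tiny illustrative sketch, with a made-up DataFrame:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().master("local[*]").appName("neq-demo").getOrCreate()
import spark.implicits._

val nums = Seq(0.0, 1.5, 0.0).toDF("x")

// =!= builds a "not equal" Column expression; writing col("x") !== 0.0
// fails to compile on recent Spark versions, as the comment above reports.
val nonZero = nums.filter(col("x") =!= 0.0)
```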