How to add new columns containing the next row's values using a Scala DataFrame
I have a DataFrame:
+----------+----------+
| longitude| latitude|
+----------+----------+
|-7.1732833|32.0414966|
|-7.1732844|32.0414406|
|-7.1732833|32.0414966|
|-7.1732833|32.0414966|
|-7.1732833|32.0414966|
|-7.1732833|32.0414966|
Expected result:
+----------+----------+----------------+---------------------+---------------+--------------------+
| longitude|  latitude|origin_longitude|destination_longitude|origin_latitude|destination_latitude|
+----------+----------+----------------+---------------------+---------------+--------------------+
|-7.1732833|32.0414966|      -7.1732833|           -7.1732844|     32.0414966|          32.0414406|
|-7.1732844|32.0414406|      -7.1732844|           -7.1732833|     32.0414406|          32.0414966|
|-7.1732833|32.0414966|      -7.1732833|           -7.1732833|     32.0414966|          32.0414966|
|-7.1732833|32.0414966|      -7.1732833|           -7.1732833|     32.0414966|          32.0414966|
|-7.1732833|32.0414966|      -7.1732833|           -7.1732833|     32.0414966|          32.0414966|
+----------+----------+----------------+---------------------+---------------+--------------------+
How can I do this in Scala? I am new to Scala, please help.
Thank you.

You can use
df.withColumn("origin_longitude", lit(-7.1732833))
and chain as many withColumn calls as you need.
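The chained-withColumn approach can be sketched as follows. This is a minimal, self-contained illustration: the SparkSession setup and the constant values are assumptions for the demo, not part of the original answer.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit

val spark = SparkSession.builder().master("local[*]").appName("lit-chain").getOrCreate()
import spark.implicits._

val df = Seq((-7.1732833, 32.0414966)).toDF("longitude", "latitude")

// Each withColumn call appends one column to the result; calls chain freely.
val withOrigins = df
  .withColumn("origin_longitude", lit(-7.1732833))
  .withColumn("origin_latitude", lit(32.0414966))
```

Note that lit only adds constant columns, so on its own it does not solve the "value from the next row" part of the question; the window-function answer below addresses that.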
You can use the lead window function to fetch the next row's value into a new column. However, lead requires an orderBy, and ordering by latitude/longitude would not preserve your DataFrame's original row order, so I manually added a seq column to preserve it. In real data you should already have a column you can order by.
%scala
val df = Seq(
(1,-7.1732833,32.0414966),
(2,-7.1732844,32.0414406),
(3,-7.1732833,32.0414966),
(4,-7.1732833,32.0414966),
(5,-7.1732833,32.0414966),
(6,-7.1732833,32.0414966)
).toDF("seq","longitude","latitude")
df.show()
+---+----------+----------+
|seq| longitude| latitude|
+---+----------+----------+
| 1|-7.1732833|32.0414966|
| 2|-7.1732844|32.0414406|
| 3|-7.1732833|32.0414966|
| 4|-7.1732833|32.0414966|
| 5|-7.1732833|32.0414966|
| 6|-7.1732833|32.0414966|
+---+----------+----------+
import org.apache.spark.sql.functions.{col, lead}
import org.apache.spark.sql.expressions.Window

val w = Window.orderBy("seq")

df.withColumn("destination_longitude", lead("longitude", 1, 0).over(w))
  .withColumn("destination_latitude", lead("latitude", 1, 0).over(w))
  .select(
    col("longitude").alias("origin_longitude"),
    col("destination_longitude"),
    col("latitude").alias("origin_latitude"),
    col("destination_latitude"))
  .filter(col("destination_longitude") =!= 0.0)
  .show()
+----------------+---------------------+---------------+--------------------+
|origin_longitude|destination_longitude|origin_latitude|destination_latitude|
+----------------+---------------------+---------------+--------------------+
| -7.1732833| -7.1732844| 32.0414966| 32.0414406|
| -7.1732844| -7.1732833| 32.0414406| 32.0414966|
| -7.1732833| -7.1732833| 32.0414966| 32.0414966|
| -7.1732833| -7.1732833| 32.0414966| 32.0414966|
| -7.1732833| -7.1732833| 32.0414966| 32.0414966|
+----------------+---------------------+---------------+--------------------+
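One caveat with the solution above: it uses 0.0 as the lead default and then filters it out, which would silently drop rows if 0.0 were ever a real coordinate. A hedged alternative sketch: omit the default so the last row gets null, then filter on isNotNull. The SparkSession setup and sample rows below are illustrative.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, lead}

val spark = SparkSession.builder().master("local[*]").appName("lead-null").getOrCreate()
import spark.implicits._

val df = Seq(
  (1, -7.1732833, 32.0414966),
  (2, -7.1732844, 32.0414406),
  (3, -7.1732833, 32.0414966)
).toDF("seq", "longitude", "latitude")

val w = Window.orderBy("seq")

// Without a default, lead() yields null on the last row; dropping those
// nulls removes the row that has no destination, with no magic sentinel.
val paired = df
  .withColumn("destination_longitude", lead("longitude", 1).over(w))
  .withColumn("destination_latitude", lead("latitude", 1).over(w))
  .filter(col("destination_longitude").isNotNull)
  .select(
    col("longitude").alias("origin_longitude"),
    col("destination_longitude"),
    col("latitude").alias("origin_latitude"),
    col("destination_latitude"))
```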
Does this answer your question? The problem isn't withColumn, it's how to get the next row's value into the current row. Thank you very much, but when I tried it I got this error: error:(25333) value !== is not a member of org.apache.spark.sql.Column. Any ideas? Thanks for your reply, it works for me.
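On the compile error in that comment: Spark's Column defines its own comparison operators, === for equality and =!= for inequality (older 1.x releases accepted !==, but current versions do not, which is exactly what the error reports). A tiny illustrative sketch, with a made-up DataFrame:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().master("local[*]").appName("neq-demo").getOrCreate()
import spark.implicits._

val nums = Seq(0.0, 1.5, 0.0).toDF("x")

// =!= builds a "not equal" Column expression; writing col("x") !== 0.0
// fails to compile on recent Spark versions, as the comment above reports.
val nonZero = nums.filter(col("x") =!= 0.0)
```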