
Scala: How to replace a dataframe column with a column from another dataframe


I have two dataframes:

dataframe1

|     DATE1|
+----------+
|2017-01-08|
|2017-10-10|
|2017-05-01|
dataframe2

|  NAME| SID|     DATE1|     DATE2|ROLL|  SCHOOL|
+------+----+----------+----------+----+--------+
| Sayam|22.0|  8/1/2017|  7 1 2017|3223|  BHABHA|
|ADARSH| 2.0|10-10-2017|10.03.2017| 222|SUNSHINE|
| SADIM| 1.0|  1.5.2017|  1/2/2017| 111|     DAV|
Expected output:

|  NAME| SID|     DATE1|     DATE2|ROLL|  SCHOOL|
+------+----+----------+----------+----+--------+
| Sayam|22.0|2017-01-08|  7 1 2017|3223|  BHABHA|
|ADARSH| 2.0|2017-10-10|10.03.2017| 222|SUNSHINE|
| SADIM| 1.0|2017-05-01|  1/2/2017| 111|     DAV|
I want to replace the DATE1 column in dataframe2 with the DATE1 column from dataframe1. I need a generic solution.

Any help would be appreciated.

I tried the following approach:

dataframe2.withColumn(newColumnTransformInfo._1, dataframe1.col("DATE1").cast(DateType))
However, I got an error:

org.apache.spark.sql.AnalysisException: Resolved attribute(s)

You cannot add a column from another dataframe this way: withColumn can only reference columns of the dataframe it is called on, which is why Spark raises the AnalysisException above.

What you can do is join the two dataframes and keep the columns you want; that requires a common join column. If there is no common column and the rows line up by order, you can assign an increasing id to both dataframes and join on it.
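For the first case, here is a minimal sketch, assuming dataframe1 also carried a shared key column (a hypothetical SID, which it does not actually have in this question):

// Hypothetical: both frames share a key column "SID"
val replaced = dataframe2.drop("DATE1")
  .join(dataframe1.select("SID", "DATE1"), "SID")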

And here is a simple example for your case, using the increasing-id approach:

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{LongType, StructField, StructType}
import spark.implicits._ // for .toDF on local Seqs

// Dummy data
val df1 = Seq(
  "2017-01-08",
  "2017-10-10",
  "2017-05-01"
).toDF("DATE1")

val df2 = Seq(
  ("Sayam", 22.0, "2017-01-08", "7 1 2017", 3223, "BHABHA"),
  ("ADARSH", 2.0, "2017-10-10", "10.03.2017", 222, "SUNSHINE"),
  ("SADIM", 1.0, "2017-05-01", "1/2/2017", 111, "DAV")
).toDF("NAME", "SID", "DATE1", "DATE2", "ROLL", "SCHOOL")

// Rebuild dataframe1 with an "id" column numbering the rows in order
val rows1 = df1.rdd.zipWithIndex().map {
  case (r: Row, id: Long) => Row.fromSeq(id +: r.toSeq)
}
val dataframe1 = spark.createDataFrame(rows1,
  StructType(StructField("id", LongType, false) +: df1.schema.fields))

// Rebuild dataframe2 with the same kind of "id" column
val rows2 = df2.rdd.zipWithIndex().map {
  case (r: Row, id: Long) => Row.fromSeq(id +: r.toSeq)
}
val dataframe2 = spark.createDataFrame(rows2,
  StructType(StructField("id", LongType, false) +: df2.schema.fields))

// Drop the old DATE1, join on the row id, then drop the helper id
dataframe2.drop("DATE1")
  .join(dataframe1, "id")
  .drop("id")
  .show()
Output:

+------+----+----------+----+--------+----------+
|  NAME| SID|     DATE2|ROLL|  SCHOOL|     DATE1|
+------+----+----------+----+--------+----------+
| Sayam|22.0|  7 1 2017|3223|  BHABHA|2017-01-08|
|ADARSH| 2.0|10.03.2017| 222|SUNSHINE|2017-10-10|
| SADIM| 1.0|  1/2/2017| 111|     DAV|2017-05-01|
+------+----+----------+----+--------+----------+
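Since the question asks for a generic solution, the same zipWithIndex-and-join pattern can be wrapped in a helper. The sketch below is not part of the original answer; replaceColumn and withId are hypothetical names, and it assumes both dataframes have the same number of rows in the intended order:

import org.apache.spark.sql.{DataFrame, Row}
import org.apache.spark.sql.types.{LongType, StructField, StructType}

// Hypothetical helper: replace `colName` in `target` with the column of the
// same name from `source`, matching rows purely by their position
def replaceColumn(target: DataFrame, source: DataFrame, colName: String): DataFrame = {
  val spark = target.sparkSession

  // Tag each row with its ordinal position so the two frames can be joined
  def withId(df: DataFrame): DataFrame = {
    val rows = df.rdd.zipWithIndex().map {
      case (r: Row, id: Long) => Row.fromSeq(id +: r.toSeq)
    }
    spark.createDataFrame(rows,
      StructType(StructField("id", LongType, false) +: df.schema.fields))
  }

  withId(target).drop(colName)
    .join(withId(source).select("id", colName), "id")
    .drop("id")
}

// Usage for this question:
val result = replaceColumn(df2, df1, "DATE1")

If DATE1 should end up as a real date rather than a string, the cast from the original attempt can still be applied to the joined result, e.g. result.withColumn("DATE1", col("DATE1").cast(DateType)).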

Hope this helps.


It's working now. I'll let you know if I hit any issues with other test cases.