Scala: How do I move selected columns of a DataFrame to the end (rearrange column positions)?
I am trying to ingest an RDBMS (Greenplum) table into Hive. I read the table and obtained a DataFrame from it as follows:
val yearDF = spark.read.format("jdbc").option("url", connectionUrl)
                   .option("dbtable", "(select * from schema.table where source_system_name='DB2' and period_year='2017') as year2017")
                   .option("user", devUserName)
                   .option("password", devPassword)
                   .option("numPartitions", 15)
                   .load()
The schema of the above DataFrame is:
forecast_id:bigint
period_year:numeric(15,0)
period_num:numeric(15,0)
period_name:character varying(15)
source_system_name:character varying(30)
source_record_type:character varying(30)
ptd_balance:numeric
xx_data_hash_id:bigint
xx_pk_id:bigint
To ingest the above DataFrame into Hive, I put its schema into a list and converted all the Greenplum data types into Hive-compatible data types. I have a map, dataMapper, that tells me which Hive data type each Greenplum data type should be converted to.
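The post does not show what dataMapper contains; here is a minimal sketch of what it might look like, assuming (as the dMap method below suggests) that the keys are regex patterns matched against the full Greenplum type string:
// Hypothetical contents of dataMapper, not from the original post: each key is
// a regex over the Greenplum type name, each value the target Hive type
val dataMapper: Map[String, String] = Map(
  "bigint"                      -> "bigint",
  "numeric\\(\\d+,\\s*\\d+\\)"  -> "bigint",
  "numeric"                     -> "double",
  "character varying\\(\\d+\\)" -> "String"
)
The class that performs the conversion: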
class ChangeDataTypes(val gpColumnDetails: List[String], val dataMapper: Map[String, String]) {
  val dataMap: Map[String, String] = dataMapper

  // Turn each "colName:gpType" entry into "colName hiveType" and join them
  // into a comma-separated column list
  def gpDetails(): String = {
    val hiveDataTypes = gpColumnDetails.map(_.split(":\\s*")).map(s => s(0) + " " + dMap(s(1))).mkString(",")
    hiveDataTypes
  }

  // Find the Hive type for a Greenplum type: treat every key of dataMap as a
  // regex and pick the first pattern whose match covers the whole type string
  def dMap(gpColType: String): String = {
    val patterns = dataMap.keySet
    val mkey = patterns.dropWhile {
      p => gpColType != p.r.findFirstIn(gpColType).getOrElse("")
    }.headOption match {
      case Some(p) => p
      case None => ""
    }
    dataMap.getOrElse(mkey, "n/a")
  }
}
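For context, a hypothetical invocation, assuming gpColumnDetails holds the "name:type" strings of the schema shown earlier and dataMapper is shaped like the sketch above:
// Hypothetical usage; neither value is shown in the original post
val gpColumnDetails = List("forecast_id:bigint", "period_year:numeric(15,0)", "ptd_balance:numeric")
val changer = new ChangeDataTypes(gpColumnDetails, dataMapper)
val hiveColumns = changer.gpDetails()
// e.g. "forecast_id bigint,period_year bigint,ptd_balance double"
One caveat worth noting: dataMap.keySet has no guaranteed iteration order, so if two patterns can match the same type string, which one wins is not deterministic.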
These are the data types after running the above code:
forecast_id:bigint
period_year:bigint
period_num:bigint
period_name:String
source_system_name:String
source_record_type:String
ptd_balance:double
xx_data_hash_id:bigint
xx_pk_id:bigint
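Presumably this column list feeds a Hive CREATE TABLE statement. A minimal sketch, with a hypothetical database/table name, assuming the two partition columns have been excluded from hiveColumns first (Hive declares partition columns only in the PARTITIONED BY clause):
// Hypothetical DDL; mydb.forecast and STORED AS PARQUET are assumptions.
// hiveColumns must not repeat the partition columns declared below.
spark.sql(
  s"""CREATE TABLE IF NOT EXISTS mydb.forecast ($hiveColumns)
     |PARTITIONED BY (source_system_name String, period_year bigint)
     |STORED AS PARQUET""".stripMargin)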
Since my Hive table is dynamically partitioned on source_system_name and period_year, I need to rearrange the DataFrame's columns by moving source_system_name and period_year to the end, because when inserting data into a Hive table the partition columns must be the last columns of the DataFrame.
Could anyone tell me how to move the columns source_system_name and period_year of the DataFrame yearDF from their current positions to the end (essentially rearranging the columns)?

Extract the columns from the main list, then append them at the end and perform a select on the DataFrame:
// The columns that must come last
val lastCols = Seq("col1", "col2")
// All other columns first, followed by the ones that must be last
val allColOrdered = df.columns.diff(lastCols) ++ lastCols
val allCols = allColOrdered.map(cn => org.apache.spark.sql.functions.col(cn))
val result = df.select(allCols: _*)
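Applied to the question's concrete case, the same pattern would look like this (a sketch; the table name and the dynamic-partition setting are assumptions carried over from above):
// Move the two partition columns of yearDF to the end
val lastCols = Seq("source_system_name", "period_year")
val ordered = yearDF.columns.diff(lastCols) ++ lastCols
val reorderedDF = yearDF.select(ordered.map(org.apache.spark.sql.functions.col): _*)

// Hypothetical follow-up: with dynamic partitioning enabled, the partition
// columns now sit last, as Hive expects on insert
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
reorderedDF.write.mode("append").insertInto("mydb.forecast")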
It worked... Thank you very much!