
Scala: How do I move selected columns of a dataframe to the end of the dataframe (rearrange column positions)?

Tags: scala, apache-spark, dataframe, hive

I am trying to ingest an RDBMS (Greenplum) table into Hive. I read the table and obtained a dataframe from it like this:

val yearDF = spark.read.format("jdbc")
  .option("url", connectionUrl)
  .option("dbtable", "(select * from schema.table where source_system_name='DB2' and period_year='2017') as year2017")
  .option("user", devUserName)
  .option("password", devPassword)
  .option("numPartitions", 15)
  .load()
The schema of the above DF is:

forecast_id:bigint
period_year:numeric(15,0)
period_num:numeric(15,0)
period_name:character varying(15)
source_system_name:character varying(30)
source_record_type:character varying(30)
ptd_balance:numeric
xx_data_hash_id:bigint
xx_pk_id:bigint
To ingest the above dataframe into Hive, I put the schema into a list and converted all the Greenplum data types into Hive-compatible data types. I have a map, dataMapper, that tells what a GP data type should be converted to in Hive.
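The contents of dataMapper are not shown in the question; a minimal sketch of what it might look like, assuming the keys are regex patterns over Greenplum type names and the values are Hive-side types (every entry here is an assumption):

val dataMapper: Map[String, String] = Map(
  "bigint"                      -> "bigint",   // assumed: passes through unchanged
  "numeric\\(\\d+,0\\)"         -> "bigint",   // assumed: scale-0 numerics become bigint
  "numeric"                     -> "double",   // assumed: unconstrained numeric becomes double
  "character varying\\(\\d+\\)" -> "String"    // assumed: varchar(n) becomes String
)

The class below applies such a mapping: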

class ChangeDataTypes(val gpColumnDetails: List[String], val dataMapper: Map[String, String]) {
  val dataMap: Map[String, String] = dataMapper

  // Builds a Hive column-definition string, e.g. "forecast_id bigint,period_year bigint,...",
  // by splitting each "name:gpType" entry and mapping the GP type to its Hive type.
  def gpDetails(): String = {
    val hiveDataTypes = gpColumnDetails.map(_.split(":\\s*")).map(s => s(0) + " " + dMap(s(1))).mkString(",")
    hiveDataTypes
  }

  // Finds the first pattern in dataMap whose regex matches the whole GP type
  // and returns the corresponding Hive type, or "n/a" if nothing matches.
  def dMap(gpColType: String): String = {
    val patterns = dataMap.keySet
    val mkey = patterns.dropWhile {
      p => gpColType != p.r.findFirstIn(gpColType).getOrElse("")
    }.headOption match {
      case Some(p) => p
      case None => ""
    }
    dataMap.getOrElse(mkey, "n/a")
  }
}
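For illustration, the class might be driven like this; the call site is not shown in the question, and the list below simply repeats the schema printed earlier:

val gpColumnDetails = List(
  "forecast_id:bigint",
  "period_year:numeric(15,0)",
  "period_num:numeric(15,0)",
  "period_name:character varying(15)",
  "source_system_name:character varying(30)",
  "source_record_type:character varying(30)",
  "ptd_balance:numeric",
  "xx_data_hash_id:bigint",
  "xx_pk_id:bigint"
)
val hiveSchema = new ChangeDataTypes(gpColumnDetails, dataMapper).gpDetails()
// hiveSchema: "forecast_id bigint,period_year bigint,...,xx_pk_id bigint"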
These are the data types after the above code has run:

forecast_id:bigint
period_year:bigint
period_num:bigint
period_name:String
source_system_name:String
source_record_type:String
ptd_balance:double
xx_data_hash_id:bigint
xx_pk_id:bigint
Since my Hive table is dynamically partitioned on source_system_name and period_year, I need to change the dataframe by moving the columns source_system_name & period_year to its end, because when inserting data from a dataframe, the partition columns of a Hive table must be the last columns.


Could anyone tell me how to move the columns source_system_name & period_year of the dataframe yearDF from their current positions to the end (essentially rearranging the columns)?

Subtract the columns from the full column list, append them at the end, and perform a select on the dataframe:

val lastCols = Seq("col1","col2")
val allColOrdered = df.columns.diff(lastCols) ++ lastCols
val allCols = allColOrdered.map(cn => org.apache.spark.sql.functions.col(cn))
val result = df.select(allCols: _*)
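Applied to the question's yearDF, that recipe would look like the sketch below. The target table name and the dynamic-partition settings are assumptions for illustration, not taken from the question:

import org.apache.spark.sql.functions.col

// Move the two partition columns to the end of yearDF.
val lastCols = Seq("source_system_name", "period_year")
val allColOrdered = yearDF.columns.diff(lastCols) ++ lastCols
val result = yearDF.select(allColOrdered.map(col): _*)

// Hypothetical final insert: with dynamic partitioning enabled, Hive matches
// the trailing dataframe columns to the table's partition columns.
spark.sql("SET hive.exec.dynamic.partition=true")
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
result.write.mode("append").insertInto("mydb.year_table")  // table name is illustrative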

It worked… thank you very much!