
Scala: dynamically rename columns in a dataframe, then join with another table

scala, apache-spark, apache-spark-sql

I have a property table in a dataframe, as shown in the propertydf output below.

Based on the columns-to-rename and cust_id_flag values:

  • I have to rename the columns according to this input
  • I want to join with the customer table only if the cust_id flag is Y
  • In the final output, I want to show the hashed column values under the actual column names (see the setup sketch after this list)

    val maintab_df = maintable
    val cust_df = customertable

After renaming main table column e to a, join the main table and the customer table on:

    maintable.a = customertable.a
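
For context, here is a minimal, self-contained sketch of the three tables the snippets below assume. Only the table and column names come from the question and answer; the row values are made up, shaped to line up with the output shown at the end:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.col

    val spark = SparkSession.builder().master("local[*]").getOrCreate()
    import spark.implicits._

    // single-row property table driving the rename and the join decision
    val propertydf = Seq(("(e to a),(d to b)", "Y"))
        .toDF("columns-to-rename", "cust_id_flag")

    // main table: columns e and d are the ones to be renamed to a and b
    val maintab_df = Seq(
        ("e1", 1, 11, "d1"),
        ("e2", 2, 12, "d2"),
        ("e3", 3, 13, "d3")
    ).toDF("e", "f", "c", "d")

    // customer table: join keys a and b plus their hashed counterparts
    val cust_df = Seq(
        ("e1", "d1", "*****!", "&&&&"),
        ("e2", "d2", "****%", ";;;;;;;;"),
        ("e3", "d3", "*****@", "\\\\\\")
    ).toDF("a", "b", "a_hash", "b_hash")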


Here's an example of how to do it:

    propertydf.show
    +-----------------+------------+
    |columns-to-rename|cust_id_flag|
    +-----------------+------------+
    |(e to a),(d to b)|           Y|
    +-----------------+------------+
    
    // pull the two driver values out of the single-row property table
    val columns_to_rename = propertydf.head(1)(0).getAs[String]("columns-to-rename")
    val cust_id_flag = propertydf.head(1)(0).getAs[String]("cust_id_flag")
    
    // turn "(e to a),(d to b)" into an array of (from, to) pairs
    val parsed_columns = columns_to_rename.split(",")
        .map(c => c.replace("(", "").replace(")", "").split(" to "))
    // parsed_columns: Array[Array[String]] = Array(Array(e, a), Array(d, b))
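
As an aside (not part of the original answer), a regex-based parse of the "(x to y)" pairs is a bit more defensive than the chained replace calls; pairPattern and parsed_columns_alt are names I made up:

    // hypothetical alternative: extract the (from, to) pairs with a regex
    val pairPattern = """\((\w+) to (\w+)\)""".r
    val parsed_columns_alt = pairPattern.findAllMatchIn(columns_to_rename)
        .map(m => Array(m.group(1), m.group(2)))
        .toArray
    // parsed_columns_alt: Array[Array[String]] = Array(Array(e, a), Array(d, b))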
    
    // for each main-table column, rename it if it appears as a "from" entry
    val rename_columns = maintab_df.columns.map { c =>
        val matched = parsed_columns.filter(p => c == p(0))
        if (matched.nonEmpty)
            col(c).as(matched(0)(1))
        else
            col(c)
    }
    // rename_columns: Array[org.apache.spark.sql.Column] = Array(e AS `a`, f, c, d AS `b`)
    
    // in the final projection, swap each renamed key for its hashed counterpart
    val select_columns = maintab_df.columns.map { c =>
        val matched = parsed_columns.filter(p => c == p(0))
        if (matched.nonEmpty)
            col(matched(0)(1) + "_hash").as(matched(0)(1))
        else
            col(c)
    }
    // select_columns: Array[org.apache.spark.sql.Column] = Array(a_hash AS `a`, f, c, b_hash AS `b`)
    
    // join keys are the renamed ("to") column names
    val join_cond = parsed_columns.map(_(1))
    // join_cond: Array[String] = Array(a, b)
    
    // bind the if/else result to a val so it stays in scope for show
    // (joining on a Seq of names keeps a single copy of each key column)
    val result = if (cust_id_flag == "Y") {
        maintab_df.select(rename_columns:_*)
                  .join(cust_df, join_cond)
                  .select(select_columns:_*)
    } else {
        maintab_df
    }

    result.show
    +------+---+---+--------+
    |     a|  f|  c|       b|
    +------+---+---+--------+
    |*****!|  1| 11|    &&&&|
    | ****%|  2| 12|;;;;;;;;|
    |*****@|  3| 13|  \\\\\\|
    +------+---+---+--------+
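
If you only need the renaming step, an equivalent sketch (my addition, renamed_df is a hypothetical name) folds withColumnRenamed over the parsed pairs instead of building the Column array by hand:

    // same rename as rename_columns above, via foldLeft + withColumnRenamed
    // (assumes every entry in parsed_columns is an exact (from, to) pair)
    val renamed_df = parsed_columns.foldLeft(maintab_df) {
        case (df, Array(from, to)) => df.withColumnRenamed(from, to)
    }
    // renamed_df.columns: Array(a, f, c, b)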
    

Shouldn't it be d to b rather than f to b? Oh, yes, right, it should be d to b!!!