Apache Spark: how to change a DataFrame's column names based on another DataFrame

Tags: apache-spark, dataframe, pyspark, apache-spark-sql, pyspark-sql

Using PySpark, I need to rename the columns of a DataFrame df based on another DataFrame df_col.

df

+----+---+----+----+
|code| id|name|work|
+----+---+----+----+
| ASD|101|John| DEV|
| klj|102| ben|prod|
+----+---+----+----+

df_col

+-----------+-----------+
|col_current|col_updated|
+-----------+-----------+
|         id|     Row_id|
|       name|       Name|
|       code|   Row_code|
|       Work|  Work_Code|
+-----------+-----------+
If a column of df matches a value in col_current, it should be renamed to the corresponding col_updated. For example, df.id matches col_current id, so df.id should be renamed to Row_id.

Expected output:

Row_id,Name,Row_code,Work_code
101,John,ASD,DEV
102,ben,klj,prod

Note: I want this process to be dynamic.

Just collect df_col as a dictionary:

df = spark.createDataFrame(
    [("ASD", "101" "John", "DEV"), ("klj","102", "ben", "prod")],
    ("code", "id", "name", "work")
)

df_col = spark.createDataFrame(
    [("id", "Row_id"), ("name", "Name"), ("code", "Row_code"), ("Work", "Work_Code")],
    ("col_current", "col_updated")
)

name_dict = df_col.rdd.collectAsMap()
# {'Work': 'Work_Code', 'code': 'Row_code', 'id': 'Row_id', 'name': 'Name'}
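
collectAsMap runs on the DataFrame's underlying RDD and pulls the (col_current, col_updated) pairs to the driver. Since each Row here behaves as a two-element tuple, a plain-Python equivalent, just as a sketch, is:

name_dict = dict(df_col.collect())  # each Row unpacks as a (col_current, col_updated) pair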
Then use select with a list comprehension:

df.select([df[c].alias(name_dict.get(c, c)) for c in df.columns]).printSchema()
# root
#  |-- Row_code: string (nullable = true)
#  |-- Row_id: string (nullable = true)
#  |-- Name: string (nullable = true)
#  |-- work: string (nullable = true)
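
As a side note, the same rename can also be sketched as a one-liner with toDF, which takes the complete list of new column names in order:

df.toDF(*[name_dict.get(c, c) for c in df.columns]).printSchema()
# prints the same schema as above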
Here, name_dict is a standard Python dictionary:

name_dict = df_col.rdd.collectAsMap()
# {'Work': 'Work_Code', 'code': 'Row_code', 'id': 'Row_id', 'name': 'Name'}
name_dict.get(c, c) returns the new name for a given current name, or the current name unchanged when there is no match:

name_dict.get("code", "code")
# 'Row_code'

name_dict.get("work", "work")  # Case sensitive 
# 'work'
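
Because of that case sensitivity, work is left untouched even though df_col lists Work. If the expected output's Work_code should be produced regardless of case, a minimal sketch (assuming case-insensitive matching is acceptable; name_dict_ci is a hypothetical helper) lower-cases the dictionary keys:

name_dict_ci = {k.lower(): v for k, v in name_dict.items()}  # hypothetical case-insensitive map
df.select([df[c].alias(name_dict_ci.get(c.lower(), c)) for c in df.columns]).printSchema()
# root
#  |-- Row_code: string (nullable = true)
#  |-- Row_id: string (nullable = true)
#  |-- Name: string (nullable = true)
#  |-- Work_Code: string (nullable = true)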

Finally, alias simply renames each column (df[c]) to the name returned by name_dict.get.
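
For completeness, an equivalent loop, only a sketch built on the same name_dict, applies withColumnRenamed per mapping. withColumnRenamed is a no-op when the source column does not exist, and whether Work matches work here depends on the spark.sql.caseSensitive setting:

renamed = df
for current, updated in name_dict.items():
    # no-op when `current` is not an existing column of `renamed`
    renamed = renamed.withColumnRenamed(current, updated)
renamed.printSchema()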