Pyspark - How to rename column names based on a lookup table?


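I have two tables as follows:

Table 1:

Table 2:

I want to replace the column names of Table 1 with the date column from Table 2.

The final output should look like the table below:

Any help is appreciated.

Thank you all!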

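Since Table 2 is a column-name mapping, I assume it is not very large; otherwise collecting it to the driver would cause memory problems. Try this: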

# sqlContext is predefined in older pyspark shells (Spark 1.x/2.x); spark.createDataFrame(...) accepts the same arguments
tst1=sqlContext.createDataFrame([(1,2,3,4,5,6,7,8),(5,6,7,8,9,10,11,12),(13,14,15,16,17,18,19,20)],["a","b","c","d","e","f","g","h"])
tst2=sqlContext.createDataFrame([('a','apple'),('b','ball'),('c','cat'),('d','dog'),('e','elephant'),('f','fox'),('g','goat'),('h','hat')],["short","long"])
tst1.show()
+---+---+---+---+---+---+---+---+
|  a|  b|  c|  d|  e|  f|  g|  h|
+---+---+---+---+---+---+---+---+
|  1|  2|  3|  4|  5|  6|  7|  8|
|  5|  6|  7|  8|  9| 10| 11| 12|
| 13| 14| 15| 16| 17| 18| 19| 20|
+---+---+---+---+---+---+---+---+
# Collect Table 2 (small) to the driver to extract the mapping
tst_cl = tst2.collect()
# Get the old and new column names; both come from the same Row, so pairs stay aligned
old_name = [str(row[0]) for row in tst_cl]
new_name = [str(row[1]) for row in tst_cl]
# Select the columns in mapping order, then rename them positionally with toDF
tst_rn = tst1.select(old_name).toDF(*new_name)
tst_rn.show()
+-----+----+---+---+--------+---+----+---+
|apple|ball|cat|dog|elephant|fox|goat|hat|
+-----+----+---+---+--------+---+----+---+
|    1|   2|  3|  4|       5|  6|   7|  8|
|    5|   6|  7|  8|       9| 10|  11| 12|
|   13|  14| 15| 16|      17| 18|  19| 20|
+-----+----+---+---+--------+---+----+---+
Once the column mapping has been collected, you can apply any of the common column-renaming techniques.
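One such technique, shown here as a minimal sketch, assumes the mapping has already been turned into a Python dict of `{old_name: new_name}`. It computes the full list of new names in the DataFrame's current column order and passes it to `DataFrame.toDF(*names)`. The helper `renamed_columns` is hypothetical (it is neither part of PySpark nor of the answer above). Unlike `tst1.select(old_name)`, it keeps columns that have no entry in the mapping under their current names and ignores mapping entries for columns the DataFrame does not have.

```python
# Hypothetical helper (not from the original answer or PySpark): compute the
# new column names for a DataFrame from an {old_name: new_name} dict.
def renamed_columns(columns, mapping):
    """Return new names in the same order as `columns`.

    Columns missing from `mapping` keep their current name, so the result
    has one name per column and can be passed to DataFrame.toDF(*names).
    """
    new_names = [mapping.get(col, col) for col in columns]
    # toDF would silently create duplicate column names; fail early instead.
    duplicates = {name for name in new_names if new_names.count(name) > 1}
    if duplicates:
        raise ValueError(f"rename would create duplicate columns: {sorted(duplicates)}")
    return new_names


if __name__ == "__main__":
    cols = ["a", "b", "c", "x"]
    mapping = {"a": "apple", "b": "ball", "c": "cat"}
    print(renamed_columns(cols, mapping))  # ['apple', 'ball', 'cat', 'x']

    # With PySpark (not executed here), assuming tst1/tst2 from the answer:
    # mapping = {str(r[0]): str(r[1]) for r in tst2.collect()}
    # tst_rn = tst1.toDF(*renamed_columns(tst1.columns, mapping))
```

Because `toDF` renames positionally, building the list from `tst1.columns` guarantees the number and order of names match the DataFrame.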

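Tip: If you run into ordering mismatches when collecting (most of the time you won't, but if you want to be triple sure), consider combining each mapping pair in Table 2 into one column with F.array() and then collecting. The mapping extraction has to change slightly: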

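from pyspark.sql import functions as F

# Pack each (short, long) pair into a single array column so they are collected together
tst_array = tst2.withColumn("name_array", F.array(F.col('short'), F.col('long')))
tst_clc = tst_array.collect()
# name_array is the third field (index 2) of each collected Row
old_name = [str(row[2][0]) for row in tst_clc]
new_name = [str(row[2][1]) for row in tst_clc]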
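In the first version, `old_name[i]` and `new_name[i]` are both read from the same collected row, so each old/new pair already stays together regardless of the order in which `collect()` returns rows. A minimal sketch that makes this explicit without an extra array column is to build a dict directly from the collected rows. It assumes each row unpacks into exactly two fields: rows returned by `tst2.collect()` are PySpark `Row` objects (a tuple subclass) whose fields follow the DataFrame's column order, here `short` then `long`. `mapping_from_rows` is a hypothetical helper name.

```python
# Hypothetical helper: build an {old_name: new_name} dict from collected
# (old, new) rows, e.g. the Rows returned by tst2.collect().
def mapping_from_rows(rows):
    """Each pair comes from a single row, so old and new names cannot be
    shuffled relative to each other, whatever order the rows arrive in."""
    mapping = {}
    for old, new in rows:
        old, new = str(old), str(new)
        # Two different targets for the same source column is ambiguous.
        if old in mapping and mapping[old] != new:
            raise ValueError(f"conflicting mappings for column {old!r}")
        mapping[old] = new
    return mapping


if __name__ == "__main__":
    rows = [("b", "ball"), ("a", "apple"), ("c", "cat")]  # arbitrary order
    mapping = mapping_from_rows(rows)
    print(mapping["a"], mapping["b"], mapping["c"])  # apple ball cat

    # With PySpark (not executed here), assuming tst1/tst2 from the answer:
    # mapping = mapping_from_rows(tst2.collect())
    # old_name = list(mapping)
    # tst_rn = tst1.select(old_name).toDF(*[mapping[c] for c in old_name])
```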