PySpark 3.0: how to create a map column from a DataFrame


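I am trying to understand how to use create_map so that a DataFrame column can serve as a set of lookup values. I have tried several snippets from StackOverflow, but none of them gives me the correct output. My code and the error output are below. Following the documentation, I used the same syntax to create the column, but a map column built with itertools seems to behave differently from a map column built from DataFrame columns.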

Any suggestions on the correct syntax for creating the output from the columns of a DataFrame?

import pyspark.sql.functions as fn
from pyspark.sql.types import StructType, StructField, IntegerType, StringType
from itertools import chain

# `spark` is the SparkSession provided by the pyspark shell
lookup_dt = {1: 'Elephant', 2: 'Tiger', 3: 'Moose', 4: 'Bear'}
cust_schema = StructType([StructField('Id', IntegerType()), StructField('Animal', StringType(), True)])
lookup_df = spark.createDataFrame([(k, v) for k, v in lookup_dt.items()], cust_schema)

values = [(1,300), (3,400), (4, 600), (2,240)]
pop_df = spark.createDataFrame(values, ['animalId','Pop'])

#creating a map column from the lookup dataframe
mapper = fn.create_map([lookup_df.Id, lookup_df.Animal])

# this call raises the AnalysisException shown below
pop_df.withColumn('animal', mapper[pop_df['animalId']]).show()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/Cellar/apache-spark/3.0.1/libexec/python/pyspark/sql/dataframe.py", line 2096, in withColumn
    return DataFrame(self._jdf.withColumn(colName, col._jc), self.sql_ctx)
  File "/usr/local/Cellar/apache-spark/3.0.1/libexec/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
  File "/usr/local/Cellar/apache-spark/3.0.1/libexec/python/pyspark/sql/utils.py", line 134, in deco
    raise_from(converted)
  File "<string>", line 3, in raise_from
pyspark.sql.utils.AnalysisException: Resolved attribute(s) Id#0,Animal#1 missing from animalId#4L,Pop#5L in operator !Project [animalId#4L, Pop#5L, map(Id#0, Animal#1)[cast(animalId#4L as int)] AS animal#8].;;
!Project [animalId#4L, Pop#5L, map(Id#0, Animal#1)[cast(animalId#4L as int)] AS animal#8]
+- LogicalRDD [animalId#4L, Pop#5L], false


>>> fn.create_map(*[lookup_df.Id, lookup_df.Animal])
Column<b'map(Id, Animal)'>
>>> 
>>> fn.create_map([fn.lit(x) for x in chain(*lookup_dt.items())])  
Column<b'map(1, Elephant, 2, Tiger, 3, Moose, 4, Bear)'>
>>>
>>> map_expr = fn.create_map([fn.lit(x) for x in chain(*lookup_dt.items())])
>>> pop_df.withColumn('animal', map_expr[pop_df['animalId']]).show() ##right output with mapType
+--------+---+--------+                                                         
|animalId|Pop|  animal|
+--------+---+--------+
|       1|300|Elephant|
|       3|400|   Moose|
|       4|600|    Bear|
|       2|240|   Tiger|
+--------+---+--------+
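
The AnalysisException above says that Id#0 and Animal#1 are missing from the columns of pop_df: they belong to lookup_df, so Spark cannot resolve them inside pop_df.withColumn. The literal map works because it contains no references to another DataFrame. A minimal sketch of one way to get from lookup_df to such a literal map, assuming the lookup table is small enough to collect to the driver; `flatten_pairs` and `map_column_from_lookup` are hypothetical helper names for illustration, not part of the PySpark API:

```python
from itertools import chain


def flatten_pairs(pairs):
    """Flatten [(k1, v1), (k2, v2)] into [k1, v1, k2, v2],
    the alternating key/value order that create_map expects."""
    return list(chain.from_iterable(pairs))


def map_column_from_lookup(lookup_df, key_col, value_col):
    """Hypothetical helper: collect a small lookup DataFrame to the driver
    and build a map column made only of literals."""
    # Imported here so that flatten_pairs can be used without Spark installed.
    import pyspark.sql.functions as fn

    rows = lookup_df.select(key_col, value_col).collect()
    pairs = [(row[key_col], row[value_col]) for row in rows]
    return fn.create_map([fn.lit(x) for x in flatten_pairs(pairs)])
```

With the DataFrames from the question, this could be used as `pop_df.withColumn('animal', map_column_from_lookup(lookup_df, 'Id', 'Animal')[pop_df['animalId']])`. For a lookup table too large to collect, joining pop_df to lookup_df on animalId and Id is a common alternative.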

