
Python: Why does PySpark report a KeyError on a column name even though the column exists in the DataFrame?


After reading parquet files from HDFS, I am trying to save the DataFrame into Snowflake as shown below. Before it is loaded into the Snowflake table, an ID column holding sequence numbers is added to it:

 df = spark.read.parquet('file_path')
 schema = StructType([StructField("ID", LongType(), True)] + df.schema.fields[:])
 data_rdd = df.rdd.zipWithIndex()
 new_rdd = data_rdd.map(lambda row: (row[1],) + tuple(row[0].asDict()[c] for c in file_schema.fieldNames()[:-1]))
 final_df = spark.createDataFrame(new_rdd, schema)
 print(final_df.printSchema())
 final_df.show()
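For the snippet to run on its own it also needs a SparkSession and the schema types, which are not shown above; a minimal sketch of the assumed imports:

 from pyspark.sql import SparkSession
 from pyspark.sql.types import StructType, StructField, LongType

 # spark.read.parquet needs an active SparkSession; StructType, StructField and
 # LongType are used to build the new schema with the prepended ID field.
 spark = SparkSession.builder.getOrCreate()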
When I submit the job, I can see that the DataFrame's schema looks like this:

root
 |-- ID: long (nullable = true)
 |-- COL1: string (nullable = true)
 |-- COL2: string (nullable = true)
 |-- COL3: string (nullable = true)
 |-- COL4: string (nullable = true)
 |-- COLn: string (nullable = true)
But the error occurs at the final_df.show() line:

None
Traceback (most recent call last):
  File "autocheck.py", line 66, in <module>
    if read_and_load_parquet_files(file_path):
  File "autocheck.py", line 42, in read_and_load_parquet_files
    final_df.show()
  File "/opt/hadoop/data/08/hadoop/yarn/local/usercache/hdfstest/appcache/application_1603175231393_0446/container_e500_1603175231393_0446_02_000001/pyspark.zip/pyspark/sql/dataframe.py", line 350, in show
  File "/opt/hadoop/data/08/hadoop/yarn/local/usercache/hdfstest/appcache/application_1603175231393_0446/container_e500_1603175231393_0446_02_000001/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/opt/hadoop/data/08/hadoop/yarn/local/usercache/hdfstest/appcache/application_1603175231393_0446/container_e500_1603175231393_0446_02_000001/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
  File "/opt/hadoop/data/08/hadoop/yarn/local/usercache/hdfstest/appcache/application_1603175231393_0446/container_e500_1603175231393_0446_02_000001/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o131.showString.
  File "autocheck.py", line 38, in <lambda>
  File "autocheck.py", line 38, in <genexpr>
KeyError: 'ID'
Line 38 in the code is
new_rdd = data_rdd.map(lambda row: (row[1],) + tuple(row[0].asDict()[c] for c in file_schema.fieldNames()[:-1]))

I don't understand what I should do here. I am adding the ID column to the existing row in the lambda function, yet I still see the error
KeyError: 'ID'
Can anyone tell me what mistake I am making here and how to fix it?
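One possible reading, offered as a guess because file_schema is never defined in the posted snippet: if file_schema is the new schema built above (ID followed by the original columns), then file_schema.fieldNames()[:-1] begins with 'ID', while row[0].asDict() only contains the columns actually read from the parquet file, so the dictionary lookup raises KeyError: 'ID'. Because RDD transformations are lazy, the error only surfaces when show() forces the map to execute. A minimal sketch that iterates over the original DataFrame's columns instead (any name not taken from the snippet is an assumption):

 # Assumption: df is the DataFrame read from the parquet file, so its field
 # names are exactly the keys present in each row[0].asDict(); the ID value
 # itself comes from zipWithIndex as row[1].
 source_cols = df.schema.fieldNames()  # built on the driver so the lambda only captures a plain list

 new_rdd = data_rdd.map(
     lambda row: (row[1],) + tuple(row[0].asDict()[c] for c in source_cols)
 )
 final_df = spark.createDataFrame(new_rdd, schema)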
