Handling errors in flatMap on an RDD in PySpark/Python

Tags: pyspark, rdd, databricks, flatmap, delta-lake

I am using a user-defined function (readByteUFF) to read a file, transform its contents, and return a pyspark.sql Row.

I apply this function to an RDD (a large collection of files that should all follow the same schema) in the following setup:

mapped_rdd = rdd.map(readByteUFF)

df = mapped_rdd.toDF()

df.write.format("delta").mode("append").option("mergeSchema", "true").option("path", "dbfs:/mnt/").saveAsTable("table")
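This works well when the files follow the same schema, but if the function fails with an error or returns None for any file, the whole job fails with: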

org.apache.spark.SparkException: Job aborted.
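A minimal sketch of catching failures per record so one bad file does not abort the job, assuming readByteUFF takes a single RDD element and returns a Row, or raises / returns None for a bad file. `safe_flatmap` is a hypothetical helper, not part of PySpark. It relies on flatMap() flattening whatever list the function returns, so returning an empty list drops that element instead of failing the task. The Spark calls are left as comments so the helper can be tried without a cluster.

```python
from typing import Any, Callable, List


def safe_flatmap(parse: Callable[[Any], Any]) -> Callable[[Any], List[Any]]:
    """Wrap a per-record parser (hypothetical helper) for use with rdd.flatMap().

    flatMap() flattens the returned list, so returning [] simply drops the
    record instead of failing the whole task.
    """
    def wrapped(record: Any) -> List[Any]:
        try:
            result = parse(record)
        except Exception:
            # Any error raised by the parser drops only this record.
            return []
        # A None result is dropped too, so toDF() never sees it.
        return [] if result is None else [result]

    return wrapped


# Usage with the question's pipeline (requires a Spark session;
# readByteUFF is the question's own function):
# mapped_rdd = rdd.flatMap(safe_flatmap(readByteUFF))
# df = mapped_rdd.toDF()
```

This silently discards bad files, so it only keeps the job alive; it does not record which files failed.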

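I would like to be able to flag the files that cause problems while still processing the remaining files. Is there a way to handle this within the flatMap() function?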

---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
<command-3680885025756706> in <module>
      8 
      9 #df.createOrReplaceTempView("table")
---> 10 df.write.format("delta").mode("append").option("mergeSchema", "true").option("path", "dbfs:/mnt/").saveAsTable("table")


/databricks/spark/python/pyspark/sql/readwriter.py in saveAsTable(self, name, format, mode, partitionBy, **options)
    775         if format is not None:
    776             self.format(format)
--> 777         self._jwrite.saveAsTable(name)
    778 
    779     @since(1.4)

/databricks/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py in __call__(self, *args)
   1255         answer = self.gateway_client.send_command(command)
   1256         return_value = get_return_value(
-> 1257             answer, self.gateway_client, self.target_id, self.name)
   1258 
   1259         for temp_arg in temp_args:

/databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
     61     def deco(*a, **kw):
     62         try:
---> 63             return f(*a, **kw)
     64         except py4j.protocol.Py4JJavaError as e:
     65             s = e.java_exception.toString()

/databricks/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    326                 raise Py4JJavaError(
    327                     "An error occurred while calling {0}{1}{2}.\n".
--> 328                     format(target_id, ".", name), value)
    329             else:
    330                 raise Py4JError(
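To flag the failing files rather than drop them, one option is to tag every result as success or failure and then split the RDD in two. This is a sketch under assumptions: each RDD element is taken to be a (path, bytes) pair, as SparkContext.binaryFiles() produces, and readByteUFF is assumed to accept that pair. `tag_result` and `split_results` are hypothetical names; `split_results` is only a local stand-in for the two Spark filters shown in the comments.

```python
from typing import Any, Callable, Iterable, List, Tuple

# Each tagged result is (ok, path, payload):
#   ok=True  -> payload is the parsed Row
#   ok=False -> payload is an error message for that file
Tagged = Tuple[bool, str, Any]


def tag_result(parse: Callable[[Tuple[str, bytes]], Any]) -> Callable[[Tuple[str, bytes]], Tagged]:
    """Wrap a parser that takes an assumed (path, bytes) record."""
    def wrapped(record: Tuple[str, bytes]) -> Tagged:
        path = record[0]
        try:
            result = parse(record)
        except Exception as exc:
            # Keep the failure as data instead of failing the task.
            return (False, path, f"{type(exc).__name__}: {exc}")
        if result is None:
            return (False, path, "parser returned None")
        return (True, path, result)

    return wrapped


def split_results(tagged: Iterable[Tagged]) -> Tuple[List[Any], List[Tuple[str, str]]]:
    """Local (non-Spark) equivalent of the two filters below."""
    good, bad = [], []
    for ok, path, payload in tagged:
        if ok:
            good.append(payload)
        else:
            bad.append((path, payload))
    return good, bad


# Spark version (sketch; needs a running session):
# tagged = rdd.map(tag_result(readByteUFF)).cache()
# good_rows = tagged.filter(lambda t: t[0]).map(lambda t: t[2])
# bad_files = tagged.filter(lambda t: not t[0]).map(lambda t: (t[1], t[2]))
# Write good_rows.toDF() exactly as in the question; inspect the failures
# with bad_files.collect() or save them via bad_files.toDF(["path", "error"]).
```

Caching `tagged` matters here: without it, each of the two filters would re-run readByteUFF over every file.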