Python PySpark unable to read any data file as a DataFrame


For the past few days I have been running into a strange error that I cannot resolve.

  • I am using PySpark and trying to load a CSV into a DataFrame [code below], and it fails with the error py4j.protocol.Py4JJavaError: An error occurred while calling o10.textFile. : java.lang.reflect.InaccessibleObjectException: Unable to make field transient java.lang.Object[]. Stack trace (a reconstruction of the script follows the trace):

    File "/home/v/scripts/g_s_pipe/a.py", line 14, in Employee_rdd = sc.textFile("abc.csv").map(lambda line: line.split(",")) File "/opt/apps/spark-2.0.1-bin-hadoop2.7/python/pyspark/context.py", line 476, in textFile return RDD(self._jsc.textFile(name, minPartitions), self, File "/opt/apps/spark-2.0.1-bin-hadoop2.7/python/lib/py4j-0.10.3-src.zip/py4j/java_gateway.py", line 1133, in __call__ File "/opt/apps/spark-2.0.1-bin-hadoop2.7/python/pyspark/sql/utils.py", line 63, in deco return f(*a, **kw) File "/opt/apps/spark-2.0.1-bin-hadoop2.7/python/lib/py4j-0.10.3-src.zip/py4j/protocol.py", line 319, in get_return_value py4j.protocol.Py4JJavaError: An error occurred while calling o10.textFile. : java.lang.reflect.InaccessibleObjectException: Unable to make field transient java.lang.Object[] java.util.ArrayList.elementData accessible: module java.base does not "opens java.util" to unnamed module @51b63e70 at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:335) at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:278) at java.base/java.lang.reflect.Field.checkCanSetAccessible(Field.java:175) at java.base/java.lang.reflect.Field.setAccessible(Field.java:169) 它也给出了相同的错误:

    17/06/12 21:17:21 WARN BlockManager: Putting block broadcast_1 failed due to an exception
    17/06/12 21:17:21 WARN BlockManager: Block broadcast_1 could not be removed as it was not found on disk or in memory
    Traceback (most recent call last):
      File "/home/vna/scripts/global_score_pipeline/test_code_here.py", line 62, in <module>
        print (df.count())
      File "/opt/apps/spark-2.0.1-bin-hadoop2.7/python/pyspark/sql/dataframe.py", line 299, in count
        return int(self._jdf.count())
      File "/opt/apps/spark-2.0.1-bin-hadoop2.7/python/lib/py4j-0.10.3-src.zip/py4j/java_gateway.py", line 1133, in __call__
      File "/opt/apps/spark-2.0.1-bin-hadoop2.7/python/pyspark/sql/utils.py", line 63, in deco
        return f(*a, **kw)
      File "/opt/apps/spark-2.0.1-bin-hadoop2.7/python/lib/py4j-0.10.3-src.zip/py4j/protocol.py", line 319, in get_return_value
    py4j.protocol.Py4JJavaError: An error occurred while calling o25.count.
    : java.lang.reflect.InaccessibleObjectException: Unable to make field transient java.lang.Object[] java.util.ArrayList.elementData accessible: module java.base does not "opens java.util" to unnamed module @5e37932e
      at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:335)
      at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:278)
      at java.base/java.lang.reflect.Field.checkCanSetAccessible(Field.java:175)
      at java.base/java.lang.reflect.Field.setAccessible(Field.java:169)
      at org.apache.spark.util.SizeEstimator$$anonfun$getClassInfo$3.apply(SizeEstimator.scala:336)
      at org.apache.spark.util.SizeEstimator$$anonfun$getClassInfo$3.apply(SizeEstimator.scala:330)
      at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
      at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
      at org.apache.spark.util.SizeEstimator$.getClassInfo(SizeEstimator.scala:330)
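The second traceback only shows the print(df.count()) call at line 62 of test_code_here.py. A hypothetical reconstruction of that script (the read path and file name are assumptions, not from the question) would be:

    # Hypothetical reconstruction of test_code_here.py; only print(df.count())
    # appears in the traceback, everything else is assumed for illustration.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("parquet_read_test").getOrCreate()
    df = spark.read.parquet("example.parquet")  # assumed file name
    print(df.count())  # fails inside Spark's SizeEstimator, as the trace shows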
What is this InaccessibleObjectException? I cannot find much help on Google for it. How do I resolve this?
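As far as the message itself tells: InaccessibleObjectException is thrown by the Java module system introduced in JDK 9 when reflective access via setAccessible() targets a package (here java.util) that the java.base module does not open to the caller. Spark 2.0.1 predates JDK 9, and its SizeEstimator relies on exactly this kind of reflection, so both traces are consistent with running this Spark build on a Java 9+ JVM. Below is a minimal sketch of one possible workaround, assuming PySpark is started from a plain Python script so that PYSPARK_SUBMIT_ARGS is honored; running on a Java 8 JDK instead (for example by pointing JAVA_HOME at one before starting Python) is the safer choice for Spark 2.x:

    # Sketch of a possible workaround, not a confirmed fix: open java.util to
    # unnamed modules so the setAccessible() call in SizeEstimator succeeds
    # on a JDK 9+ JVM.
    import os

    # Must be set before pyspark launches the driver JVM, and must end with
    # "pyspark-shell" when pyspark is started from a plain Python script.
    os.environ["PYSPARK_SUBMIT_ARGS"] = (
        '--driver-java-options "--add-opens=java.base/java.util=ALL-UNNAMED" '
        "pyspark-shell"
    )

    from pyspark import SparkContext  # import after the environment is set

    sc = SparkContext("local[*]", "csv_load_test")
    rdd = sc.textFile("abc.csv").map(lambda line: line.split(","))
    print(rdd.take(5))

Even with the flag, Spark 2.0.1 was never tested against JDK 9+, so further reflection failures may surface; downgrading Java or upgrading Spark is the more robust path.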

  • It won't help you understand the Spark problem, but if you only want to convert files from CSV to Apache Parquet, you can use pandas and Apache Arrow (the Python package is called pyarrow) and skip that part of the Java stack entirely (a sketch follows these comments).

  • Thanks for the pointer to Arrow. I have already created a small example Parquet file using Arrow, but Spark cannot read even that Parquet file... I get the same error as above (the same o25.count traceback). What is the problem here?
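For reference, the conversion suggested in the first comment would look roughly like this; a minimal sketch using pandas and pyarrow, with placeholder file names:

    # Minimal sketch of the CSV -> Parquet conversion suggested in the comment,
    # using pandas plus pyarrow; file names are placeholders.
    import pandas as pd
    import pyarrow as pa
    import pyarrow.parquet as pq

    df = pd.read_csv("abc.csv")           # load the CSV into a pandas DataFrame
    table = pa.Table.from_pandas(df)      # convert to an Arrow table
    pq.write_table(table, "abc.parquet")  # write the table out as Parquet

As the second comment reports, though, a Parquet file produced this way still fails to load in this Spark setup, which points at the JVM-side reflection error rather than the input format.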