Python PySpark unable to understand any data file as a dataframe

Tags: python, apache-spark, pyspark, apache-spark-sql, parquet

For the past few days I have been hitting a strange error that I cannot resolve.
I am using PySpark and trying to load a CSV into a DataFrame [code below], and it gives this error:
py4j.protocol.Py4JJavaError: An error occurred while calling o10.textFile.
: java.lang.reflect.InaccessibleObjectException: Unable to make field transient java.lang.Object[]
Stack trace:
File "/home/v/scripts/g_s_pipe/a.py", line 14, in
Employee_rdd = sc.textFile("abc.csv").map(lambda line: line.split(","))
File "/opt/apps/spark-2.0.1-bin-hadoop2.7/python/pyspark/context.py", line 476, in textFile
return RDD(self._jsc.textFile(name, minPartitions), self,
File "/opt/apps/spark-2.0.1-bin-hadoop2.7/python/lib/py4j-0.10.3-src.zip/py4j/java_gateway.py", line 1133, in __call__
File "/opt/apps/spark-2.0.1-bin-hadoop2.7/python/pyspark/sql/utils.py", line 63, in deco
return f(*a, **kw)
File "/opt/apps/spark-2.0.1-bin-hadoop2.7/python/lib/py4j-0.10.3-src.zip/py4j/protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o10.textFile.
: java.lang.reflect.InaccessibleObjectException: Unable to make field transient java.lang.Object[] java.util.ArrayList.elementData accessible: module java.base does not "opens java.util" to unnamed module @51b63e70
at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:335)
at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:278)
at java.base/java.lang.reflect.Field.checkCanSetAccessible(Field.java:175)
at java.base/java.lang.reflect.Field.setAccessible(Field.java:169)
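The question only shows the RDD-based read, but the second trace below fails inside df.count(), so a DataFrame read was evidently attempted as well. A minimal sketch of what that read presumably looked like, reconstructed as an assumption (the app name and header option are illustrative):

from pyspark.sql import SparkSession

# Hypothetical reconstruction of the DataFrame read behind the second trace;
# only sc.textFile("abc.csv") and df.count() appear in the question itself.
spark = SparkSession.builder.appName("csv-test").getOrCreate()
df = spark.read.csv("abc.csv", header=True)  # built-in CSV source since Spark 2.0
print(df.count())  # the call that raises the Py4JJavaError below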
Reading the data as a DataFrame also gives the same error:
py4j.protocol.Py4JJavaError: An error occurred while calling o10.textFile.
: java.lang.reflect.InaccessibleObjectException: Unable to make field transient java.lang.Object[]
17/06/12 21:17:21 WARN BlockManager: Putting block broadcast_1 failed due to an exception
17/06/12 21:17:21 WARN BlockManager: Block broadcast_1 could not be removed as it was not found on disk or in memory
Traceback (most recent call last):
File "/home/vna/scripts/global_score_pipeline/test_code_here.py", line 62, in
print (df.count())
File "/opt/apps/spark-2.0.1-bin-hadoop2.7/python/pyspark/sql/dataframe.py", line 299, in count
return int(self._jdf.count())
File "/opt/apps/spark-2.0.1-bin-hadoop2.7/python/lib/py4j-0.10.3-src.zip/py4j/java_gateway.py", line 1133, in __call__
File "/opt/apps/spark-2.0.1-bin-hadoop2.7/python/pyspark/sql/utils.py", line 63, in deco
return f(*a, **kw)
File "/opt/apps/spark-2.0.1-bin-hadoop2.7/python/lib/py4j-0.10.3-src.zip/py4j/protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o25.count.
: java.lang.reflect.InaccessibleObjectException: Unable to make field transient java.lang.Object[] java.util.ArrayList.elementData accessible: module java.base does not "opens java.util" to unnamed module @5e37932e
at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:335)
at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:278)
at java.base/java.lang.reflect.Field.checkCanSetAccessible(Field.java:175)
at java.base/java.lang.reflect.Field.setAccessible(Field.java:169)
at org.apache.spark.util.SizeEstimator$$anonfun$getClassInfo$3.apply(SizeEstimator.scala:336)
at org.apache.spark.util.SizeEstimator$$anonfun$getClassInfo$3.apply(SizeEstimator.scala:330)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at org.apache.spark.util.SizeEstimator$.getClassInfo(SizeEstimator.scala:330)
What is this InaccessibleObjectException? I could not find much help on Google. How do I resolve this?

Comment: It will not help you understand the Spark problem, but if you just want to convert files from CSV to Apache Parquet, you can use Pandas and Apache Arrow (the Python package is called pyarrow) and skip that part of the Java stack entirely.

Reply: Thanks for the pointer to Arrow. I created a small example Parquet file using Arrow, but Spark still cannot read even that Parquet file... I get the same error. What is going wrong here?
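For reference, a minimal sketch of the CSV-to-Parquet conversion the comment suggests, assuming the abc.csv file from the question (pandas and pyarrow must be installed; the output file name is illustrative):

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

# Load the CSV with pandas, hand the frame to Arrow, and write Parquet,
# bypassing the Java stack entirely for the conversion step.
df = pd.read_csv("abc.csv")
table = pa.Table.from_pandas(df)
pq.write_table(table, "abc.parquet")

As the reply above notes, though, pointing Spark at such a file still hits the same InaccessibleObjectException, so the conversion only sidesteps the CSV read, not the underlying JVM problem.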
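For context on the exception itself: InaccessibleObjectException is thrown by the JDK 9+ module system (by default java.base does not "open" java.util to unnamed modules, exactly as the message says), while Spark 2.0.1 targets Java 8, so the traces above are consistent with PySpark launching a newer JVM. A minimal sketch, under that assumption, of pinning the launched JVM to a Java 8 installation (the JAVA_HOME path is hypothetical; adjust it to the local system):

import os

# Assumption: a Java 8 JVM is installed at this path. spark-submit, which
# PySpark launches as a subprocess, inherits this environment variable.
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"

from pyspark import SparkContext

sc = SparkContext(appName="csv-test")
rdd = sc.textFile("abc.csv").map(lambda line: line.split(","))
print(rdd.count())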