Apache Spark: converting a pickle (.pck) file to a Spark DataFrame with Python
Hello,

Dear members, I want to train a model with BigDL. I have a set of medical image data stored as pickle object files (.pck). Each pickle file is a 3D image (a 3D array). I tried to convert it to a Spark DataFrame using the BigDL Python API:
from pyspark.sql import SQLContext

# sc is the existing SparkContext
pickleRdd = sc.pickleFile("/home/student/BigDL-trainings/elephantscale/data/volumetric_data/329637-8.pck")
sqlContext = SQLContext(sc)
df = sqlContext.createDataFrame(pickleRdd)
It throws this error:
Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.runJob.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2.0 failed 1 times, most recent failure: Lost task 0.0 in stage 2.0 (TID 2, localhost, executor driver)
: java.io.IOException: file:/home/student/BigDL-trainings/elephantscale/data/volumetric_data/329637-8.pck not a SequenceFile
I have run this code on both Python 3.5 and 2.7, and I get the error in both cases.
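
For context, `sc.pickleFile` reads the SequenceFile format written by `RDD.saveAsPickleFile`, which would explain the "not a SequenceFile" message for a file produced by Python's own `pickle` module. Below is a minimal sketch of one possible workaround, not a definitive fix. It assumes the .pck file was written with plain `pickle.dump`, holds a single 3D array (a nested list or a NumPy array), and is small enough to load on the driver. The helper names (`load_volume`, `volume_to_rows`, `volume_to_dataframe`) and the column names are hypothetical and not part of Spark or BigDL.

```python
import pickle


def load_volume(path):
    """Load one plain Python pickle file on the driver (hypothetical helper)."""
    with open(path, "rb") as f:
        # Assumption: the file was written by the same major Python version.
        # A pickle written by Python 2 may need pickle.load(f, encoding="latin1")
        # when read on Python 3.
        return pickle.load(f)


def volume_to_rows(volume):
    """Turn a 3D array into one tuple per 2D slice:
    (slice_index, height, width, flattened pixel values)."""
    # NumPy arrays expose tolist(); nested lists are used as they are.
    data = volume.tolist() if hasattr(volume, "tolist") else volume
    rows = []
    for index, plane in enumerate(data):
        height = len(plane)
        width = len(plane[0]) if height else 0
        # Flatten the 2D slice row by row and cast to plain Python floats,
        # so Spark can infer an array<double> column.
        pixels = [float(value) for line in plane for value in line]
        rows.append((index, height, width, pixels))
    return rows


def volume_to_dataframe(spark, path):
    """Build a DataFrame with one row per slice.
    `spark` can be a SparkSession or an SQLContext; both have createDataFrame."""
    rows = volume_to_rows(load_volume(path))
    return spark.createDataFrame(rows, ["slice_index", "height", "width", "pixels"])
```

With the `sqlContext` from the question, this could be called as `df = volume_to_dataframe(sqlContext, "/home/student/BigDL-trainings/elephantscale/data/volumetric_data/329637-8.pck")`. Storing one row per slice is only one possible layout; a single row holding the whole flattened volume would work the same way.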