Python: Saving image files using binaryFiles - pyspark

python image-processing apache-spark pyspark

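How can I save image files (JPG format) to the local file system? I load the images into Spark with binaryFiles, convert them to arrays, and process them. Here is the code: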

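from io import BytesIO
import math

import numpy as np
from PIL import Image

# binaryFiles yields (path, file contents as bytes) pairs; sc is the SparkContext
images = sc.binaryFiles("path/car*")
# Decode each file into a NumPy array, keeping the path as the key
imagerdd = images.mapValues(lambda data: np.asarray(Image.open(BytesIO(data))))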

After some image processing, the key holds the path and the value holds the image array:

# imagelapRDD is the RDD produced by the processing step (not shown)
imageOutuint = imagelapRDD.mapValues(lambda arr: arr.astype(np.uint8))
imageOutIMG = imageOutuint.mapValues(Image.fromarray)

How can I save the images to the local file system or to HDFS? I could not find any option for this.

If you want to save the data to the local file system, simply collect it as a local iterator and save it record by record using standard tools:

# Values must be PIL images here (e.g. imageOutIMG above); NumPy arrays have no save()
for x, img in imagerdd.toLocalIterator():
    path = ...  # Some .jpg path (based on x?)
    img.save(path)

Just make sure to cache imagerdd to avoid recomputing it.
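
The answer leaves `path = ...` open. One possible way to fill it, shown only as a sketch, is to derive the output file name from the key that binaryFiles produces. The helper names `local_jpg_path` and `save_all` and the `/tmp/processed` directory are hypothetical and not part of the original answer. The code assumes the keys are POSIX-style paths or URIs such as `file:/...` or `hdfs://...`, and that each value has a PIL-style `save(path)` method.

```python
import os
import posixpath
from urllib.parse import urlparse


def local_jpg_path(key, out_dir):
    """Map a binaryFiles key (a file URI or plain path) to a .jpg path in out_dir."""
    # urlparse copes with both "hdfs://host/dir/car1.jpg" and "/dir/car1.jpg"
    name = posixpath.basename(urlparse(key).path)
    stem, _ext = posixpath.splitext(name)
    return os.path.join(out_dir, stem + ".jpg")


def save_all(records, out_dir):
    """Save each (key, image) pair; image only needs a PIL-style save(path) method."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for key, img in records:
        path = local_jpg_path(key, out_dir)
        img.save(path)
        written.append(path)
    return written


# Hypothetical usage on the driver, assuming imageOutIMG holds PIL images:
#   imageOutIMG.cache()
#   save_all(imageOutIMG.toLocalIterator(), "/tmp/processed")
```

Note that only the base name of the key is kept, so two inputs with the same file name in different directories would overwrite each other. If that can happen, more of the original path would need to go into the output name.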

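Thanks, zero323, for the solution. What should I do if I want to run this on HDFS?

For example, you could analyze how to handle it. But if you are asking about pure PySpark, I don't know.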