How does Spark read my image using the image format?

image-processing pyspark

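This may be a silly question, but I can't figure out how Spark reads my image when I use the spark.read.format("image").load(....) call.

After importing my image, I see the following: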

>>> image_df.select("image.height","image.width","image.nChannels", "image.mode", "image.data").show()
+------+-----+---------+----+--------------------+
|height|width|nChannels|mode|                data|
+------+-----+---------+----+--------------------+
|   430|  470|        3|  16|[4D 55 4E 4C 54 4...|
+------+-----+---------+----+--------------------+

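My conclusions are:

- My image is 430x470 pixels (height x width).
- My image is a color (RGB) image, since nChannels = 3, which is an OpenCV-compatible type.
- My image mode is 16, which corresponds to a particular OpenCV byte order. Does anyone know which website or documentation I could browse to learn more about this?
- The data in the data column is of binary type, but when I run image_df.select("image.data").take(1), the output seems to be a single array.

What follows relates to the result shown above. It may come down to my lack of knowledge about OpenCV (or something else). Nonetheless: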

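1/ I don't understand this: if I have an RGB image, I should get 3 matrices, but the output ends with …\x84\x87~']. I was expecting something more like […, …, …\x87~'].
2/ Does this part have a special meaning? Are these separators between the matrices, or something else?

To make clearer what I'm trying to achieve: I want to process images in order to compare them pixel by pixel. So I want to know the pixel values at a given position in my image (I assume that with an RGB image there will be 3 values for a given position).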
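On the two numbered points: the data column is one flat byte buffer, not three nested matrices, so there are no separator bytes to look for. Characters such as ~ in the printed output are ordinary data bytes that Python displays as ASCII. A minimal sketch of how a pixel position maps to an offset in such a buffer, assuming the row-wise, channel-interleaved layout that Spark documents for its image data source (the helper name pixel_at and the tiny 2x2 buffer are made up for illustration):

```python
def pixel_at(data, width, n_channels, row, col):
    """Return the channel values stored for pixel (row, col) in a flat buffer."""
    start = (row * width + col) * n_channels
    return tuple(data[start:start + n_channels])

# Hypothetical 2x2 image with 3 channels: 12 bytes, no separators.
data = bytearray([
    10, 11, 12,   20, 21, 22,   # row 0: pixel (0, 0), pixel (0, 1)
    30, 31, 32,   40, 41, 42,   # row 1: pixel (1, 0), pixel (1, 1)
])

print(pixel_at(data, width=2, n_channels=3, row=1, col=0))  # (30, 31, 32)
```

For the image in the question, width would be 470 and n_channels would be 3.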

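For example: say I have a webcam pointed at the sky, during daytime only, and I want to know the pixel values at a position corresponding to the top-left part of the sky. I find that these values combined give a light blue color, which means the photo was taken on a sunny day. Let's assume the only possibility is that a sunny day gives a light blue color. Next, I want to compare that combination with the combination of pixel values at exactly the same position, but from a photo taken the next day. If they are not equal, I conclude that the photo was taken on a cloudy or rainy day. If they are equal, it was a sunny day.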
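The comparison described above could be sketched as follows. This is an illustration only: the arrays stand in for two decoded photos, the pixel values are invented, and the tolerance parameter is an assumption added here, since two photos of the same sky rarely match byte for byte.

```python
import numpy as np

def same_color(img_a, img_b, row, col, tolerance=0):
    """Compare the channel values of two images at one pixel position.

    tolerance is the largest per-channel difference still treated as equal;
    0 means exact equality, as in the description above.
    """
    # Cast to a signed type so the subtraction cannot wrap around in uint8.
    a = img_a[row, col].astype(np.int16)
    b = img_b[row, col].astype(np.int16)
    return bool(np.all(np.abs(a - b) <= tolerance))

# Two hypothetical 2x2 photos, shape (height, width, channels).
day_one = np.zeros((2, 2, 3), dtype=np.uint8)
day_two = np.zeros((2, 2, 3), dtype=np.uint8)
day_one[0, 0] = (235, 206, 135)   # made-up "light blue" values
day_two[0, 0] = (230, 200, 140)   # slightly different the next day

print(same_color(day_one, day_two, 0, 0))                # False
print(same_color(day_one, day_two, 0, 0, tolerance=10))  # True
```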

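Any help on this would be appreciated. I have simplified my example to make it easier to follow, but my real goal is much the same. I know that ML models exist for this kind of task, but I would like to try this approach first. My first goal is to split this column into 3 columns, one per color channel: a red matrix, a green matrix and a blue matrix.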
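On the mode value of 16 raised in the question: OpenCV packs the element depth and the channel count into a single integer type code, and 16 is the code for CV_8UC3 (8-bit unsigned, 3 channels). A small sketch of that decoding, where decode_ocv_mode is a hypothetical helper and not part of Spark or OpenCV:

```python
# OpenCV depth codes (stored in the low 3 bits of the type code).
DEPTH_NAMES = {0: "CV_8U", 1: "CV_8S", 2: "CV_16U", 3: "CV_16S",
               4: "CV_32S", 5: "CV_32F", 6: "CV_64F"}

def decode_ocv_mode(mode):
    """Split an OpenCV type code into (depth name, number of channels)."""
    depth = mode & 7
    channels = (mode >> 3) + 1
    return DEPTH_NAMES.get(depth, "unknown"), channels

print(decode_ocv_mode(16))  # ('CV_8U', 3)
```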

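I think I have worked out the logic. I used the keras.preprocessing.image.img_to_array() function to understand how the values are organized (since I have an RGB image, I must have 3 matrices, one per color: R, G and B). In case anyone wonders how it works (I might be wrong, but I think I am onto something):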

from keras.preprocessing import image
import numpy as np
from PIL import Image

# Using spark built-in data source
first_img = spark.read.format("image").schema(imageSchema).load(".....")
raw = first_img.select("image.data").take(1)[0][0]
np.shape(raw)
(606300,) # which is 470*430*3



# Using keras function
img = image.load_img(".../path/to/img")
yy = image.img_to_array(img)
>>> np.shape(yy)
(430, 470, 3) # the shape is right, but the channel order differs:

>>> raw[0], raw[1], raw[2]
(77, 85, 78)
>>> yy[0][0]
array([78., 85., 77.], dtype=float32)

# Therefore I used the numpy reshape function directly on raw
# to get 430 matrices of 470 rows and 3 columns (channel order is unchanged):

array = np.reshape(raw, (430,470,3))
xx = image.img_to_array(array)     # OPTIONAL and not used here

>>> array[0][0] == (raw[0],raw[1],raw[2])
array([ True,  True,  True])

>>> array[0][1] == (raw[3],raw[4],raw[5])
array([ True,  True,  True])

>>> array[0][2] == (raw[6],raw[7],raw[8])
array([ True,  True,  True])

>>> array[0][3] == (raw[9],raw[10],raw[11])
array([ True,  True,  True])

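If I understand correctly, Spark reads the image as one big array (606300 elements here) in which the values are stored in order, each one corresponding to its color channel. After reshaping, I get 430 matrices of 470 rows x 3 columns. Since my image is 470x430 (width x height), each matrix corresponds to one height position (one image row), and inside each matrix there are 3 columns, one per color, and 470 rows, one per width position.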
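The mismatch between raw[0], raw[1], raw[2] and yy[0][0] is a channel-order difference: Spark's image data source stores bytes in OpenCV-compatible order (BGR for 3-channel images), while Keras returns RGB. Reshaping alone does not change that order. A minimal sketch of the reordering, using a made-up 1x2 buffer whose first pixel reuses the values shown above:

```python
import numpy as np

height, width, n_channels = 1, 2, 3

# Stand-in for the bytes Spark returns; the first pixel is (B, G, R) = (77, 85, 78).
raw = bytearray([77, 85, 78, 10, 20, 30])

bgr = np.frombuffer(raw, dtype=np.uint8).reshape(height, width, n_channels)
rgb = bgr[:, :, ::-1]  # reverse the last axis: BGR -> RGB

print(rgb[0, 0])  # [78 85 77]
```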

Hope this helps someone.

How did you extract the RGB values?
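One possible way to get separate red, green and blue matrices from the flat buffer, assuming the BGR byte order that Spark's image source uses for 3-channel images (split_channels is a hypothetical helper name, and the 2x2 buffer is made up):

```python
import numpy as np

def split_channels(raw, height, width):
    """Split a flat 3-channel BGR byte buffer into (red, green, blue) matrices."""
    bgr = np.frombuffer(raw, dtype=np.uint8).reshape(height, width, 3)
    blue, green, red = bgr[:, :, 0], bgr[:, :, 1], bgr[:, :, 2]
    return red, green, blue

# Made-up 2x2 image: every pixel is (B, G, R) = (1, 2, 3).
raw = bytes([1, 2, 3] * 4)
red, green, blue = split_channels(raw, height=2, width=2)

print(red.shape)                           # (2, 2)
print(red[0, 0], green[0, 0], blue[0, 0])  # 3 2 1
```

With the image from the question, the call would be split_channels(raw, 430, 470), giving three 430x470 matrices.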
