Python: load TensorFlow images and create patches

python image tensorflow

I am using image_dataset_from_directory to load a very large RGB image dataset from disk into a tf.data.Dataset. For example,

dataset = tf.keras.preprocessing.image_dataset_from_directory(
    <directory>,
    label_mode=None,
    seed=1,
    subset='training',
    validation_split=0.1)

For example, the dataset has 100,000 images grouped into batches of size 32, yielding a tf.data.Dataset with spec (batch=32, width=256, height=256, channels=3).


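I would like to extract patches from the images to create a new tf.data.Dataset with image spatial dimensions of, say, 64x64. So I would like to create a new dataset with 400,000 patches, still in batches of 32, as a tf.data.Dataset with spec (batch=32, width=64, height=64, channels=3).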
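Here is a quick check of those counts. It assumes non-overlapping tiles (stride equal to the patch size, no padding).

- With 64x64 patches, each 256x256 image yields (256/64)^2 = 16 patches, so the dataset produces 1,600,000 patches in total.
- A total of 400,000 patches corresponds to four patches per image, for example 128x128 tiles.

patches_per_image is a hypothetical helper used only for this arithmetic:

```python
def patches_per_image(image_hw, patch_hw):
    """Number of non-overlapping patches (VALID padding, stride == patch size)."""
    (h, w), (ph, pw) = image_hw, patch_hw
    return (h // ph) * (w // pw)

num_images = 100_000
batch_size = 32

for patch in (128, 64):
    per_image = patches_per_image((256, 256), (patch, patch))
    total = num_images * per_image
    num_batches = -(-total // batch_size)  # ceiling division
    print(f"{patch}x{patch}: {per_image} per image, {total} patches, "
          f"{num_batches} batches of {batch_size}")
```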

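I have looked at the tf.data.Dataset.window method and the tf.image.extract_patches function. However, the documentation does not make clear how to use them to create the new dataset I need for training on patches. window seems to be aimed at 1-D tensors, and extract_patches seems to work on arrays rather than on datasets.
Any suggestions on how to do this?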
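This is a minimal sketch of how tf.image.extract_patches behaves on a 4-D batch tensor, which is what each element of a batched tf.data.Dataset is. For readability it assumes a toy 2x8x8x3 tensor and 4x4 patches. With strides equal to sizes and padding='VALID', the patches do not overlap, and each depth vector of the output holds one flattened patch:

```python
import tensorflow as tf

# Hypothetical toy batch: 2 RGB images of 8x8 with distinct pixel values.
images = tf.reshape(tf.range(2 * 8 * 8 * 3, dtype=tf.float32), [2, 8, 8, 3])
p = 4  # patch size (assumed; the question uses 64 on 256x256 images)

patches = tf.image.extract_patches(
    images=images,
    sizes=[1, p, p, 1],
    strides=[1, p, p, 1],   # stride == size -> non-overlapping tiles
    rates=[1, 1, 1, 1],
    padding='VALID')
# patches: [batch, rows, cols, p * p * channels] = [2, 2, 2, 48]

# Each depth vector is a flattened p x p x C patch, so a reshape recovers them,
# ordered by (image, patch row, patch column).
patch_images = tf.reshape(patches, [-1, p, p, 3])  # [8, 4, 4, 3]
```

Because the function accepts a whole batch tensor, the same call can be placed inside Dataset.map.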
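Update:
To clarify what I need: I am trying to avoid creating the patches manually on disk, for two reasons.

1. That is not sustainable in terms of disk space.
2. The patch size is not fixed; experiments will be run over several patch sizes.

So I want neither to create patches manually on disk, nor to load the images into memory manually and cut the patches there. I want TensorFlow to handle patch creation as part of the input pipeline, to minimize disk and memory usage.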
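Below is a minimal sketch of cutting patches inside the tf.data pipeline, so that nothing is written to disk and the patch size is a plain argument. Some details are assumptions:

- make_patch_dataset is a hypothetical helper.
- The input is a batched image dataset like the one returned by image_dataset_from_directory(..., label_mode=None), replaced here by a small synthetic dataset.
- The channel count is known statically.

```python
import tensorflow as tf

def make_patch_dataset(dataset, patch_size, batch_size):
    """Hypothetical helper: turn a dataset of image batches [B, H, W, C] into a
    dataset of patch batches [batch_size, patch_size, patch_size, C]."""
    def to_patches(images):
        channels = images.shape[-1]  # assumes a static channel count (3 for RGB)
        patches = tf.image.extract_patches(
            images=images,
            sizes=[1, patch_size, patch_size, 1],
            strides=[1, patch_size, patch_size, 1],  # non-overlapping grid
            rates=[1, 1, 1, 1],
            padding='VALID')
        return tf.reshape(patches, [-1, patch_size, patch_size, channels])

    return (dataset
            .map(to_patches, num_parallel_calls=tf.data.AUTOTUNE)
            .unbatch()            # one element per patch
            .batch(batch_size)    # regroup into batches of patches
            .prefetch(tf.data.AUTOTUNE))

# Synthetic stand-in for the question's dataset: 10 images of 256x256x3, batches of 32.
images = tf.data.Dataset.from_tensor_slices(tf.zeros([10, 256, 256, 3])).batch(32)
patch_ds = make_patch_dataset(images, patch_size=64, batch_size=32)
```

After unbatch(), consecutive patches come from the same image. Adding a .shuffle(buffer_size) between unbatch() and batch() would mix them, at the cost of holding buffer_size patches in memory.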
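
I believe you can use a Python generator class. If you want, you can pass this generator to the model.fit function. I actually used this approach once for label preprocessing.
I wrote the following dataset generator. It works in three steps:

1. It loads a batch from the dataset.
2. It splits the images in that batch into multiple tiles according to the tile_shape parameter.
3. Once enough tiles are queued, it returns the next batch.

For simplicity, this example uses a small dataset created with from_tensor_slices; of course, you can replace it with yours.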
import tensorflow as tf

class TileDatasetGenerator:
    """Re-batches a dataset of image batches into batches of non-overlapping tiles."""

    def __init__(self, dataset, batch_size, tile_shape):
        self.dataset_iterator = iter(dataset)
        self.batch_size = batch_size
        self.tile_shape = tile_shape
        self.image_queue = None  # tiles waiting to be emitted

    def __iter__(self):
        return self

    def __next__(self):
        # Pull batches from the source until enough tiles are queued.
        # StopIteration from the source ends the iteration (leftover tiles are dropped).
        while not self._has_queued_enough_for_batch():
            batch = next(self.dataset_iterator)
            self._split_images(batch)
        return self._dequeue_batch()

    def _has_queued_enough_for_batch(self):
        return self.image_queue is not None and tf.shape(self.image_queue)[0] >= self.batch_size

    def _dequeue_batch(self):
        batch, remainder = tf.split(self.image_queue, [self.batch_size, -1], axis=0)
        self.image_queue = remainder
        return batch

    def _split_images(self, batch):
        # batch: [B, H, W, C]; H and W must be multiples of the tile size.
        th, tw = self.tile_shape
        shape = tf.shape(batch)
        b, h, w, c = shape[0], shape[1], shape[2], shape[3]
        # [B, H/th, th, W/tw, tw, C] -> [B, H/th, W/tw, th, tw, C] -> [N, th, tw, C]
        tiles = tf.reshape(batch, [b, h // th, th, w // tw, tw, c])
        tiles = tf.transpose(tiles, [0, 1, 3, 2, 4, 5])
        tiles = tf.reshape(tiles, [-1, th, tw, c])
        if self.image_queue is None:
            self.image_queue = tiles
        else:
            self.image_queue = tf.concat([self.image_queue, tiles], axis=0)


dataset = tf.data.Dataset.from_tensor_slices(tf.ones(shape=[128, 64, 64, 3]))
dataset = dataset.batch(32)  # batch() returns a new dataset, so it must be reassigned
generator = TileDatasetGenerator(dataset, batch_size=16, tile_shape=[32, 32])

for batch in generator:
    tf.print(tf.shape(batch))  # [16 32 32 3]
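The tiling step depends on how pixels are regrouped. A bare reshape of [B, H, W, C] to [-1, th, tw, C] only regroups pixels in memory order. Each "tile" is then a run of consecutive pixels (whole image rows chopped up), not a spatial block. Splitting into spatial blocks needs three steps: a reshape to [B, H/th, th, W/tw, tw, C], a transpose that brings the two grid axes together, and a final reshape.

The small check below uses NumPy only because it makes the values easy to inspect; the same reshape/transpose sequence applies to TensorFlow tensors. tile is a hypothetical helper:

```python
import numpy as np

def tile(batch, th, tw):
    """Split [B, H, W, C] into non-overlapping [N, th, tw, C] tiles (H, W divisible)."""
    b, h, w, c = batch.shape
    t = batch.reshape(b, h // th, th, w // tw, tw, c)
    t = t.transpose(0, 1, 3, 2, 4, 5)  # bring the two tile-grid axes together
    return t.reshape(-1, th, tw, c)

# One 4x4 single-channel image with values 0..15 (row-major).
img = np.arange(16).reshape(1, 4, 4, 1)

naive = img.reshape(-1, 2, 2, 1)  # what a bare reshape produces
tiles = tile(img, 2, 2)

print(naive[0, :, :, 0])  # values 0, 1, 2, 3: the first image row, not a 2x2 block
print(tiles[0, :, :, 0])  # values 0, 1, 4, 5: the top-left 2x2 block
```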

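Edit:
If needed, you can convert the generator into a tf.data.Dataset with tf.data.Dataset.from_generator. To do so, add a __call__ method to the generator that returns an iterator (self in this case).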
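Here is a minimal sketch of that conversion, assuming TF 2.4+ (needed for the output_signature argument). To keep the block self-contained, it uses a hypothetical ToyBatchGenerator that emits constant batches instead of the tiling generator. Only the __call__/__iter__/__next__ shape of the class matters here:

```python
import numpy as np
import tensorflow as tf

class ToyBatchGenerator:
    """Hypothetical stand-in for TileDatasetGenerator: yields `num_batches`
    float32 batches of shape [batch_size, 32, 32, 3]."""

    def __init__(self, num_batches, batch_size):
        self.num_batches = num_batches
        self.batch_size = batch_size
        self.emitted = 0

    def __call__(self):
        # from_generator needs a callable that returns an iterator. Returning
        # self shares the state, so the resulting dataset is single-pass.
        return self

    def __iter__(self):
        return self

    def __next__(self):
        if self.emitted >= self.num_batches:
            raise StopIteration
        self.emitted += 1
        return np.ones([self.batch_size, 32, 32, 3], dtype=np.float32)

generator = ToyBatchGenerator(num_batches=3, batch_size=16)
tile_dataset = tf.data.Dataset.from_generator(
    generator,
    output_signature=tf.TensorSpec(shape=(None, 32, 32, 3), dtype=tf.float32))

shapes = [tuple(batch.shape) for batch in tile_dataset]
```

Because the state lives on self, a second pass over tile_dataset (for example a second epoch in model.fit) yields nothing. Having __call__ return a freshly constructed generator would avoid that.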

What you are looking for is tf.image.extract_patches. Here is an example:
import tensorflow as tf
import tensorflow_datasets as tfds
import matplotlib.pyplot as plt

data = tfds.load('mnist', split='test', as_supervised=True)

def get_patches(x, y):
    # x is one 28x28x1 MNIST image; add a batch axis for extract_patches.
    patches = tf.image.extract_patches(
        images=tf.expand_dims(x, 0),
        sizes=[1, 14, 14, 1],     # 14x14 patches
        strides=[1, 14, 14, 1],   # stride == size -> non-overlapping
        rates=[1, 1, 1, 1],
        padding='VALID')
    # Output is [1, 2, 2, 14*14*1]: 28/14 = 2 patches per side, 4 in total.
    return tf.reshape(patches, (4, 14, 14, 1)), y

data = data.map(get_patches)

plt.figure()
plt.subplots_adjust(wspace=.1, hspace=.2)
images, labels = next(iter(data))
for index, image in enumerate(images):
    ax = plt.subplot(2, 2, index + 1)
    ax.set_xticks([])
    ax.set_yticks([])
    ax.imshow(tf.squeeze(image, axis=-1), cmap='gray')  # drop the channel axis for imshow
plt.show()
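The example above packs all four patches of an image into one dataset element that shares a single label. If each patch should instead be its own (patch, label) example, one option is to repeat the label with tf.repeat and flatten the result with unbatch(). This sketch assumes every patch inherits its image's label. to_labeled_patches is a hypothetical name, and a synthetic dataset stands in for tfds.load so the block runs without downloading MNIST:

```python
import tensorflow as tf

def to_labeled_patches(image, label, patch=14):
    """Cut one [H, W, C] image into non-overlapping patch x patch tiles and
    give every tile a copy of the image's label."""
    patches = tf.image.extract_patches(
        images=tf.expand_dims(image, 0),
        sizes=[1, patch, patch, 1],
        strides=[1, patch, patch, 1],
        rates=[1, 1, 1, 1],
        padding='VALID')
    patches = tf.reshape(patches, [-1, patch, patch, image.shape[-1]])
    labels = tf.repeat(label, tf.shape(patches)[0])  # one label per patch
    return patches, labels

# Synthetic stand-in for tfds.load('mnist', ..., as_supervised=True).
images = tf.zeros([5, 28, 28, 1], dtype=tf.uint8)
labels = tf.constant([0, 1, 2, 3, 4], dtype=tf.int64)
data = tf.data.Dataset.from_tensor_slices((images, labels))

patch_data = data.map(to_labeled_patches).unbatch()  # one (patch, label) per element
```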

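Maybe you could load each dataset, then "manually" split each image into four images and save them into the corresponding folders.
@LadislavOndris Please see the updated question.
Do you want to split the images into four equal parts, or into four random crops?
@NicolasGervais I am thinking of splitting the images evenly. The number 4 in the question is just an example; I intend to split into 2, 4 and 8. However, I would also consider randomized crops. What would the mechanism be for that? Could both use the same mechanism?
Thanks. I will test it and try it out.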
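On whether both can use the same mechanism: one option is to keep the same map, then unbatch, then batch pipeline shape for both cases and change only the per-image function. An even grid would use extract_patches, and random crops could use tf.image.random_crop.

Below is a minimal sketch of the random variant. random_patches and its parameters are hypothetical. Unlike the VALID grid, these crops are sampled independently and may overlap:

```python
import tensorflow as tf

def random_patches(image, patch_size, num_patches):
    """Hypothetical helper: take `num_patches` independent random crops of
    size patch_size x patch_size from one [H, W, C] image."""
    channels = image.shape[-1]
    crops = [tf.image.random_crop(image, [patch_size, patch_size, channels])
             for _ in range(num_patches)]
    return tf.stack(crops)  # [num_patches, patch_size, patch_size, C]

# Synthetic stand-in: 6 unbatched 256x256 RGB images.
images = tf.data.Dataset.from_tensor_slices(tf.random.uniform([6, 256, 256, 3]))
patch_ds = (images
            .map(lambda img: random_patches(img, patch_size=64, num_patches=4))
            .unbatch()
            .batch(8))
```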