
The most Pythonic way to batch load and process images


The code below loads jpeg images into a numpy ndarray. It currently works fine, but I feel there must be a more Pythonic way to do this.

import scipy.ndimage as spimg
import numpy as np


# Read images into scipy and flatten to greyscale
# Using generator function instead of list comprehension
# for memory efficiency
human_files_convert = (spimg.imread(path, flatten=True) for path in human_files[:2099])
The generator function above is used so that each image can be processed one at a time; a list comprehension fails here because it would hold all of the images in memory at once. A small illustration of the difference follows.
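To see the difference, here is a tiny self-contained sketch (unrelated to the image data) of how a generator expression avoids materializing everything at once:

import sys

# A list comprehension builds and holds all elements in memory at once...
squares_list = [i * i for i in range(1_000_000)]
# ...while a generator expression produces one element at a time on demand.
squares_gen = (i * i for i in range(1_000_000))

print(sys.getsizeof(squares_list))  # megabytes: one pointer per element
print(sys.getsizeof(squares_gen))   # ~100 bytes, regardless of length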

batch_size = 1000
step = 0
# Start with a zero-length array so that no uninitialized garbage row
# from np.empty ends up in the final result
human_files_ndarray = np.empty((0, 250, 250))

# Empty list to collect the image arrays for the current batch
human_files_list = []
batch = 1
total_processed = 0

# Iterate through the image arrays yielded by the generator
for img in human_files_convert:
    # Append to the current batch
    human_files_list.append(img)
    step += 1
    total_processed += 1
    # Stack and flush the batch when it is full, or on the last image
    if (step % batch_size == 0) or (len(human_files[:2099]) == total_processed):
        new_stack = np.stack(human_files_list)
        print("Batch: ", batch)
        print(new_stack.shape)
        step = 0
        human_files_ndarray = np.concatenate((human_files_ndarray, new_stack))
        print(human_files_ndarray.shape)
        print(total_processed)
        # Reset the list for the next batch
        human_files_list = []
        batch += 1

Any ideas on how to make this code more efficient or more Pythonic?

Following @sascha's suggestion in the comments, I send the generator function's output to a file instead. Doing this dropped the memory usage for the collection from over 4 GB to under 200 MB. As an added benefit, I now have an on-disk copy of the loaded dataset, much like a pickle file.

# Confirm correct import of images
import scipy.ndimage as spimg
import numpy as np
import h5py
import tqdm

np.set_printoptions(threshold=1000)

# Use h5py to store the large uncompressed image arrays on disk
h5file = h5py.File("images.hdf5", "w")
human_dset = h5file.create_dataset("human_images", (len(human_files), 250, 250))

# Read images into scipy and flatten to greyscale,
# using a generator expression instead of a list comprehension
# for memory efficiency
n_images = len(human_files)  # renamed to avoid shadowing the built-in 'slice'
human_files_convert = (spimg.imread(path, flatten=True) for path in human_files[:n_images])

for i, r in enumerate(tqdm.tqdm(human_files_convert, total=n_images)):
    # Rescale [0, 255] --> [0, 1]
    r = r.astype('float32') / 255
    # Write the row straight to the dataset on disk
    human_dset[i] = r
h5file.close()
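A side benefit of the HDF5 copy (a minimal sketch, assuming the images.hdf5 file and human_images dataset written above): the file can be reopened later and sliced lazily, pulling only the rows you need into memory.

import h5py

# Reopen the dataset written above in read-only mode
with h5py.File("images.hdf5", "r") as f:
    # Slicing reads just these 100 images from disk, not the whole dataset
    first_batch = f["human_images"][:100]
    print(first_batch.shape, first_batch.dtype)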

@sascha: What exactly are you doing there? A shape of (x, 250, 250) looks like you simply want imgs = np.stack(human_files_convert). What kind of memory shortage do you have (you will end up with a dense output array anyway)? And if there is one, do you really want to load all of it into memory (as opposed to HDF5 or something similar)?

OP: I'm trying to stack all the image files into a single array, just as you mention, but because of memory constraints I'm batching the conversion. I'll definitely give your idea a try.

OP: That worked like a charm. I realize now I was hitting this with a nuclear sledgehammer. If you post it as an answer, I'll accept it. Thanks.

@sascha: I'm afraid I don't feel like posting it as an answer; glad it works for you now. My assumption is that batching only helps in some limited scenarios. Simplified guess: adding one image at a time should keep memory usage around x + eps, while adding everything at once should peak at up to x * 2. That only seems relevant when you are in exactly that memory-constrained spot, e.g. when the final array takes up roughly 80% of available memory.

OP: @sascha, my final array does take up about 80-90% of my memory, and that is only a fraction of my full dataset. I'll look into the topics you mentioned.
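For reference, a minimal sketch of the two allocation strategies contrasted in the comments; the dimensions and the stand-in generator are illustrative, not from the original post:

import numpy as np

n, h, w = 2099, 250, 250  # illustrative dimensions

# Stand-in for the spimg.imread generator expression used above
image_generator = (np.zeros((h, w), dtype="float32") for _ in range(n))

# Strategy A: collect, then stack. The list of source arrays and the
# stacked result coexist, so peak memory approaches x * 2:
#     imgs = np.stack(list(image_generator))

# Strategy B: preallocate once and fill row by row. Only one extra
# image is alive at a time, so peak memory stays near x + eps.
imgs = np.empty((n, h, w), dtype="float32")
for i, img in enumerate(image_generator):
    imgs[i] = img
print(imgs.shape)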