Online oversampling in a Tensorflow input pipeline

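I have an input pipeline similar to the one in the tutorial. My dataset is imbalanced, and I want to address this by oversampling the minority class. Ideally I want to do this "online", i.e. I do not want to duplicate data samples on disk.

Essentially, what I want to do is duplicate individual examples based on their label (with some probability). I have read a bit about control flow in Tensorflow, and it seems that tf.cond(pred, fn1, fn2) is the way to go. I am struggling to find the right parameterization, though, because fn1 and fn2 need to output lists of tensors, and those lists must have the same size.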

So far, I have roughly the following:

image = image_preprocessing(image_buffer, bbox, False, thread_id)            
pred = tf.reshape(tf.equal(label, tf.convert_to_tensor([2])), [])
r_image = tf.cond(pred, lambda: [tf.identity(image), tf.identity(image)], lambda: [tf.identity(image),])
r_label = tf.cond(pred, lambda: [tf.identity(label), tf.identity(label)], lambda: [tf.identity(label),])

However, as mentioned above, this produces an error:

ValueError: fn1 and fn2 must return the same number of results.

Any ideas?

This is my first Stack Overflow question. Any feedback on my question is much appreciated.
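The behaviour asked for above can be sketched in plain Python, independent of Tensorflow. This is only an illustration of the intended semantics (repeat a minority example on the fly, with some probability, without touching the data on disk); the function name, its parameters, and the sample data are all hypothetical.

```python
import random


def oversample_online(examples, minority_label, duplicate_prob, rng=None):
    """Yield (example, label) pairs, repeating minority examples on the fly.

    Hypothetical helper: nothing is written back to disk, the extra copy
    only exists in the stream that feeds training.
    """
    rng = rng or random.Random()
    for example, label in examples:
        yield example, label
        # Emit a second copy of a minority example with the given probability.
        if label == minority_label and rng.random() < duplicate_prob:
            yield example, label


# Made-up data: label 2 plays the role of the minority class.
data = [("img0", 0), ("img1", 2), ("img2", 0), ("img3", 2)]
resampled = list(oversample_online(data, minority_label=2, duplicate_prob=1.0))
unchanged = list(oversample_online(data, minority_label=2, duplicate_prob=0.0))
```

The difficulty in the Tensorflow version is that a generator can emit one or two items per input, whereas both branches of tf.cond have to return the same number of tensors.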

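After doing some more research, I found a solution for what I wanted to do. What I forgot to mention is that the code in my question is followed by a batching method such as batch() or batch_join(). These functions take an argument that allows you to group tensors of differing batch size rather than just tensors of a single example. The argument is enqueue_many, and it should be set to True.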
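To make the effect of enqueue_many more concrete, here is a toy model in plain Python. It is not a Tensorflow API: the class ManyQueue and its methods are invented for illustration, and shuffling is left out. The point it tries to show is that each enqueue call may carry a different number of examples along its leading dimension, while dequeued batches still have a fixed size.

```python
from collections import deque


class ManyQueue:
    """Toy stand-in for a queue fed with enqueue_many=True (not a Tensorflow API)."""

    def __init__(self):
        self._items = deque()

    def enqueue_many(self, images, labels):
        # The leading dimension is treated as a list of separate examples,
        # so its size may differ from one call to the next.
        if len(images) != len(labels):
            raise ValueError("images and labels must have the same leading size")
        self._items.extend(zip(images, labels))

    def dequeue_batch(self, batch_size):
        # Return a fixed-size batch regardless of how examples were enqueued.
        if len(self._items) < batch_size:
            raise ValueError("not enough examples queued")
        batch = [self._items.popleft() for _ in range(batch_size)]
        images, labels = zip(*batch)
        return list(images), list(labels)


queue = ManyQueue()
queue.enqueue_many(["a"], [0])          # majority example: leading size 1
queue.enqueue_many(["b", "b"], [1, 1])  # minority example tiled twice: leading size 2
queue.enqueue_many(["c"], [0])
images, labels = queue.dequeue_batch(4)
```

Because the number of copies lives in the leading dimension rather than in the number of returned tensors, both branches of tf.cond can return a single tensor.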

The following code did the trick for me:

# Collect one [image, label] pair per preprocessing thread.
images_and_labels = []
for thread_id in range(num_preprocess_threads):

    # Parse a serialized Example proto to extract the image and metadata.
    image_buffer, label_index = parse_example_proto(
            example_serialized)

    image = image_preprocessing(image_buffer, bbox, False, thread_id)

    # Convert 3D tensor of shape [height, width, channels] to 
    # a 4D tensor of shape [batch_size, height, width, channels]
    image = tf.expand_dims(image, 0)

    # Define the boolean predicate to be true when the class label is 1
    pred = tf.equal(label_index, tf.convert_to_tensor([1]))
    pred = tf.reshape(pred, [])

    oversample_factor = 2
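    # Tile the example along the batch dimension when pred is true.
    # Note: tf.concat(0, values) is the pre-1.0 argument order;
    # Tensorflow 1.0 and later expect tf.concat(values, axis=0).
    r_image = tf.cond(pred, lambda: tf.concat(0, [image]*oversample_factor), lambda: image)
    r_label = tf.cond(pred, lambda: tf.concat(0, [label_index]*oversample_factor), lambda: label_index)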
    images_and_labels.append([r_image, r_label])

images, label_batch = tf.train.shuffle_batch_join(
    images_and_labels,
    batch_size=batch_size,
    capacity=2 * num_preprocess_threads * batch_size,
    min_after_dequeue=1 * num_preprocess_threads * batch_size,
    enqueue_many=True)
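As a rough guide to choosing oversample_factor, the expected share of the minority class after tiling can be worked out with a little arithmetic: if the minority class makes up a fraction f of the examples and each minority example is repeated k times, its share becomes k*f / (k*f + (1 - f)). The helper below is a hypothetical sketch of that formula, and the 10% starting fraction is an assumed figure, not one taken from the dataset above.

```python
def minority_fraction_after_oversampling(minority_fraction, oversample_factor):
    """Expected minority share once every minority example appears oversample_factor times."""
    if not 0.0 <= minority_fraction <= 1.0:
        raise ValueError("minority_fraction must lie in [0, 1]")
    if oversample_factor < 1:
        raise ValueError("oversample_factor must be at least 1")
    minority = minority_fraction * oversample_factor
    majority = 1.0 - minority_fraction
    return minority / (minority + majority)


# Assumed example: 10% minority examples, each tiled twice.
new_share = minority_fraction_after_oversampling(0.1, 2)
```

With these assumed numbers the minority share rises from 10% to 2/11, roughly 18%, so a factor of 2 alone does not balance a strongly skewed dataset.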

This is usually solved with FIFO queues.

Thanks for the hint. That did lead me to the right answer. I have edited the post.