Exception when writing TFRecords using multiple threads


multithreading exception tensorflow


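I have a huge video dataset; for each video, I have a folder containing its frames.
I am writing one TFRecord per video, in which the FeatureList holds the frames of the video.

I use a Python thread pool to iterate over the list of videos, with each thread handling one video. Within each thread, I then use a TensorFlow queue to read and process the frames.

My script is structured as follows: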

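import concurrent.futures
import os

import tensorflow as tf

# dset_dir, tfrecord_name, get_frames and make_example are not shown here
videos_id = os.listdir(dset_dir)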

def main_loop(video):
    frames_list = get_frames(video)
    filename_queue = tf.train.string_input_producer(frames_list)
    reader = tf.WholeFileReader()
    key, value = reader.read(filename_queue)

    my_img = tf.image.decode_jpeg(value)
    # resize, etc ...

    init_op = tf.global_variables_initializer()
    sess = tf.InteractiveSession()
    with sess.as_default():
        sess.run(init_op)

    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)

    # accumulating images of 1 video
    image_list = []
    for i in range(len(frames_list)):
        image_list.append(my_img.eval(session=sess))

    coord.request_stop()
    coord.join(threads)

    writer = tf.python_io.TFRecordWriter(tfrecord_name)
    ex = make_example(image_list)
    writer.write(ex.SerializeToString())
    writer.close()
    sess.close()

with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
    future = {executor.submit(
        main_loop, video): video for video in videos_id}
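
Note that the script stores the futures but never reads their results, so an exception raised inside a worker stays in its future and is not re-raised in the main thread. Below is a minimal sketch of collecting failures per video. It assumes a stand-in `process_video` function in place of `main_loop` and a hypothetical `run_all` helper; neither comes from the original post.

```python
import concurrent.futures


def process_video(video):
    # Hypothetical stand-in for main_loop(video); fails on purpose for one input.
    if video == "bad_video":
        raise RuntimeError("could not process " + video)
    return video + ".tfrecord"


def run_all(videos, worker, max_workers=4):
    """Run worker over videos in a thread pool; return (results, errors) dicts."""
    results, errors = {}, {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {executor.submit(worker, video): video for video in videos}
        for future in concurrent.futures.as_completed(futures):
            video = futures[future]
            try:
                # result() re-raises any exception raised inside the worker.
                results[video] = future.result()
            except Exception as exc:
                errors[video] = exc
    return results, errors


results, errors = run_all(["video_a", "bad_video", "video_b"], process_video)
print("written:", sorted(results.values()))
print("failed:", sorted(errors))
```

Knowing which video failed first, and with which exception, may help tell a bad input apart from a resource problem that builds up over time.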

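After about a thousand videos have been processed, I get the following exception (repeated many times, for different "thread ids"):

Any idea why this happens?

Thanks in advance.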

I used this apparently cleaner way of stopping the coordinator. I'm not sure whether it helps.

# ....
# this will throw an OutOfRangeError after 1 epoch, i.e. one video
filename_queue = tf.train.string_input_producer(frames_list, num_epochs=1)
# num_epochs adds a local variable, so also run tf.local_variables_initializer()

# ....

coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sess, coord=coord)

# ...

# After everything is built, start the loop.
try:
    while not coord.should_stop():
        # read your frame, e.g. as in the question:
        image_list.append(sess.run(my_img))
except tf.errors.OutOfRangeError:
    # means the loop has finished
    # write your tfrecord
    pass
finally:
    # When done, ask the threads to stop.
    coord.request_stop()


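Well, you have asked the queue runners to stop. Is the issue that they are being noisy about it?

Thanks @AllenLavoie. I think request_stop() is executed after the for loop, so at the point where this video has no more frames to process, isn't it? If so, I think that is fine. The problem is that it processes about 1000 videos correctly, but at some point it raises this exception...

How many videos should it process? Is this error printed before the stop request? It could be running out of data and surfacing that in an obscure way; you may get better error messages from tf.contrib.data (tf.data in TF 1.4).

I enqueue 50k elements. I am now watching memory usage, and it keeps increasing. The exception is raised when memory usage reaches 98%, which would explain the "Enqueue operation was cancelled". Could there be a memory leak in TFRecordWriter?

Well, you are adding them to a list; does memory usage still increase if you remove that? How big is the intermediate queue (if any)?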
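
The question raised in the comments, whether memory still grows once frames are no longer kept in a list, can be measured rather than guessed. Below is a minimal sketch using the standard-library `tracemalloc` module. The two workloads (`leaky_step`, `clean_step`) and the `traced_growth` helper are hypothetical stand-ins for the real per-video loop. One caveat: `tracemalloc` only sees allocations made through Python's allocator, so memory held by TensorFlow's native runtime would not show up here and would need a process-level monitor instead.

```python
import tracemalloc


def traced_growth(step, iterations):
    """Return net growth in traced Python memory (bytes) after calling step repeatedly."""
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    for i in range(iterations):
        step(i)
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before


kept = []  # hypothetical stand-in for state that outlives each video


def leaky_step(i):
    # Keeps every "frame" alive, like appending to a list that is never cleared.
    kept.append(bytearray(100_000))


def clean_step(i):
    # Builds the same "frame" but releases it before returning.
    frame = bytearray(100_000)
    del frame


leaky = traced_growth(leaky_step, 50)
clean = traced_growth(clean_step, 50)
print("growth with retained frames:", leaky)
print("growth without retained frames:", clean)
```

If the real loop behaves like `clean_step` under this kind of measurement and the process still grows, the growth is more likely outside the Python heap.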