
Memory leaks: queue reader with tf.py_func leaks memory


I'm trying to write a queue reader that walks through a large file and runs a python function on each row before passing it on to the actual ops.

I use string_input_producer to read a single .tsv file. I then create a queue with tf.TextLineReader and augment each row with tf.py_func. Doing this, I noticed a memory leak that only shows up when tf.py_func is called (yes, even as a noop).

Running the following code produces this output:

$ python test_memory.py 2> /dev/null
run WITHOUT tf.py_func
00001/50000, 1.4260% mem
05001/50000, 1.4512% mem
10001/50000, 1.4512% mem
15001/50000, 1.4512% mem
20001/50000, 1.4512% mem
25001/50000, 1.4516% mem
30001/50000, 1.4516% mem
35001/50000, 1.4516% mem
40001/50000, 1.4516% mem
45001/50000, 1.4516% mem
50000/50000, 1.4516% mem
===========================
run WITH tf.py_func
00001/50000, 1.4975% mem
05001/50000, 1.5051% mem
10001/50000, 1.5066% mem
15001/50000, 1.5081% mem
20001/50000, 1.5110% mem
25001/50000, 1.5137% mem
30001/50000, 1.5148% mem
35001/50000, 1.5165% mem
40001/50000, 1.5195% mem
45001/50000, 1.5210% mem
50000/50000, 1.5235% mem
===========================
As you can see, running the code without tf.py_func keeps memory usage stable, whereas running it with the python function makes memory grow steadily. The effect is much more pronounced on files with larger rows.

test_memory.py

import os
import sys
import psutil
import tensorflow as tf

def py_funner(x, do_py=True):
    '''
    this function returns the exact input.
    if do_py==True, it passes the data through a python noop using tf.py_func
    '''
    if do_py:
        def py_func(y):
            # this is just another noop.
            return y
        # py_func wraps a python function as a tensorflow op.
        return tf.py_func(py_func, [x], [tf.string], stateful=False)[0]
    else:
        return x

def get_data(do_py=True):
    # use the os module's source file as input. the effect is way more
    # pronounced on larger files, e.g., a tsv that encodes image data
    # in base64, as in ms-celeb-1m
    in_str = os.__file__

    # produce a queue that reads the one file row by row.
    input_queue = tf.train.string_input_producer([in_str])
    reader = tf.TextLineReader()
    ind, row = reader.read(input_queue)

    # call the wrapper to either include tf.py_func or not.
    return py_funner(row, do_py=do_py)

def main():
    # get the current process to monitor memory usage
    process = psutil.Process(os.getpid())

    # execute the same code both with a tf.py_func noop and without it
    for tt in [False, True]:
        print 'run WITH%s tf.py_func'%('' if tt else 'OUT')

        # generate the data queue
        data = get_data(do_py=tt)

        # start the session and the queue coordinator
        sess = tf.Session()
        coord = tf.train.Coordinator()
        queue_threads = tf.train.start_queue_runners(sess, coord=coord)

        # read a lot of the file
        max_iter = 50000
        for i in range(max_iter):
            run_ops = [data]
            d = sess.run(run_ops)
            mem = process.memory_percent()
            print '\r%05d/%d, %.4f%% mem'%(i+1, max_iter, mem),
            sys.stdout.flush()
            if i%5000==0:
                print
        print '\n==========================='

if __name__=='__main__':
    main()
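
A side note (my observation, not part of the original post): main() builds both runs into the same default graph and never shuts down the first run's session or queue threads, so some of the growth could in principle come from that. A variant that isolates the two runs, reusing get_data() from the script above, might look like this:

import os
import psutil
import tensorflow as tf

def main_isolated():
    # variant of main() that tears everything down between runs, so that
    # leftover ops and reader threads from the first run cannot
    # contribute to the second run's numbers
    process = psutil.Process(os.getpid())
    for tt in [False, True]:
        tf.reset_default_graph()       # start each run with a fresh graph
        data = get_data(do_py=tt)
        sess = tf.Session()
        coord = tf.train.Coordinator()
        queue_threads = tf.train.start_queue_runners(sess, coord=coord)
        for i in range(50000):
            sess.run(data)
        print 'WITH%s tf.py_func: %.4f%% mem' % ('' if tt else 'OUT',
                                                 process.memory_percent())
        coord.request_stop()           # ask the reader threads to stop...
        coord.join(queue_threads)      # ...and wait until they have exited
        sess.close()
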
I'd appreciate any suggestions or ideas on how to debug this further! Maybe there is a way to see whether the python function retains some kind of storage?


Thanks!
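
On the closing question of whether the python function retains storage: one way to check (a debugging sketch of mine, not from the thread) is to compare the live Python object count against process memory. If len(gc.get_objects()) stays flat while memory_percent() keeps climbing, the growth lives in native (C++) allocations rather than in Python objects held by tf.py_func:

import gc
import os
import psutil

_process = psutil.Process(os.getpid())

def report_heap(step):
    # run a full collection first, so only genuinely live objects count
    gc.collect()
    n_live = len(gc.get_objects())
    print '%06d: %d live python objects, %.4f%% mem' % (
        step, n_live, _process.memory_percent())

Calling report_heap(i) (a hypothetical helper, named here for illustration) every few thousand iterations of the read loop in main() would show on which side of the Python/native boundary the leak sits.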

Queue readers are deprecated. Can you reproduce this with a tf.data pipeline?
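
For reference, a sketch of the equivalent tf.data pipeline (assuming the TF 1.x API; whether the leak reproduces here is exactly what the commenter is asking, so this is untested in that respect):

import os
import tensorflow as tf

def py_func(y):
    return y  # the same noop as in test_memory.py

dataset = tf.data.TextLineDataset([os.__file__])   # same input file as above
dataset = dataset.map(
    lambda row: tf.py_func(py_func, [row], tf.string, stateful=False))
dataset = dataset.repeat()                         # cycle over the file forever
data = dataset.make_one_shot_iterator().get_next()

with tf.Session() as sess:
    for i in range(50000):
        sess.run(data)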