
Python: GPU performance is still slow even with the Keras fit_generator method


I have a large 5 GB dataset that I want to use to train a neural network model built with Keras. Even though I am using an Nvidia Tesla P100 GPU, training is very slow (each epoch takes ~60-70 seconds; I chose a batch size of 10000). After some reading and searching, I found that I could speed up training by using fit_generator instead of the typical fit. To do that, I wrote the following code:

from __future__ import print_function
import numpy as np
from keras import Sequential
from keras.layers import Dense
import keras
from sklearn.model_selection import train_test_split


def generator(C, r, batch_size):
    samples_per_epoch = C.shape[0]
    # round up so a final partial batch still counts as a batch
    number_of_batches = int(np.ceil(samples_per_epoch / batch_size))
    counter = 0

    while 1:
        X_batch = np.array(C[batch_size * counter:batch_size * (counter + 1)])
        y_batch = np.array(r[batch_size * counter:batch_size * (counter + 1)])
        counter += 1
        yield X_batch, y_batch

        # restart counter to yield data in the next epoch as well
        if counter >= number_of_batches:
            counter = 0


if __name__ == "__main__":
    X, y = readDatasetFromFile()  # user-defined helper that loads the 5 GB dataset
    X_tr, X_ts, y_tr, y_ts = train_test_split(X, y, test_size=.2)

    model = Sequential()
    model.add(Dense(16, input_dim=X.shape[1]))
    model.add(keras.layers.advanced_activations.PReLU())
    model.add(Dense(16))
    model.add(keras.layers.advanced_activations.PReLU())
    model.add(Dense(16))
    model.add(keras.layers.advanced_activations.PReLU())
    model.add(Dense(1, activation='sigmoid'))

    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

    batch_size = 1000
    # steps must be whole numbers computed from the matching split; note that
    # X.shape[0] / batch_size * 2 parses as (X.shape[0] / batch_size) * 2,
    # which overstates the validation step count.
    model.fit_generator(generator(X_tr, y_tr, batch_size), epochs=200,
                        steps_per_epoch=int(np.ceil(X_tr.shape[0] / batch_size)),
                        validation_data=generator(X_ts, y_ts, batch_size * 2),
                        validation_steps=int(np.ceil(X_ts.shape[0] / (batch_size * 2))),
                        # NOTE: use_multiprocessing=True is only safe with a
                        # keras.utils.Sequence; with a plain Python generator,
                        # worker processes may duplicate batches.
                        verbose=2, use_multiprocessing=True)

    loss, accuracy = model.evaluate(X_ts, y_ts, verbose=0)
    print(loss, accuracy)
After switching to fit_generator, the training time improved slightly but it is still slow (each epoch now takes ~40-50 seconds). Running nvidia-smi in the terminal shows GPU utilization of only ~15%, which makes me suspect something is wrong with my code. I am posting my code above to ask whether there is a bug that is degrading GPU performance.
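For what it's worth, here is a minimal check, independent of nvidia-smi, that TensorFlow (the Keras backend here) can see the GPU at all; it uses a standard TensorFlow utility and nothing specific to my code:

from tensorflow.python.client import device_lib

# Every device TensorFlow can use is listed here; a healthy setup should
# include an entry with device_type "GPU" (the Tesla P100).
print(device_lib.list_local_devices())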


Thanks,

Please try forcing the GPU allocation:

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # for more than one GPU, use a comma-separated list such as "0,1"

Hope this helps.

Have you tried forcing GPU allocation with CUDA_VISIBLE_DEVICES?

@ParthasarathySubburaj Thanks for the quick response! How do I do that? Thank you very much. Do I have to do this assignment before importing tensorflow?

It is best to import os first and then set all the environment variables before importing any of the other packages.
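A minimal sketch of that ordering (using the single-GPU value from the answer above; the key point is that CUDA_VISIBLE_DEVICES must be set before any tensorflow/keras import, because TensorFlow reads it when it initializes CUDA):

import os

# Set this before importing tensorflow/keras; once TensorFlow has
# initialized CUDA, changing CUDA_VISIBLE_DEVICES has no effect.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import keras  # deliberately imported after the environment setup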