Tensorflow: Jupyter kernel dies when running a convolutional network


python tensorflow neural-network jupyter-notebook


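I am trying to run a convolutional neural network demo from the code samples in the book Practical Convolutional Neural Networks by Sewak et al. It is a simple dog/cat classifier that uses TensorFlow. The problem is that I am running this TensorFlow code in a Jupyter notebook, and the kernel keeps dying when I execute the code that starts training the network. I am not sure whether this is a problem with the notebook, whether something is missing from the demo code, or whether this is a known issue and I should not be training in a Jupyter notebook at all.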

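So let me give some details about the environment. I have a Docker container with TensorFlow GPU, Keras, and other CUDA libraries installed. My computer has 3 GPUs. Miniconda is installed inside the container, so I can load and run notebooks, etc.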

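Here are some of my thoughts on what might be causing the notebook's Python 3.6 kernel to die:

I do not explicitly specify which GPU to use in the TensorFlow code.

There may be a problem with memory in the container not being allowed to grow.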
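Both suspicions can be tested with a few lines run before the graph is built. The following is a minimal sketch, assuming TensorFlow 1.x (which matches the `session.run` style of the book's code); the GPU index `"0"` and the helper name `make_session` are illustrative assumptions, not part of the original demo.

```python
import os

# Assumption: expose only the first of the three GPUs to this process.
# This has to be set before TensorFlow is imported for the first time.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"


def make_session():
    """Create a TF 1.x session that allocates GPU memory on demand."""
    # Imported here so the environment variable above is already in effect.
    import tensorflow as tf

    config = tf.ConfigProto()
    # Grow the allocation as needed instead of reserving
    # nearly all GPU memory when the session starts.
    config.gpu_options.allow_growth = True
    return tf.Session(config=config)
```

In the notebook, `session = make_session()` would take the place of the plain `tf.Session()` call. If the kernel still dies, these two settings are probably not the cause.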

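I am not yet familiar enough with TensorFlow to really understand the root of the problem. Because the code runs inside a container, the usual debugging tools are somewhat limited.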

The full training code is in a GitHub repository:

Here is the optimize function used for training. I am not sure whether anyone can spot something specific that is missing:

import time
from datetime import timedelta

def optimize(num_iterations):
    # Ensure we update the global variable rather than a local copy.
    global total_iterations

    # Start-time used for printing time-usage below.
    start_time = time.time()

    best_val_loss = float("inf")
    patience = 0

    for i in range(total_iterations, total_iterations + num_iterations):

        # Get a batch of training examples.
        # x_batch now holds a batch of images and
        # y_true_batch are the true labels for those images.
        x_batch, y_true_batch, _, cls_batch = data.train.next_batch(train_batch_size)
        x_valid_batch, y_valid_batch, _, valid_cls_batch = data.valid.next_batch(train_batch_size)

        # Convert shape from [num examples, rows, columns, depth]
        # to [num examples, flattened image shape]

        x_batch = x_batch.reshape(train_batch_size, img_size_flat)
        x_valid_batch = x_valid_batch.reshape(train_batch_size, img_size_flat)

        # Put the batch into a dict with the proper names
        # for placeholder variables in the TensorFlow graph.
        feed_dict_train = {x: x_batch, y_true: y_true_batch}        
        feed_dict_validate = {x: x_valid_batch, y_true: y_valid_batch}

        # Run the optimizer using this batch of training data.
        # TensorFlow assigns the variables in feed_dict_train
        # to the placeholder variables and then runs the optimizer.
        session.run(optimizer, feed_dict=feed_dict_train)        

        # Print status at end of each epoch (defined as full pass through training Preprocessor).
        if i % int(data.train.num_examples/batch_size) == 0: 
            val_loss = session.run(cost, feed_dict=feed_dict_validate)
            epoch = int(i / int(data.train.num_examples/batch_size))

            acc, val_acc = print_progress(epoch, feed_dict_train, feed_dict_validate, val_loss)
            msg = "Epoch {0} --- Training Accuracy: {1:>6.1%}, Validation Accuracy: {2:>6.1%}, Validation Loss: {3:.3f}"
            print(msg.format(epoch + 1, acc, val_acc, val_loss))
            print(acc)
            acc_list.append(acc)
            val_acc_list.append(val_acc)
            iter_list.append(epoch+1)

            if early_stopping:    
                if val_loss < best_val_loss:
                    best_val_loss = val_loss
                    patience = 0
                else:
                    patience += 1
                if patience == early_stopping:
                    break

    # Update the total number of iterations performed.
    total_iterations += num_iterations

    # Ending time.
    end_time = time.time()

    # Difference between start and end-times.
    time_dif = end_time - start_time

    # Print the time-usage.
    print("Time elapsed: " + str(timedelta(seconds=int(round(time_dif)))))


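Were you able to solve this?

@CarlosVegas I did not fully solve it. Instead, I just wrote the script in a plain Python file and ran it from the terminal. I don't know why the kernel dies. I think you can do the training in a Python script, save the model weights, and so on, and then load the results into a Jupyter notebook to look at predictions, etc. I will give it another try soon, though. I have been watching some videos where people use Jupyter notebooks with CNNs.

I had a similar problem on a Mac. Found the solution in this article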
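The workaround from the comments, running the training outside the notebook kernel, can also help narrow down why the process dies. Here is a rough sketch using only the standard library; the function name `run_training_script` and the script name `train.py` are hypothetical. On POSIX systems a negative return code means the process was killed by a signal, and `-9` (SIGKILL) is what the Linux out-of-memory killer sends.

```python
import subprocess
import sys


def run_training_script(script_path):
    """Run a training script in its own Python process and report how it ended."""
    result = subprocess.run(
        [sys.executable, script_path],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,  # decode output as text (works on Python 3.6)
    )
    if result.returncode < 0:
        # Killed by a signal, e.g. 9 for SIGKILL or 11 for SIGSEGV.
        status = "killed by signal {}".format(-result.returncode)
    elif result.returncode == 0:
        status = "finished normally"
    else:
        status = "exited with code {}".format(result.returncode)
    return status, result.stdout
```

Calling `status, log = run_training_script("train.py")` from a notebook cell keeps the kernel alive even if the training process crashes, and the captured log shows the last messages printed before it stopped.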