TensorFlow TF Estimator gradient boosted classifier suddenly stops during training
Tags: tensorflow, crash, classification, tensorflow-estimator

I trained a gradient boosted classifier using the TF example code, but the TF Estimator gradient boosted classifier suddenly stops during training.
I think it runs a few steps at the beginning and then suddenly stops without any exception.
How can I find out why Python crashed? It is very hard to find the reason it stopped.

Result (log):

2019-04-15 16:40:26.175889: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: GeForce GTX 1060 6GB major: 6 minor: 1 memoryClockRate(GHz): 1.7845
pciBusID: 0000:07:00.0
totalMemory: 6.00GiB freeMemory: 4.97GiB
2019-04-15 16:40:26.182620: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-04-15 16:40:26.832040: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-04-15 16:40:26.835620: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0
2019-04-15 16:40:26.836840: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N
2019-04-15 16:40:26.838276: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4716 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:07:00.0, compute capability: 6.1)
WARNING:tensorflow:From D:\python\lib\site-packages\tensorflow\python\training\saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
WARNING:tensorflow:From D:\python\lib\site-packages\tensorflow\python\training\saver.py:1070: get_checkpoint_mtimes (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file utilities to get mtimes.
WARNING:tensorflow:Issue encountered when serializing resources.
Type is unsupported, or the types of the items don't match field type in CollectionDef. Note this is a warning and probably safe to ignore.
'_Resource' object has no attribute 'name'
WARNING:tensorflow:Issue encountered when serializing resources.
Type is unsupported, or the types of the items don't match field type in CollectionDef. Note this is a warning and probably safe to ignore.
'_Resource' object has no attribute 'name'

D:\py> (training had just finished)
------------------------------------------- stopped
Expected result: the evaluation results should be displayed.

Answer:
I think the problem is caused by GPU memory overflow. Depending on the size of your GPU memory, you can try changing the value of 'n_batches_per_layer' to something larger.
I am using a 6 GB GPU, and the value is 16.

Comments:
Did this change solve the problem?
It works fine, thanks. But how exactly did you know?
I ran into a similar problem, and I believe it is also related to this parameter. Can you tell us more about how you chose it, @tommey?
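The answer does not say how the value 16 was chosen. As a rough aid, the TensorFlow documentation describes n_batches_per_layer as the number of batches used to collect statistics per layer, with the total number of batches being the dataset size divided by the batch size. The sketch below only does that arithmetic. The row count and batch size are hypothetical, and it assumes the input function is changed to yield batches smaller than the whole dataset, which the answer itself does not state.

```python
def batches_per_pass(num_examples, batch_size):
    """Number of batches needed to see every example once (ceiling division)."""
    if num_examples <= 0 or batch_size <= 0:
        raise ValueError("num_examples and batch_size must be positive")
    return (num_examples + batch_size - 1) // batch_size


# Hypothetical sizes, not taken from the question:
num_examples = 160_000   # assumed number of training rows
batch_size = 10_000      # assumed batch size that fits in GPU memory

n_batches_per_layer = batches_per_pass(num_examples, batch_size)
print(n_batches_per_layer)  # 16
```

Under these assumptions the result would be passed as params['n_batches_per_layer'], and input_fn would call .batch(batch_size) instead of batching the whole dataset at once.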
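Code from the question:

import pandas as pd
import tensorflow as tf  # TF 1.x API is used below (tf.estimator.inputs)
# Assumption: train_test_split is scikit-learn's; the question does not show its import.
from sklearn.model_selection import train_test_split

trn = pd.read_csv('data/santander-customer-transaction-prediction/train.csv')
tst = pd.read_csv('data/santander-customer-transaction-prediction/test.csv')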
#trn = upsample(trn[trn.target==0], trn[trn.target==1])
# trn = downsample(trn[trn.target==0], trn[trn.target==1])
features = trn.columns.values[2:202]
target_name = trn.columns.values[1]
train=trn[features]
target=trn[target_name]
NUM_EXAMPLES = len(target)
print(NUM_EXAMPLES)
feat1 = train.corrwith(target).sort_values().head(20).index
feat2 = train.corrwith(target).sort_values().tail(20).index
featonly = feat1.append(feat2)
feat = featonly.append(pd.Index(['target']))
train_origin, tt = train_test_split(trn, test_size=0.2)
train = train_origin[featonly]
target = train_origin[target_name]
test = tst[featonly]
target_name_tst = tst.columns.values[1]
target_tst=tst[target_name_tst]
val_origin=tt
val_train = tt[featonly]
val_target = tt[target_name]
# Training and evaluation input functions.
# Note: make_input_fn is listed at the end; it must be defined before these calls run.
train_input_fn = make_input_fn(train, target)
val_input_fn = make_input_fn(val_train, val_target)
ttt=tf.estimator.inputs.pandas_input_fn(x=test,num_epochs=1,shuffle=False)
del train,target,val_train,train_origin,trn,tst
fc = tf.feature_column
feature_columns = []
for feature_name in featonly:
    feature_columns.append(fc.numeric_column(feature_name, dtype=tf.float32))
#feature_columns
#5
#tf.logging.set_verbosity(tf.logging.INFO)
#logging_hook = tf.train.LoggingTensorHook({"loss" : loss, "accuracy" : accuracy}, every_n_iter=10)
params = {
    'n_trees': 50,
    'max_depth': 3,
    'n_batches_per_layer': 1,
    # You must enable center_bias = True to get DFCs. This will force the model to
    # make an initial prediction before using any features (e.g. use the mean of
    # the training labels for regression or log odds for classification when
    # using cross entropy loss).
    'center_bias': True
}
# config = tf.estimator.RunConfig().replace(keep_checkpoint_max = 1,
# log_step_count_steps=20, save_checkpoints_steps=20)
# Raw string so the backslash in the Windows path is not read as an escape sequence.
est = tf.estimator.BoostedTreesClassifier(feature_columns, **params, model_dir=r'd:\py/model/')
est.train(train_input_fn, max_steps=50)
metrics = est.evaluate(input_fn=val_input_fn,steps=1)
results = est.predict(input_fn=ttt )
result_list = list(results)
classi = list(map(lambda x : x['classes'][0].decode("utf-8"), result_list))
num = list(range(0,len(classi)))
numi = list(map(lambda x : 'test_' + str(x),num))
#df1 = pd.DataFrame(columns=('ID_code','target'))
df_result = pd.DataFrame({'ID_code' : numi, 'target' : classi})
df_result.to_csv('result/submission03.csv',index=False)
def make_input_fn(X, y, n_epochs=None, shuffle=True):
    def input_fn():
        NUM_EXAMPLES = len(y)
        dataset = tf.data.Dataset.from_tensor_slices((dict(X), y))
        # dataset = tf.data.Dataset.from_tensor_slices((X.to_dict(orient='list'), y))
        # if shuffle:
        #     dataset = dataset.shuffle(NUM_EXAMPLES)
        # For training, cycle thru dataset as many times as need (n_epochs=None).
        dataset = dataset.repeat(n_epochs).batch(NUM_EXAMPLES)
        return dataset
    return input_fn
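
On the question of how to find out why Python exited: one general option, not specific to TensorFlow and not taken from the answer, is the standard library's faulthandler module, which dumps the Python traceback when the interpreter receives a fatal signal such as a segmentation fault. The helper name and the log path below are hypothetical, and there is no guarantee that it catches this particular failure.

```python
import faulthandler
import sys


def enable_crash_tracebacks(log_path=None):
    """Dump Python tracebacks of all threads if the interpreter hits a fatal error.

    Returns the open log file (keep a reference to it so it stays open),
    or None when the traceback goes to stderr.
    """
    if log_path is None:
        faulthandler.enable(file=sys.stderr, all_threads=True)
        return None
    log_file = open(log_path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


# Hypothetical usage at the very top of the training script:
# crash_log = enable_crash_tracebacks("crash_traceback.log")
# est.train(train_input_fn, max_steps=50)
```

The same handler can be switched on without editing the script by starting the interpreter with python -X faulthandler followed by the script name.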