Machine learning: adding predicted test labels to the original test dataframe in a PyTorch LSTM

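I have run an LSTM model on text data in PyTorch. My original dataframes (train and test) contain 3 columns. Below is the tail of the test dataframe that I fed into my model: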

                                                    TEXT  ICD_25000   HADM_ID
23995  s brother diabetes deceased brother mastoid ca...          0  115229.0
23996  x to be used with insulin pen three times a da...          0  170587.0
23997  lung biopsy in borderline diabetes diagnosed y...          0  174893.0
23998  have nv she has been unable to eat today past ...          0  108008.0
23999  one brother had a stroke at age several member...          0  151301.0
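I run my model and get about 82% accuracy, but I need help getting the predictions from this LSTM model back into the original test dataframe above, which contains the actual labels (ICD_25000). The output test dataframe needs to hold the HADM_ID code together with the predicted value (0 or 1), the actual value (0 or 1), and the text.

My code is as follows: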



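# TEXT, LABEL and VOCABULARY_SIZE are defined earlier (not shown);
# `data` is the torchtext data module that provides TabularDataset.
fields = [('TEXT', TEXT), ('ICD_25000', LABEL)]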

train_df = data.TabularDataset(
    path="window_train_with_HADM.csv", format='csv',
    skip_header=True, fields=fields)

test_df = data.TabularDataset(
    path="window_test_with_HADM.csv", format='csv',
    skip_header=True, fields=fields)

TEXT.build_vocab(train_df, max_size=VOCABULARY_SIZE)
LABEL.build_vocab(train_df)

print(f'Vocabulary size: {len(TEXT.vocab)}')
print(f'Number of classes: {len(LABEL.vocab)}')
This is the code I use during training to compute the training and test accuracy:

def compute_binary_accuracy(model, data_loader, device):
    model.eval()
    correct_pred, num_examples = 0, 0
    with torch.no_grad():
        for batch_idx, batch_data in enumerate(data_loader):
            text, text_lengths = batch_data.TEXT
            logits = model(text, text_lengths.cpu())
            predicted_labels = (torch.sigmoid(logits) > 0.5).long()
            num_examples += batch_data.ICD_25000.size(0)
            correct_pred += (predicted_labels.long() == batch_data.ICD_25000.long()).sum()
        return correct_pred.float()/num_examples * 100


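Can anyone help me come up with a function that takes the test loader data and outputs a dataframe containing all of the original test data plus the predictions from the model? I don't know where the model's results for the test data are stored, and I need to put them into a dataframe alongside the original test data.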
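One possible shape for such a function is sketched below; it is not from the original post. It mirrors the loop in `compute_binary_accuracy`, but keeps the per-example outputs instead of only counting correct ones. It assumes every batch also exposes a `HADM_ID` tensor, which the `fields` list shown above does not yet provide, so an ID field would have to be added there. The names `collect_predictions`, `LengthModel` and `fake_loader` are made up for illustration, and the tiny model only stands in for the real LSTM.

```python
from types import SimpleNamespace

import pandas as pd
import torch


def collect_predictions(model, data_loader, threshold=0.5):
    """Return one row per example: ID, actual label, probability, prediction.

    Assumption: each batch has `TEXT` as a (tokens, lengths) pair and
    `ICD_25000` / `HADM_ID` as 1-D tensors. `HADM_ID` is hypothetical here.
    """
    model.eval()
    ids, actual, probs = [], [], []
    with torch.no_grad():
        for batch in data_loader:
            text, text_lengths = batch.TEXT
            logits = model(text, text_lengths.cpu())
            # Flatten so that shapes [batch] and [batch, 1] both work.
            probs.extend(torch.sigmoid(logits).reshape(-1).cpu().tolist())
            actual.extend(batch.ICD_25000.reshape(-1).cpu().tolist())
            ids.extend(batch.HADM_ID.reshape(-1).cpu().tolist())
    out = pd.DataFrame({"HADM_ID": ids, "actual": actual, "probability": probs})
    out["predicted"] = (out["probability"] > threshold).astype(int)
    return out


class LengthModel(torch.nn.Module):
    """Stand-in for the LSTM: one logit per example, positive for long texts."""

    def forward(self, text, text_lengths):
        return text_lengths.float() - 2.5


# A fake loader with a single batch of two examples (illustration only).
fake_loader = [
    SimpleNamespace(
        TEXT=(torch.zeros(4, 2, dtype=torch.long), torch.tensor([4, 1])),
        ICD_25000=torch.tensor([0, 0]),
        HADM_ID=torch.tensor([115229.0, 170587.0]),
    )
]

pred_df = collect_predictions(LengthModel(), fake_loader)
print(pred_df)
```

Carrying the ID through the batch matters because a loader may shuffle or reorder examples, so row position alone is not a safe key. The resulting frame can then be joined to the original CSV on `HADM_ID` to bring the text back. Two caveats: if `HADM_ID` is not unique per row, a separate row-index column would be a safer key, and if `LABEL` was built with a vocabulary (as `LABEL.build_vocab` suggests), the label tensors hold vocabulary indices rather than the raw 0/1 values.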

Reproducible example:

Build the dataframe:


import pandas as pd
d = {'Review': [1,0,0,0,1,1,1,0,1], 'Text': ['This movies rocks', 'I hate this movie', "what a bad movie",'This movie was not good','Amazing movie!', 'This was a good film', 'I enjoyed watching this movie','Not the best','Super interesting movie'], 'ID':[1,2,3,4,5,6,7,8,9]}
df = pd.DataFrame(data=d)

# make training and testing data 

train_df = df.sample(frac=0.8,random_state=1234) #random state is a seed value
test_df = df.drop(train_df.index).copy()


train_df.to_csv('window_train_with_IDs.csv', index=False)
test_df.to_csv('window_test_with_IDs.csv', index=False)
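For this toy data, once there is one prediction per `ID`, attaching it to the test dataframe is an ordinary pandas merge. The sketch below is only an illustration: the two test rows are picked by hand (the real split depends on the seed), and `preds` is a made-up stand-in for model output, deliberately listed in a different order.

```python
import pandas as pd

# Two rows standing in for test_df (hand-picked, not the actual split).
test_df = pd.DataFrame({
    "Review": [0, 1],
    "Text": ["I hate this movie", "Amazing movie!"],
    "ID": [2, 5],
})

# Hypothetical model output: one prediction per ID, in shuffled order.
preds = pd.DataFrame({"ID": [5, 2], "predicted": [1, 0]})

# A left merge keeps every test row in its original order;
# validate= raises an error if an ID appears more than once on either side.
result = test_df.merge(preds, on="ID", how="left", validate="one_to_one")
print(result)
```

Merging on a key rather than concatenating columns side by side means the result stays correct even when the predictions come back in a different order from the rows in the CSV.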

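Comments:

Could you rephrase this question with less code, so that it is independently reproducible? Please read.

It seems the problem is about putting data into a dataframe and has nothing to do with deep learning or NLP. If I'm wrong, please elaborate. If not, please edit.

I need a PyTorch function that uses the train loader data to output a dataframe of predictions for each test data point, but that also includes everything in the original dataframe. I figured I didn't need to include all of the model training code, but I wanted to include the train loader and the test loader.

I can't understand the question. Please edit it and include a reproducible input and output, plus an explanation of what should happen between the two.

Does this answer your question?

I have no output; the only output is the training and test accuracy. My input is the dataframe at the top of the post. I run the whole model, but I only need a function that adds the model's predictions to the original test dataframe.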