
Python Torchtext TabularDataset() reads data fields incorrectly


Goal: I want to build a text classifier on my own custom dataset, similar to (and following) the tutorial provided by mlexplained.

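What happened: I successfully formatted my data, created training, validation and test datasets, and formatted them to match the "toxic tweet" dataset they use (one column per tag, with 1/0 for true/false). Most of the other parts also work as intended, but when it comes to iterating, I get an error: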
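For reference, the tutorial's layout is one text column plus one 0/1 column per label. A minimal sketch of producing that layout with Python's csv module, assuming the raw data is a list of (text, set_of_tags) pairs; `to_multilabel_rows`, `write_csv` and the label list are hypothetical, with the label order copied from the tutorial's keys shown further down:

```python
import csv
import io

# Hypothetical label set, in the order of the tutorial's example keys.
LABELS = ["toxic", "severe_toxic", "threat", "obscene", "insult", "identity_hate"]

def to_multilabel_rows(samples, labels=LABELS):
    """Turn (text, tags) pairs into rows of [text, 0/1, 0/1, ...]."""
    for text, tags in samples:
        yield [text] + [1 if label in tags else 0 for label in labels]

def write_csv(samples, out, labels=LABELS):
    # Default csv.writer settings: ',' as delimiter, quotes only where needed.
    writer = csv.writer(out)
    writer.writerow(["comment_text"] + labels)
    writer.writerows(to_multilabel_rows(samples, labels))

buf = io.StringIO()
write_csv([("you are great", set()), ("awful, just awful", {"toxic", "insult"})], buf)
print(buf.getvalue())
```

Note that the second text is written in quotes because it contains the delimiter itself.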

The `device` argument should be set by using `torch.device` or passing a string as an argument. This behavior will be deprecated soon and currently defaults to cpu.
The `device` argument should be set by using `torch.device` or passing a string as an argument. This behavior will be deprecated soon and currently defaults to cpu.
The `device` argument should be set by using `torch.device` or passing a string as an argument. This behavior will be deprecated soon and currently defaults to cpu.
The `device` argument should be set by using `torch.device` or passing a string as an argument. This behavior will be deprecated soon and currently defaults to cpu.
  0%|          | 0/25517 [00:01<?, ?it/s]
Traceback (most recent call last):
... (trace back messages)
AttributeError: 'Example' object has no attribute 'text'
Trying to narrow down the problem, I believe the cause is here:

import torch
import torch.nn as nn
import torch.optim as optim
import tqdm

# model, train_dl, valid_dl, trn and vld are created earlier, as in the tutorial
opt = optim.Adam(model.parameters(), lr=1e-2)
loss_func = nn.BCEWithLogitsLoss()

epochs = 2

for epoch in range(1, epochs + 1):
    running_loss = 0.0
    model.train()  # turn on training mode
    for x, y in tqdm.tqdm(train_dl):  # **THIS LINE CONTAINS THE ERROR**
        opt.zero_grad()

        preds = model(x)
        loss = loss_func(preds, y)  # BCEWithLogitsLoss takes (input logits, target)
        loss.backward()
        opt.step()

        running_loss += loss.item() * x.size(0)  # .item() instead of loss.data[0] on a 0-dim tensor

    epoch_loss = running_loss / len(trn)

    # calculate the validation loss for this epoch
    val_loss = 0.0
    model.eval()  # turn on evaluation mode
    with torch.no_grad():  # no gradients needed for validation
        for x, y in valid_dl:
            preds = model(x)
            loss = loss_func(preds, y)
            val_loss += loss.item() * x.size(0)

    val_loss /= len(vld)
    print('Epoch: {}, Training Loss: {:.4f}, Validation Loss: {:.4f}'.format(epoch, epoch_loss, val_loss))
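On the loss call: `nn.BCEWithLogitsLoss` is called as `loss_func(input, target)`, logits first and 0/1 labels second, and the loss is not symmetric in its arguments. A pure-Python sketch of the per-element formula, averaged like the default `reduction='mean'`, shows that swapping the arguments changes the value. It illustrates the math only and is not PyTorch's implementation; `bce_with_logits` is a hypothetical name:

```python
import math

def bce_with_logits(logits, targets):
    """Mean binary cross-entropy computed on raw logits (numerically stable form)."""
    total = 0.0
    for x, y in zip(logits, targets):
        # max(x, 0) - x*y + log(1 + exp(-|x|)) equals
        # -[y*log(sigmoid(x)) + (1 - y)*log(1 - sigmoid(x))]
        total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
    return total / len(logits)

logits = [2.0, -1.0, 0.5]  # model outputs (preds)
labels = [1.0, 0.0, 1.0]   # 0/1 targets (y)

print(bce_with_logits(logits, labels))  # intended order: (preds, y)
print(bce_with_logits(labels, logits))  # swapped order gives a different number
```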
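I know this problem has happened to others; there are even two questions about it here, but both were about columns or rows being skipped in the dataset (I checked for empty rows/columns and found none). Another suggested solution was that the fields given to the model must be in the same order as in the .csv file (with none left out).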
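The ordering requirement can be checked mechanically before building the dataset. A minimal sketch, assuming the fields list is a list of (name, field) tuples as in the tutorial, that the CSV has a header row, and that each field is named after its header column (TabularDataset itself matches CSV columns by position, not by name). `check_fields_against_header` is a hypothetical helper, not a torchtext function; strings stand in for Field objects:

```python
import csv
import io

def check_fields_against_header(csv_file, fields, delimiter=","):
    """Return a list of problems found when comparing a fields list with a CSV header."""
    header = next(csv.reader(csv_file, delimiter=delimiter))
    names = [name for name, _ in fields]
    problems = []
    if len(header) != len(names):
        problems.append(f"header has {len(header)} column(s), fields list has {len(names)}")
    for i, (col, name) in enumerate(zip(header, names)):
        if col != name:
            problems.append(f"column {i}: header says {col!r}, fields list says {name!r}")
    return problems

# Hypothetical fields list; None marks a column that should be ignored.
fields = [("id", None), ("text", "TEXT"), ("toxic", "LABEL")]

print(check_fields_against_header(io.StringIO("id,text,toxic\n1,hi,0\n"), fields))  # []
print(check_fields_against_header(io.StringIO("id;text;toxic\n1;hi;0\n"), fields))
```

The `delimiter` parameter matters here: a ';'-separated header read with the default ',' collapses into a single column, which the second call flags.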

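However, the relevant code (which loads and creates the tst, trn and vld sets),

def createTestTrain():

uses the same list and order as my csv. tv_datafields is structured exactly like the file. Furthermore, since Datafield objects are just dicts with data points, I read out the dictionary keys, just like the tutorial does, via: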

trn[0].__dict__.keys()
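Checking examples one at a time can hide the overall pattern. A small sketch that tallies which attribute sets occur across a whole dataset; it only relies on each example being an ordinary Python object with a `__dict__`, as the Example objects here are. `summarize_keys` is a hypothetical helper, and the demo uses `SimpleNamespace` stand-ins rather than real torchtext examples:

```python
from collections import Counter
from types import SimpleNamespace

def summarize_keys(examples):
    """Count how often each distinct set of attribute names appears."""
    return Counter(tuple(sorted(vars(ex))) for ex in examples)

# Stand-ins for dataset examples: two empty ones and one with only 'text'.
demo = [SimpleNamespace(), SimpleNamespace(), SimpleNamespace(text=["hi"])]
for keys, count in summarize_keys(demo).most_common():
    print(count, keys)
```

On a real dataset this would be called as, e.g., `summarize_keys(trn.examples)`.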
What should happen: the tutorial's example behaves like this:

trn[0]
torchtext.data.example.Example at 0x10d3ed3c8
trn[0].__dict__.keys()
dict_keys(['comment_text', 'toxic', 'severe_toxic', 'threat', 'obscene', 'insult', 'identity_hate'])
My result:

trn[0].__dict__.keys()
Out[19]: dict_keys([])

trn[1].__dict__.keys()
Out[20]: dict_keys([])

trn[2].__dict__.keys()
Out[21]: dict_keys([])

trn[3].__dict__.keys()
Out[22]: dict_keys(['text'])
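While trn[0] contains nothing, the examples from trn[3] to trn[15] do contain something, but the number of columns that should normally be there is far higher than that.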
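This pattern of empty and partial examples is what one would expect if the columns are split on the wrong character. As far as I can tell from the legacy torchtext source, `Example.fromlist` pairs the fields list with the parsed CSV values using `zip`, skips fields that are `None`, and `zip` silently stops at the shorter sequence. The sketch below mimics that in plain Python, assuming a fields list that starts with a skipped column such as `("id", None)` as in the tutorial; `sketch_fromlist`, `parse_line` and `FIELDS` are hypothetical simplifications, not torchtext's code:

```python
import csv
from types import SimpleNamespace

# Hypothetical fields list: (column name, field or None). Strings stand in for Field objects.
FIELDS = [("id", None), ("text", "TEXT"), ("toxic", "LABEL"), ("insult", "LABEL")]

def sketch_fromlist(values, fields=FIELDS):
    """Simplified stand-in for legacy torchtext's Example.fromlist."""
    ex = SimpleNamespace()
    # zip() stops at the shorter sequence, so missing columns are dropped silently.
    for (name, field), val in zip(fields, values):
        if field is not None:
            setattr(ex, name, val)
    return ex

def parse_line(line, delimiter=","):
    # Parse one CSV line with the given delimiter (',' is the csv module's default).
    return next(csv.reader([line], delimiter=delimiter))

good = sketch_fromlist(parse_line("1,hello there,0,1"))
bad = sketch_fromlist(parse_line("1;hello there;0;1"))  # ';' data read with ','
partial = sketch_fromlist(parse_line("2;hi, you;1;0"))  # a comma inside the text

print(vars(good))     # {'text': 'hello there', 'toxic': '0', 'insult': '1'}
print(vars(bad))      # {}
print(vars(partial))  # {'text': ' you;1;0'}
```

Accessing `bad.text` then raises an AttributeError, much like the batching step did.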

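Now I have no idea where I went wrong. The data fits and the functions apparently work, but TabularDataset() seems to read my columns in the wrong way (if at all). Did I define the fields below the wrong way? At least my debugger seems to suggest so.

# Defining Tag and Text
from torchtext.data import Field  # legacy API; torchtext.legacy.data in 0.9-0.11
TEXT = Field(sequential=True, tokenize=tokenize, lower=True)  # tokenize is defined elsewhere
LABEL = Field(sequential=False, use_vocab=False)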

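There is very little documentation on Torchtext and I'm having a hard time finding answers, but when I look at the definitions in Torchtext, I can't see anything wrong.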


Thanks for your help.

I found the problem: apparently Torchtext only accepts data in quotes and only with "," as the separator. My data was not in quotes and used ";" as the separator.
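The delimiter mismatch alone is enough to split rows incorrectly: Python's csv reader splits on ',' by default, so a ';'-separated row comes back as one long column. Quoting is only strictly required for fields that themselves contain the delimiter. One version-independent workaround is to rewrite each file with ',' as the delimiter before loading it. A minimal sketch; `convert_delimiter` and the temporary file names are hypothetical, and `csv.QUOTE_ALL` is used to mirror the quoted format that worked here:

```python
import csv
import os
import tempfile

def convert_delimiter(src_path, dst_path, src_delimiter=";", dst_delimiter=","):
    """Rewrite a delimited text file so it uses dst_delimiter, quoting every field."""
    # newline="" is what the csv module docs recommend for files used with reader/writer.
    with open(src_path, newline="", encoding="utf-8") as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.reader(src, delimiter=src_delimiter)
        writer = csv.writer(dst, delimiter=dst_delimiter, quoting=csv.QUOTE_ALL)
        for row in reader:
            writer.writerow(row)

# The mismatch itself: the default delimiter leaves a ';' header as one column.
print(next(csv.reader(["text;toxic"])))                 # ['text;toxic']
print(next(csv.reader(["text;toxic"], delimiter=";")))  # ['text', 'toxic']

# Demo on a throwaway file; replace the paths with your own train/valid/test files.
tmp_dir = tempfile.mkdtemp()
src = os.path.join(tmp_dir, "semicolon.csv")
dst = os.path.join(tmp_dir, "comma.csv")
with open(src, "w", newline="", encoding="utf-8") as f:
    f.write("text;toxic\nhi, you;1\n")
convert_delimiter(src, dst)
with open(dst, newline="", encoding="utf-8") as f:
    print(f.read())
```

If the installed torchtext version's `TabularDataset` accepts a `csv_reader_params` argument (check its signature first; this is an assumption about your version), passing `csv_reader_params={'delimiter': ';'}` should make the conversion step unnecessary.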