Python: ValueError: too many dimensions 'str' in BERT text classification

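I am trying to build a text sentiment classifier with a BERT model, but I get ValueError: too many dimensions 'str'.

This is the DataFrame holding the training label values (train_labels):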

0   notr
1   notr
2   notr
3   negative
4   notr
... ...
854 positive
855 notr
856 notr
857 notr
858 positive

And this is the code that raises the error:

train_seq = torch.tensor(tokens_train['input_ids'])
train_mask = torch.tensor(tokens_train['attention_mask'])
train_y = torch.tensor(train_labels.tolist())

The line train_y = torch.tensor(train_labels.tolist()) fails with: ValueError: too many dimensions 'str'

Can you help me?

Reason

You are passing a list of strings (str) to torch.tensor, which only accepts numeric values such as integers and floats.
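One quick way to confirm this diagnosis is to look at the Python types inside the list before it reaches torch.tensor. The sketch below is only illustrative: `element_types` is a hypothetical helper, and the sample list stands in for what `train_labels.tolist()` is assumed to return.

```python
def element_types(values):
    """Return the set of Python type names found in a list of labels."""
    return {type(v).__name__ for v in values}


# Stand-in for train_labels.tolist() (assumed to be a flat list of strings).
labels = ["notr", "notr", "negative", "positive"]

print(element_types(labels))        # {'str'}
print(element_types([0, 1, 2, 1]))  # {'int'}
```

If the result contains 'str', the labels still need to be encoded as numbers.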

Solution

So I suggest converting the string labels to integer values before passing them to torch.tensor.

Implementation

The following code may help:

# a temporary list to store the string labels
temp_list = train_labels.tolist()

# dictionary that maps integer to its string value 
label_dict = {}

# list to store integer labels 
int_labels = []

for i in range(len(temp_list)):
    label_dict[i] = temp_list[i]
    int_labels.append(i)

Now pass int_labels to torch.tensor and use it as the labels:

train_y = torch.tensor(int_labels)

Whenever you want to see the string label that corresponds to an integer, just look it up in the label_dict dictionary.

Thanks, it does convert to integers, but there is a problem with the classification.

0
0   positive
1   negative
2   positive
3   notr
4   positive
... ...
4002    notr
4003    positive
4004    positive
4005    notr
4006    negative

That is the data in the frame. After converting to int:

0   0
1   1
2   2
3   3
4   4
... ...
4002    4002
4003    4003
4004    4004
4005    4005
4006    4006

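It turns into this: every row gets its own number. What I need is for all the positive, neutral, and negative labels to be encoded by class: 0 for negative, 1 for neutral, 2 for positive.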
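A minimal sketch of the class-level encoding requested here, assuming the only labels are 'negative', 'notr', and 'positive'. The names `LABEL_TO_ID`, `ID_TO_LABEL`, `encode_labels`, and `decode_labels` are hypothetical and not from the thread.

```python
# Hypothetical explicit mapping from string label to class id.
LABEL_TO_ID = {"negative": 0, "notr": 1, "positive": 2}
ID_TO_LABEL = {idx: label for label, idx in LABEL_TO_ID.items()}


def encode_labels(labels):
    """Convert string labels to class ids, failing loudly on unknown labels."""
    unknown = sorted(set(labels) - set(LABEL_TO_ID))
    if unknown:
        raise ValueError(f"unexpected labels: {unknown}")
    return [LABEL_TO_ID[label] for label in labels]


def decode_labels(ids):
    """Convert class ids back to their string labels."""
    return [ID_TO_LABEL[i] for i in ids]


# Stand-in for train_labels.tolist().
sample = ["positive", "negative", "positive", "notr", "positive"]
int_labels = encode_labels(sample)
print(int_labels)                 # [2, 0, 2, 1, 2]
print(decode_labels(int_labels))  # back to the original strings
```

The resulting int_labels list contains only integers, so it can be passed to torch.tensor in place of the string list.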

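I had the same problem, and this worked for me. I think you need to do this at the start of your code, right after reading the CSV: df['labels'] = df['labels'].replace(['negative', 'notr', 'positive'], [0, 1, 2])

Then split the data into training and test sets, so both get the encoded labels.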
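The order of operations above (encode first, split second) can be sketched as follows. This is only an illustration under assumptions: the toy DataFrame stands in for the CSV, the column names `text` and `labels` and the 80/20 split are made up, and `Series.map` with a dict is swapped in for `replace` so that unmapped labels show up as NaN and can be caught.

```python
import pandas as pd

# Toy frame standing in for the CSV; the column names are assumptions.
df = pd.DataFrame({
    "text": ["great", "awful", "fine", "meh", "love it", "bad"],
    "labels": ["positive", "negative", "positive", "notr", "positive", "negative"],
})

label_to_id = {"negative": 0, "notr": 1, "positive": 2}

# map() yields NaN for any label missing from the dict, so check before casting.
encoded = df["labels"].map(label_to_id)
if encoded.isna().any():
    raise ValueError("found labels that are not in label_to_id")
df["labels"] = encoded.astype("int64")

# Shuffle once, then split 80/20 into train and test rows.
shuffled = df.sample(frac=1.0, random_state=0).reset_index(drop=True)
cut = int(len(shuffled) * 0.8)
train_df, test_df = shuffled.iloc[:cut], shuffled.iloc[cut:]

# Plain Python ints, ready to be handed to torch.tensor.
train_labels = train_df["labels"].tolist()
```

Because the encoding happens before the split, the training and test labels share the same integer scheme.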
