Python: preparing strings for prediction with torchtext


python deep-learning


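There are plenty of good instructions on how to train a model with torchtext.
But how do I get it ready for production?
How do I build a prediction pipeline?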

list_of_sentences = [
    'not sure, still in progress',
    'Yes',
    'How can I increase my mbps',
    'I have had to call every single month',
    'Hi! Can you help me get started with my new phone?',
]

The eval function is straightforward, so I have everything ready on that side.
How do I prepare new data in the same way?

For example, here is a standard training pipeline:
tokenization => padding => split => iterator
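At prediction time the split step drops out (there is nothing to divide into train/valid/test), and each new string still has to be tokenized and padded exactly as during training. A minimal pure-Python sketch of that tokenize-then-pad step, assuming a hypothetical whitespace tokenizer with a toy vocabulary (`toy_encode`, `TOY_VOCAB`) as a stand-in for `tokenizer.encode`, and mirroring how a `Field` with `fix_length` and `pad_token` set (and no init/eos tokens) keeps the first `fix_length` ids and right-pads the rest:

```python
from typing import Callable, Dict, List

# Hypothetical toy vocabulary standing in for the BERT tokenizer's vocab.
TOY_VOCAB: Dict[str, int] = {"[PAD]": 0, "[UNK]": 1, "yes": 2, "how": 3, "can": 4, "i": 5}
PAD_INDEX = TOY_VOCAB["[PAD]"]
UNK_INDEX = TOY_VOCAB["[UNK]"]


def toy_encode(text: str) -> List[int]:
    """Hypothetical stand-in for tokenizer.encode: lowercase, split on whitespace, map to ids."""
    return [TOY_VOCAB.get(token, UNK_INDEX) for token in text.lower().split()]


def encode_and_pad(text: str, encode: Callable[[str], List[int]],
                   fix_length: int, pad_index: int) -> List[int]:
    """Tokenize one string, keep the first fix_length ids, then right-pad with pad_index."""
    ids = encode(text)[:fix_length]
    return ids + [pad_index] * (fix_length - len(ids))


if __name__ == "__main__":
    print(encode_and_pad("How can I", toy_encode, fix_length=5, pad_index=PAD_INDEX))
    # [3, 4, 5, 0, 0]
```

The design point is reuse: the same tokenizer, `fix_length` and pad index used for training should be applied to new strings, otherwise the ids and lengths the model sees at prediction time differ from what it was trained on.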

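import torch
from transformers import BertTokenizer
# torchtext <= 0.8; in torchtext 0.9-0.11 these classes live in torchtext.legacy.data
from torchtext.data import Field, TabularDataset, BucketIterator, Iterator

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
# data_dir (the folder holding train.csv, valid.csv and test.csv) must be defined elsewhere

# Model parameters
MAX_SEQ_LEN = 32
PAD_INDEX = tokenizer.convert_tokens_to_ids(tokenizer.pad_token)
UNK_INDEX = tokenizer.convert_tokens_to_ids(tokenizer.unk_token)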

# Fields
id_field = Field(sequential=False, use_vocab=False, batch_first=True, dtype=torch.float)
label_field = Field(sequential=False, use_vocab=False, batch_first=True, dtype=torch.float)
text_field = Field(use_vocab=False, tokenize=tokenizer.encode, lower=False, include_lengths=False, batch_first=True,
                   fix_length=MAX_SEQ_LEN, pad_token=PAD_INDEX, unk_token=UNK_INDEX)
fields = [('id',id_field), ('message', text_field),('label', label_field),]

# TabularDataset
train, valid, test = TabularDataset.splits(path=data_dir, train='train.csv', validation='valid.csv',test='test.csv', format='CSV', fields=fields, skip_header=True)

# Iterators
train_iter = BucketIterator(train, batch_size=16, sort_key=lambda x: len(x.message),
                            device=device, train=True, sort=True, sort_within_batch=True)
valid_iter = BucketIterator(valid, batch_size=16, sort_key=lambda x: len(x.message),
                            device=device, train=True, sort=True, sort_within_batch=True)
test_iter = Iterator(test, batch_size=16, device=device, train=False, shuffle=False, sort=False)
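`train_iter` and `valid_iter` are built with `sort=True` and `sort_within_batch=True`, so their batches come out ordered by `len(x.message)` rather than in input order, while `test_iter` turns sorting and shuffling off. If a length-sorting iterator were reused for prediction, the outputs would have to be mapped back to the order of the input strings. A minimal pure-Python sketch of that bookkeeping; `length_sorted_order` and `restore_order` are hypothetical helpers, not torchtext API:

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
P = TypeVar("P")


def length_sorted_order(items: Sequence[T], length: Callable[[T], int]) -> List[int]:
    """Return the original indices sorted by length (stable for equal lengths)."""
    return sorted(range(len(items)), key=lambda i: length(items[i]))


def restore_order(sorted_outputs: Sequence[P], order: Sequence[int]) -> List[P]:
    """Put each output back at the index of the input it was computed from."""
    if len(sorted_outputs) != len(order):
        raise ValueError("need exactly one output per input")
    pairs = sorted(zip(order, sorted_outputs), key=lambda pair: pair[0])
    return [output for _, output in pairs]


if __name__ == "__main__":
    sentences = ["How can I increase my mbps", "Yes", "not sure, still in progress"]
    order = length_sorted_order(sentences, lambda s: len(s.split()))
    sorted_sentences = [sentences[i] for i in order]
    # Pretend the model's output for each sentence is simply its word count.
    outputs = [len(s.split()) for s in sorted_sentences]
    print(restore_order(outputs, order))
    # [6, 1, 5]
```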

How do I prepare a simple list of strings for the model to predict on?

list_of_sentences = [
    'not sure, still in progress',
    'Yes',
    'How can I increase my mbps',
    'I have had to call every single month',
    'Hi! Can you help me get started with my new phone?',
]
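In legacy torchtext (`torchtext.data` up to 0.8, `torchtext.legacy.data` in 0.9-0.11), the `text_field` defined for training should be reusable for new strings: `Field.preprocess` tokenizes one raw string, and `Field.process(batch, device=...)` pads and numericalizes a list of preprocessed examples into a tensor. What remains is batching the list without reordering it, matching the `shuffle=False, sort=False` settings of `test_iter`, so that prediction i belongs to sentence i. A minimal pure-Python sketch of that batching step; `ordered_batches` is a hypothetical helper, not torchtext API:

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def ordered_batches(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of items in their original order (no shuffle, no sort)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


list_of_sentences = [
    'not sure, still in progress',
    'Yes',
    'How can I increase my mbps',
    'I have had to call every single month',
    'Hi! Can you help me get started with my new phone?',
]

if __name__ == "__main__":
    for batch in ordered_batches(list_of_sentences, batch_size=2):
        print(batch)
```

Each batch would then go through the same tokenization and padding used in training before being passed to the model.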