Python: Understanding how GPT-2 tokenizes strings

Tags: python, huggingface-transformers, transformer, gpt-2

Following a tutorial, I wrote the following code:

from transformers import GPT2Tokenizer, GPT2Model
import torch

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2Model.from_pretrained('gpt2')

inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
outputs = model(**inputs)

last_hidden_states = outputs.last_hidden_state
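So I realized that "inputs" consists of the tokenized items of my sentence. But how can I get the values of the tokenized items? (For example: ["Hello", "my", "dog", "is", "cute"].)

I am asking because I suspect that when a word is not in the tokenizer's vocabulary (for instance, a word from another language), the tokenizer splits that word into several pieces. I would like to check this in my code.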

You can call tokenizer.decode on the output of the tokenizer to get the vocabulary entry for each index:

>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
>>> list(map(tokenizer.decode, inputs.input_ids[0]))
['Hello', ',', ' my', ' dog', ' is', ' cute']
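
To check whether a particular word gets split, one option is to tokenize each word separately and count the resulting pieces. The sketch below is only an illustration: the helper name find_split_words is hypothetical (it is not part of transformers), and it assumes nothing more than an object with a tokenize method, which GPT2Tokenizer provides. Words after the first are tokenized with a leading space, because GPT-2's byte-level BPE treats that space as part of the token. Note that attached punctuation (such as "Hello,") also counts as a split here.

```python
def find_split_words(tokenizer, text):
    """Return {word: tokens} for whitespace-separated words that map to more than one token.

    `tokenizer` is assumed to expose a `tokenize(str) -> list[str]` method,
    as the Hugging Face tokenizers do. This helper is hypothetical, not a library API.
    """
    split = {}
    for i, word in enumerate(text.split()):
        # GPT-2 encodes the preceding space as part of the token, so every
        # word except the first is tokenized with that space attached.
        piece = word if i == 0 else " " + word
        tokens = tokenizer.tokenize(piece)
        if len(tokens) > 1:
            split[word] = tokens
    return split
```

With the tokenizer from the question, this would be called as find_split_words(tokenizer, "Hello, my dog is cute"). The returned token strings are the raw vocabulary entries, so with GPT-2 a leading space is expected to show up as the "Ġ" marker rather than as a literal space.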

What do you mean by "values"? Are you looking for a mapping from last_hidden_state to each (untokenized) word?