Python Spacy get pos & tag for a specific word


python nlp


I have a situation where I have to get pos_ & tag_ from a spaCy doc object.

For example,

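import spacy

nlp = spacy.load("en_core_web_sm")  # assumed model; the question does not say which one
text = "Australian striker John hits century"
doc = nlp(text)
for nc in doc.noun_chunks:
    print(nc)  # Australian striker John
doc[1].tag_  # gives the tag for "striker"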

If I want to get pos_ and tag_ for the word "striker", do I need to give that sentence to nlp() again? Also, doc[1].tag_ is there, but I need something like doc['striker'].tag_.

Is that possible?

You only need to process the text once:

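text = "Australian striker John hits century"
doc = nlp(text)  # parse once; every token is tagged in context
for nc in doc.noun_chunks:
    print(nc)
    # (text, fine-grained tag, coarse POS) for each token in the chunk
    print([(token.text, token.tag_, token.pos_) for token in nc])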

If you only want a specific word within the noun chunk, you can get it by changing the second print statement to, for example:

print([(token.text, token.tag_, token.pos_) for token in nc if token.tag_ == 'NN'])

Note that this may print more than one hit, depending on your model and input sentence.
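To pick out one particular word inside the noun chunks, rather than every token tagged NN, one option is to filter on the token text instead of the tag. The sketch below uses a hypothetical helper, find_in_chunks (not a spaCy function); it only relies on tokens exposing .text, .tag_ and .pos_, so it also runs on plain stand-in objects without a model loaded.

```python
from typing import Iterable, List, Tuple


def find_in_chunks(chunks: Iterable, word: str) -> List[Tuple[str, str, str]]:
    """Return (token text, tag_, pos_) for every token in `chunks` matching `word`.

    `chunks` is meant to be something like doc.noun_chunks: an iterable of spans,
    each of which iterates over tokens exposing .text, .tag_ and .pos_.
    Matching is case-insensitive, and every occurrence is kept, so a repeated
    word produces several entries instead of being silently dropped.
    """
    target = word.lower()
    hits = []
    for chunk in chunks:
        for token in chunk:
            if token.text.lower() == target:
                hits.append((token.text, token.tag_, token.pos_))
    return hits
```

With spaCy you would call it on the already-parsed doc as find_in_chunks(doc.noun_chunks, "striker"). Because it returns a list, an empty result simply means the word is not inside any noun chunk.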

You can do the following:

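import re

text = "Australian striker John hits century"
x1 = "striker"
# case-insensitive pattern for x1; re.escape keeps any special characters literal
x2 = re.compile(re.escape(x1), re.IGNORECASE)
# character offsets where x1 occurs in the text
loc_indexes = [m.start(0) for m in x2.finditer(text)]
# token.idx is each token's character offset, so keep tokens that start at a match
tag = [i.tag_ for i in nlp(text) if i.idx in loc_indexes]
print(x1, tag[0])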

It gives the output:

striker NN

You can also easily make this dynamic if needed, with x1 as the variable.
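The regex answer can be made reusable along those lines. This is a sketch of one way to do it, with a few assumptions of my own: the helper name tags_for_word is hypothetical, it takes the already-parsed doc instead of calling nlp(text) again, it adds \b word boundaries (so "striker" would not match inside "strikers"), and it returns every match instead of only tag[0], which raises IndexError when the word is absent. It relies on tokens exposing .idx, the token's character offset in doc.text, as spaCy tokens do.

```python
import re
from typing import List, Tuple


def tags_for_word(doc, word: str) -> List[Tuple[int, str, str]]:
    """Return (char offset, token text, tag_) for each occurrence of `word` in `doc`.

    Assumes doc.text is the original string and each token exposes .idx
    (its character offset in doc.text), .text and .tag_.
    """
    # \b boundaries so "striker" does not also match inside "strikers"
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    # character offsets where the word starts in the original text
    starts = {m.start() for m in pattern.finditer(doc.text)}
    # keep the tokens whose start offset coincides with a regex match
    return [(tok.idx, tok.text, tok.tag_) for tok in doc if tok.idx in starts]
```

For example, tags_for_word(nlp(text), "striker") would be expected to give [(11, 'striker', 'NN')] with a model that tags "striker" as NN, as reported in this thread. An empty list means the word was not found.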

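First, something like doc['striker'].tag_ would be ambiguous if the sentence contains more than one "striker". But regarding your original question, what do you mean by "do I need to give that sentence again"? You already have doc[1].tag_ == 'NN' and doc[1].pos_ == 'NOUN'.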

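"Do I need to give that sentence to nlp() again?" Yes, because POS tags depend on context. For example, "hits" without context could be a noun (the plural of "hit") or a verb. You could map each token to its position and do something like doc[index[word]], but that causes problems if the same word appears more than once.

OK. I don't think you need to parse a sentence more than once. Once the doc is ready, as you said, all POS tags have already been computed correctly from the context. You can do

print([token.pos_ for token in doc])

which returns

['ADJ', 'NOUN', 'PROPN', 'VERB', 'NOUN']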
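The "map each token to its position" idea from the comment above can be made safe for repeated words by storing a list of positions per word. A minimal sketch, with hypothetical helper names build_word_index and tags_at (not spaCy APIs); the doc is parsed once and the index is built from it, so no word is re-tagged out of context. Any sequence of tokens with .text, .tag_ and .pos_ works, e.g. a spaCy Doc.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def build_word_index(doc) -> Dict[str, List[int]]:
    """Map each lower-cased token text to *all* positions where it occurs in doc."""
    index: Dict[str, List[int]] = defaultdict(list)
    for i, token in enumerate(doc):
        index[token.text.lower()].append(i)
    return dict(index)


def tags_at(doc, index: Dict[str, List[int]], word: str) -> List[Tuple[str, str]]:
    """Return (tag_, pos_) for every occurrence of `word`, using the prebuilt index."""
    return [(doc[i].tag_, doc[i].pos_) for i in index.get(word.lower(), [])]
```

Typical use would be doc = nlp(text), then index = build_word_index(doc), then tags_at(doc, index, "striker"). Lower-casing the keys is a design choice here; drop .lower() if "Striker" and "striker" must be kept apart.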

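@darksky, but how do I get the tags only for the detected noun chunks, e.g. if there is a big paragraph?

@VivekAnanthan A noun chunk is a spacy.tokens.span.Span, not a Token. You have to iterate over it to print the tag of each token inside the chunk, like so:

print([[token.tag_ for token in nc] for nc in doc.noun_chunks])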
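For a large paragraph, one possible variation keeps each chunk's text next to its token tags, so the output stays readable when there are many chunks. The helper name chunk_tags is hypothetical, not a spaCy API; it assumes each chunk exposes .text and iterates over tokens with .text and .tag_, as a spaCy Span does.

```python
from typing import Iterable, List, Tuple


def chunk_tags(chunks: Iterable) -> List[Tuple[str, List[Tuple[str, str]]]]:
    """Pair each noun chunk's text with the (token text, tag_) of its tokens.

    A chunk has no single tag_ of its own, so it is iterated to reach its tokens.
    """
    result = []
    for chunk in chunks:
        result.append((chunk.text, [(token.text, token.tag_) for token in chunk]))
    return result
```

Call it as chunk_tags(doc.noun_chunks). If one tag per chunk is enough, spaCy's Span.root gives the chunk's syntactic head, so nc.root.tag_ is another option.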