Python: ne_chunk without pos_tag in NLTK


I am trying to chunk a sentence using ne_chunk and pos_tag in NLTK.

from nltk import tag
from nltk.tag import pos_tag
from nltk.tree import Tree
from nltk.chunk import ne_chunk

sentence = "Michael and John is reading a booklet in a library of Jakarta"
tagged_sent = pos_tag(sentence.split())

print_chunk = [chunk for chunk in ne_chunk(tagged_sent) if isinstance(chunk, Tree)]

print(print_chunk)

This is the result:

[Tree('GPE', [('Michael', 'NNP')]), Tree('PERSON', [('John', 'NNP')]), Tree('GPE', [('Jakarta', 'NNP')])]

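My question: is it possible to leave out the pos_tag labels (like NNP above) and keep only the tree labels "GPE" and "PERSON"? Also, what does "GPE" mean?

Thanks in advance.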

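The named entity chunker gives you a tree that contains both the chunks and the tags. You can't change that, but you can take the tags out. Starting from your `tagged_sent`: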

chunks = ne_chunk(tagged_sent)  # ne_chunk and Tree are already imported in the question's code
simple = []
for elt in chunks:
    if isinstance(elt, Tree):
        simple.append(Tree(elt.label(), [ word for word, tag in elt ]))
    else:
        simple.append( elt[0] )

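If you only want the chunks, omit the `else:` clause above. You can adapt the code to wrap the chunks any way you like. I used an nltk `Tree` to keep the changes to a minimum. Note that some chunks consist of several words (try adding "New York" to your example), so the contents of a chunk must be a list, not a single element.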
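The point about multi-word chunks can be checked with a small sketch. To avoid depending on the downloaded tagger and chunker models, the tree here is built by hand as a stand-in for what ne_chunk might return. The function name `strip_pos_tags` and the sample sentence are assumptions made for illustration only.

```python
from nltk import Tree


def strip_pos_tags(chunked):
    """Drop POS tags from a chunked sentence.

    Chunks stay Tree objects (label kept, leaves reduced to words);
    tokens outside any chunk become bare words.
    """
    simple = []
    for elt in chunked:
        if isinstance(elt, Tree):
            # A chunk may hold several (word, tag) pairs, hence a list.
            simple.append(Tree(elt.label(), [word for word, tag in elt]))
        else:
            simple.append(elt[0])
    return simple


# Hand-built stand-in for ne_chunk output (hypothetical, not real chunker output).
chunked = Tree('S', [
    Tree('PERSON', [('John', 'NNP')]),
    ('visited', 'VBD'),
    Tree('GPE', [('New', 'NNP'), ('York', 'NNP')]),
])

simple = strip_pos_tags(chunked)
for elt in simple:
    print(elt)
```

The "New York" chunk keeps both words under a single GPE label, which is why each chunk's contents are stored as a list.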

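PS. "GPE" stands for "geo-political entity" (labelling "Michael" that way is obviously a chunker mistake). You can find a list of the "commonly used" tags in the NLTK book.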

Most likely you only need to modify the tag-handling code slightly.

Is it possible to leave out the pos_tag labels (like NNP above) and keep only the tree labels "GPE" and "PERSON"?

Yes, just traverse the Tree object =)

What does "GPE" mean?

GPE means "Geo-Political Entity".

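The `GPE` tag comes from the ACE tag set, which the code below looks up as `NE_CLASSES['ace']`. Two pre-trained NE chunkers are available, and three tag sets are supported.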
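Thanks, it works! But how can I train it for special cases? For example, Michael should be "PERSON" rather than "GPE", because it is a person's name.

Read the NLTK book. If you still want to know after that, ask a new question here. In short, you can add a dictionary of person names to override the statistical cues, but in general there is not much you can do. If you try to fix too much by hand, you break more than you fix. (For example, is "Elizabeth" a person or a city in New Jersey?)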
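The dictionary idea from the comment above could look roughly like the following. This is only a sketch under stated assumptions: `KNOWN_PERSONS` and `relabel_persons` are hypothetical names, not part of NLTK, and the override is a plain post-processing pass over the chunker's output rather than any retraining. The tree is built by hand so the example does not need the NLTK models.

```python
from nltk import Tree

# Hypothetical name list; a real one would come from your own data.
KNOWN_PERSONS = {'Michael', 'John'}


def relabel_persons(chunked, known_persons=KNOWN_PERSONS):
    """Return a copy of the tree where chunks made up only of known
    person names are relabelled PERSON; everything else is unchanged."""
    result = []
    for node in chunked:
        if isinstance(node, Tree) and all(word in known_persons for word, tag in node):
            result.append(Tree('PERSON', list(node)))
        else:
            result.append(node)
    return Tree(chunked.label(), result)


# Hand-built stand-in for ne_chunk output, mirroring the mislabelled "Michael".
chunked = Tree('S', [
    Tree('GPE', [('Michael', 'NNP')]),
    ('lives', 'VBZ'),
    ('in', 'IN'),
    Tree('GPE', [('Jakarta', 'NNP')]),
])

fixed = relabel_persons(chunked)
print(fixed)
```

As the comment warns, a lookup like this cannot tell an ambiguous name such as "Elizabeth" apart from the city, so it is best kept to a short list of names you are sure about.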

>>> from nltk import Tree, pos_tag, ne_chunk
>>> sentence = "Michael and John is reading a booklet in a library of Jakarta"
>>> tagged_sent = ne_chunk(pos_tag(sentence.split()))
>>> tagged_sent
Tree('S', [Tree('GPE', [('Michael', 'NNP')]), ('and', 'CC'), Tree('PERSON', [('John', 'NNP')]), ('is', 'VBZ'), ('reading', 'VBG'), ('a', 'DT'), ('booklet', 'NN'), ('in', 'IN'), ('a', 'DT'), ('library', 'NN'), ('of', 'IN'), Tree('GPE', [('Jakarta', 'NNP')])])

>>> from nltk.sem.relextract import NE_CLASSES
>>> ace_tags = NE_CLASSES['ace']

>>> for node in tagged_sent:
...     if type(node) == Tree and node.label() in ace_tags:
...         words, tags = zip(*node.leaves())
...         print(node.label() + '\t' +  ' '.join(words))
... 
GPE Michael
PERSON  John
GPE Jakarta
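For reuse outside the interpreter, the same traversal can be wrapped in a function that returns (label, text) pairs instead of printing them. This is a minimal sketch: `extract_entities` is a hypothetical helper name, and the tree is typed in by hand from the output shown above so that the snippet runs without the tagger and chunker models.

```python
from nltk import Tree


def extract_entities(chunked):
    """Collect (label, text) pairs for every top-level chunk in the tree."""
    entities = []
    for node in chunked:
        if isinstance(node, Tree):
            # leaves() yields (word, tag) pairs; keep only the words.
            words = [word for word, tag in node.leaves()]
            entities.append((node.label(), ' '.join(words)))
    return entities


# The chunked sentence from the session above, entered by hand.
chunked = Tree('S', [
    Tree('GPE', [('Michael', 'NNP')]), ('and', 'CC'),
    Tree('PERSON', [('John', 'NNP')]), ('is', 'VBZ'),
    ('reading', 'VBG'), ('a', 'DT'), ('booklet', 'NN'),
    ('in', 'IN'), ('a', 'DT'), ('library', 'NN'), ('of', 'IN'),
    Tree('GPE', [('Jakarta', 'NNP')]),
])

entities = extract_entities(chunked)
for label, text in entities:
    print(label + '\t' + text)
```

Returning the pairs keeps the POS tags out of the result while leaving the original tree untouched.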