How to get the sentence back after chunking in Python NLTK?

python nlp

I have a sentence like the following:

txt =  "i am living in the West Bengal and my brother live in New York. My name is John Smith"

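What I need is:

Get the chunks labelled GPE/LOCATION and join the words of each such chunk with "_"

Get the chunks labelled PERSON and remove those chunks

The output I need: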

preprocessed_txt =  "i am living in the West_Bengal and my brother live in New_York. My name is "

I am using the following code, taken from another source, to get the labels of the chunks:

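import nltk

# txt is the string defined above
for sent in nltk.sent_tokenize(txt):
    for chunk in nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize(sent))):
        # only named-entity subtrees have a label(); plain tokens are (word, tag) tuples
        if hasattr(chunk, 'label'):
            print(chunk.label(), '_'.join(c[0] for c in chunk))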

This gives me the following output:

LOCATION West_Bengal
GPE New_York
PERSON John_Smith

What should I do next?

This should be all you need:

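import nltk

# txt is the string from the question
tokens = nltk.word_tokenize(txt)

new = []
for chunk in nltk.ne_chunk(nltk.pos_tag(tokens)):
    try:
        if chunk.label().lower() == 'person':
            continue  # drop PERSON chunks entirely
        # join the words of any other named-entity chunk with "_"
        new.append('_'.join(c[0] for c in chunk))
    except AttributeError:
        # plain (word, tag) tuples have no label(); keep the word as-is
        new.append(chunk[0])

print(' '.join(new))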
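
One caveat with the loop above: joining everything with a space leaves punctuation detached (for example `New_York .`). Below is a minimal sketch of one way to tidy that up. The helper name `rebuild_sentence`, the `PUNCT` set and the `FakeChunk` class are all hypothetical; `FakeChunk` only stands in for an NLTK subtree so the example runs without downloading any models, and the hand-written `sample` mirrors the chunks reported in the question. In real use you would pass the result of `nltk.ne_chunk(nltk.pos_tag(tokens))` instead.

```python
# Tokens that should attach to the previous word without a space (a simplification).
PUNCT = {".", ",", "!", "?", ";", ":"}


class FakeChunk(list):
    """Hypothetical stand-in for an NLTK subtree: (word, tag) pairs plus a label."""

    def __init__(self, label, leaves):
        super().__init__(leaves)
        self._label = label

    def label(self):
        return self._label


def rebuild_sentence(chunks, drop=("PERSON",), sep="_"):
    """Join entity chunks with `sep`, skip chunks whose label is in `drop`."""
    pieces = []
    for chunk in chunks:
        if hasattr(chunk, "label"):
            if chunk.label() in drop:
                continue
            pieces.append(sep.join(word for word, _tag in chunk))
        else:
            pieces.append(chunk[0])  # plain (word, tag) tuple

    out = ""
    for piece in pieces:
        # add a space before every piece except the first and punctuation
        if out and piece not in PUNCT:
            out += " "
        out += piece
    return out


# Hand-built sample that mimics the shape of an ne_chunk result (an assumption).
sample = [
    ("i", "PRP"), ("am", "VBP"), ("living", "VBG"), ("in", "IN"), ("the", "DT"),
    FakeChunk("LOCATION", [("West", "NNP"), ("Bengal", "NNP")]),
    ("and", "CC"), ("my", "PRP$"), ("brother", "NN"), ("live", "VBP"), ("in", "IN"),
    FakeChunk("GPE", [("New", "NNP"), ("York", "NNP")]),
    (".", "."), ("My", "PRP$"), ("name", "NN"), ("is", "VBZ"),
    FakeChunk("PERSON", [("John", "NNP"), ("Smith", "NNP")]),
]

result = rebuild_sentence(sample)
print(result)
```

This rebuilds the text from tokens, so the original spacing is not preserved exactly; in particular there is no trailing space after "is".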

Try

'_'.join(c[0] for c in chunk)

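This gives the output: LOCATION West_Bengal GPE New_York PERSON John_Smith

You have to re-code it: capture the tokens in a list, then extract the names and replace them with the original names in the tokens list.

@YashvanderBamel... How do I do that? That is exactly where I am stuck.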
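
One possible way to do the replacement described in the comments is to leave the original string alone and only substitute the entity phrases in it. This is a rough sketch rather than a definitive answer. The function name `apply_entities` is hypothetical, and the `entities` list is written by hand to match the output shown in the question; in practice it could be collected inside the chunking loop as `(chunk.label(), [c[0] for c in chunk])`.

```python
def apply_entities(text, entities, merge=("GPE", "LOCATION"), drop=("PERSON",), sep="_"):
    """Replace entity phrases directly in the original text.

    entities: list of (label, [word, ...]) pairs, e.g. gathered from ne_chunk.
    Assumes the words of each entity are separated by single spaces in `text`.
    """
    for label, words in entities:
        phrase = " ".join(words)
        if label in merge:
            # "West Bengal" -> "West_Bengal"
            text = text.replace(phrase, sep.join(words))
        elif label in drop:
            # "John Smith" -> ""
            text = text.replace(phrase, "")
    return text


txt = "i am living in the West Bengal and my brother live in New York. My name is John Smith"

# Hand-written to mirror the labels printed in the question (an assumption).
entities = [
    ("LOCATION", ["West", "Bengal"]),
    ("GPE", ["New", "York"]),
    ("PERSON", ["John", "Smith"]),
]

preprocessed_txt = apply_entities(txt, entities)
print(repr(preprocessed_txt))
```

Because this works on the original string, punctuation and spacing stay as they were, including the trailing space after "is". Note that `str.replace` substitutes every occurrence of a phrase, which may or may not be what you want if the same words also appear outside an entity.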