How do I get the sentence back after chunking in Python NLTK?
I have a sentence like this:
txt = "i am living in the West Bengal and my brother live in New York. My name is John Smith"
What I need is:
1. Get the chunks labeled GPE/LOCATION and join the words of each chunk with "_".
2. Get the chunks labeled PERSON and remove them.
The output I need:
preprocessed_txt = "i am living in the West_Bengal and my brother live in New_York. My name is "
I use the following code, taken from elsewhere, to get the labels of the chunks:
import nltk

for sent in nltk.sent_tokenize(txt):
    for chunk in nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize(sent))):
        # Named-entity chunks are Tree objects with a label; plain tokens are (word, tag) tuples
        if hasattr(chunk, 'label'):
            print(chunk.label(), '_'.join(c[0] for c in chunk))
This gives me the following output:
LOCATION West_Bengal
GPE New_York
PERSON John_Smith
What should I do next?

This should be all you need:
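import nltk

# Tokenize the whole text (assumed; the snippet uses `tokens` without defining it)
tokens = nltk.word_tokenize(txt)
new = list()
for chunk in nltk.ne_chunk(nltk.pos_tag(tokens)):
    try:
        # Named-entity chunks are Tree objects and expose label()
        if chunk.label().lower() == 'person':
            continue  # drop PERSON chunks entirely
        else:
            new.append('_'.join(c[0] for c in chunk))
    except AttributeError:
        # Plain tokens are (word, tag) tuples and have no label() method
        new.append(chunk[0])
print(' '.join(new))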
Try:
' '.join(c[0] for c in chunk)
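This gives the output: LOCATION West Bengal GPE New York PERSON John_Smith
You have to rework the code: capture the tokens in a list, then extract the names and replace them in the original tokens list.
@YashvanderBamel... How do I do that? That is exactly where my problem is.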
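One way to approach the replacement step is to leave the original string intact and substitute each entity phrase in it, so punctuation and spacing are preserved. The sketch below is not taken from the thread: `extract_entities` and `rewrite_entities` are hypothetical helper names, and the entity list in the demo is written by hand to mirror the labels printed in the question rather than produced by a live NLTK run. It assumes the words of an entity are separated by single spaces in the original text, and that every occurrence of a phrase should be rewritten.

```python
def extract_entities(chunks):
    """Collect (label, [words]) pairs from ne_chunk-style output.

    Entity chunks are assumed to expose label() and to iterate over
    (word, tag) pairs; plain tokens are (word, tag) tuples without label().
    """
    return [(c.label(), [word for word, _tag in c])
            for c in chunks if hasattr(c, "label")]


def rewrite_entities(text, entities,
                     join_labels=("GPE", "LOCATION"),
                     drop_labels=("PERSON",),
                     sep="_"):
    """Join or remove entity phrases directly in the original text."""
    for label, words in entities:
        phrase = " ".join(words)  # assumes single spaces between the words
        if label in drop_labels:
            text = text.replace(phrase, "")
        elif label in join_labels:
            text = text.replace(phrase, sep.join(words))
    return text


# Hand-written entities mirroring the labels shown in the question.
# With NLTK they could come from something like:
#   extract_entities(nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize(txt))))
txt = ("i am living in the West Bengal and my brother live in New York. "
       "My name is John Smith")
entities = [("LOCATION", ["West", "Bengal"]),
            ("GPE", ["New", "York"]),
            ("PERSON", ["John", "Smith"])]
preprocessed_txt = rewrite_entities(txt, entities)
print(repr(preprocessed_txt))
```

Because the substitution happens on the string rather than on the token list, the period stays attached to New_York and the trailing space after "is" is kept, which matches the requested output. The labels the tagger actually assigns depend on the NLTK model, so the label tuples may need adjusting.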