How do I get the sentence back after chunking in Python NLTK?


I have a sentence as follows:

    txt = "i am living in the West Bengal and my brother live in New York. My name is John Smith"

What I need is:

  • Get the chunks labelled GPE or LOCATION and join the words in each chunk with "_"
  • Get the chunks labelled PERSON and remove them

The output I need:

    preprocessed_txt =  "i am living in the West_Bengal and my brother live in New_York. My name is "
    
I am using the following code to get the chunk labels:

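    import nltk

    # Split into sentences, then tokenize, POS-tag and NE-chunk each one
    for sent in nltk.sent_tokenize(txt):
        for chunk in nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize(sent))):
            # Named-entity chunks are subtrees with a label; plain tokens are (word, tag) tuples
            if hasattr(chunk, 'label'):
                print(chunk.label(), '_'.join(c[0] for c in chunk))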
    
This gives me the following output:

    LOCATION West_Bengal
    GPE New_York
    PERSON John_Smith
    

What should I do next?

This should be all you need:

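    import nltk

    tokens = nltk.word_tokenize(txt)
    new = []
    for chunk in nltk.ne_chunk(nltk.pos_tag(tokens)):
        if hasattr(chunk, 'label'):
            # Named-entity subtree: drop PERSON chunks, join any other entity with "_"
            if chunk.label().lower() == 'person':
                continue
            new.append('_'.join(c[0] for c in chunk))
        else:
            # Plain (word, tag) tuple: keep the word unchanged
            new.append(chunk[0])

    print(' '.join(new))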
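
Note that joining with `' '` leaves a space before punctuation tokens such as the period. A minimal sketch of one way around this is shown below. The names `FakeTree`, `merge_chunks` and `join_tokens` are hypothetical helpers, not part of NLTK; `FakeTree` is only a stand-in for the labelled subtrees that `nltk.ne_chunk` returns, so the example runs without the NLTK models. The punctuation rule is an assumption that suits closing punctuation and would need refining for opening quotes or brackets.

```python
import string


class FakeTree(list):
    """Hypothetical stand-in for an NLTK subtree: (word, tag) leaves plus a label."""

    def __init__(self, label, leaves):
        super().__init__(leaves)
        self._label = label

    def label(self):
        return self._label


def merge_chunks(chunks, drop=("PERSON",)):
    """Join the words of each labelled chunk with "_"; skip chunks whose label is in `drop`."""
    out = []
    for chunk in chunks:
        if hasattr(chunk, "label"):
            if chunk.label() in drop:
                continue
            out.append("_".join(word for word, _tag in chunk))
        else:
            # Plain (word, tag) tuple
            out.append(chunk[0])
    return out


def join_tokens(tokens):
    """Join tokens with spaces, attaching pure-punctuation tokens to the previous token."""
    text = ""
    for tok in tokens:
        is_punct = all(ch in string.punctuation for ch in tok)
        if text and not is_punct:
            text += " "
        text += tok
    return text


# Hand-built chunk sequence shaped like the output of nltk.ne_chunk
chunks = [
    ("living", "VBG"), ("in", "IN"),
    FakeTree("GPE", [("New", "NNP"), ("York", "NNP")]),
    (".", "."), ("My", "PRP$"), ("name", "NN"), ("is", "VBZ"),
    FakeTree("PERSON", [("John", "NNP"), ("Smith", "NNP")]),
]
result = join_tokens(merge_chunks(chunks))
print(result)
```

With real data, the result of `nltk.ne_chunk(nltk.pos_tag(tokens))` could be passed to `merge_chunks` in place of the hand-built list, since the function only relies on the `label()` method and on iterating `(word, tag)` pairs.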
    

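Try `'_'.join(c[0] for c in chunk)`.

That gives the output: LOCATION West_Bengal GPE New_York PERSON John_Smith. You have to rework the code: capture the tokens in a list, then extract the names and replace them in the original list of tokens.

@YashvanderBamel ... How do I do that? That is exactly where I am stuck.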
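
One possible reading of the replacement idea in the comments is to keep the original string and substitute each recognised entity in place, which preserves the original spacing and punctuation. The sketch below assumes the `(label, phrase)` pairs have already been collected (here they are hard-coded from the output shown in the question); `rewrite_entities` is a hypothetical helper name. It also assumes that each phrase appears in the text exactly as the space-joined tokens, and it replaces every occurrence of a phrase, not only the tagged one.

```python
def rewrite_entities(text, entities,
                     join_labels=("GPE", "LOCATION"),
                     drop_labels=("PERSON",)):
    """Rewrite `text` using (label, phrase) pairs.

    Phrases labelled with one of `join_labels` get their spaces replaced by "_";
    phrases labelled with one of `drop_labels` are removed.
    """
    for label, phrase in entities:
        if label in join_labels:
            text = text.replace(phrase, phrase.replace(" ", "_"))
        elif label in drop_labels:
            text = text.replace(phrase, "")
    return text


txt = "i am living in the West Bengal and my brother live in New York. My name is John Smith"

# Assumed input: the labels and phrases printed by the loop in the question
entities = [
    ("LOCATION", "West Bengal"),
    ("GPE", "New York"),
    ("PERSON", "John Smith"),
]

preprocessed_txt = rewrite_entities(txt, entities)
print(repr(preprocessed_txt))
```

In practice the `entities` list could be built inside the `ne_chunk` loop by appending `(chunk.label(), ' '.join(c[0] for c in chunk))` for every labelled chunk.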