Python: Extracting subjects and their dependent phrases from text

I am trying to follow this thread (). I also want to extract subjects and their dependents from text.

import spacy
from textpipeliner import PipelineEngine, Context
from textpipeliner.pipes import *

text = 'No Offline Maps! It used to have offline maps but they disappeared. It now has a menu option to watch a video in exchange for maps but it never downloads the map. Makes the app useless to me.'

pipes_structure = [
    SequencePipe([
        FindTokensPipe("VERB/nsubj/*"),
        NamedEntityFilterPipe(),
        NamedEntityExtractorPipe()
    ]),
    FindTokensPipe("VERB"),
    AnyPipe([
        SequencePipe([
            FindTokensPipe("VBD/dobj/NNP"),
            AggregatePipe([
                NamedEntityFilterPipe("GPE"),
                NamedEntityFilterPipe("PERSON")
            ]),
            NamedEntityExtractorPipe()
        ]),
        SequencePipe([
            FindTokensPipe("VBD/**/*/pobj/NNP"),
            AggregatePipe([
                NamedEntityFilterPipe("LOC"),
                NamedEntityFilterPipe("PERSON")
            ]),
            NamedEntityExtractorPipe()
        ])
    ])
]

engine = PipelineEngine(pipes_structure, Context(text), [0, 1, 2])
engine.process()
When I run the code above, it raises the following error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-22-5f5a5c9e8e51> in <module>()
----> 1 engine = PipelineEngine(pipes_structure, Context(text), [0, 1, 2])
      2 engine.process()

~/anaconda3/lib/python3.6/site-packages/textpipeliner/context.py in __init__(self, doc)
      4         self._current_sent_idx = -1
      5         self._paragraph = self._sents[0:9]
----> 6         for s in doc.sents:
      7             self._sents.append(s)
      8         self.doc = doc

AttributeError: 'str' object has no attribute 'sents'
I'm not sure where I went wrong. Can anyone help me solve this?

Interesting library.

Your Context needs to be given a different kind of object; the error says so explicitly. Check the package's official example:

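nlp = spacy.load("en")
text = nlp('No Offline Maps! It used to have offline maps but they disappeared. It now has a menu option to watch a video in exchange for maps but it never downloads the map. Makes the app useless to me.')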

It looks like you are passing a plain string as the text variable in this line:

engine = PipelineEngine(pipes_structure, Context(text), [0, 1, 2])
Replace line 4 (the text = '...' assignment) with:

nlp = spacy.load("en")
text = nlp('No Offline Maps! It used to have offline maps but they disappeared. It now has a menu option to watch a video in exchange for maps but it never downloads the map. Makes the app useless to me.')
because that is what they do in the post you referenced.

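That way, text is no longer a string but the object returned by nlp (a spaCy Doc, which has the sents attribute that Context iterates over), so the second-to-last line works.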
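
To guard against passing a raw string again, a small check before building the Context may help. This is only a minimal sketch. It assumes, based on the traceback, that Context needs nothing more than an object exposing a sents attribute. The ensure_doc helper is hypothetical (it is not part of textpipeliner or spaCy), and nlp stands for whatever callable spacy.load returned.

```python
def ensure_doc(text_or_doc, nlp):
    """Return an object with a .sents attribute (hypothetical helper).

    Plain strings are parsed with nlp; anything that already exposes
    .sents (such as a spaCy Doc) is returned unchanged.
    """
    if isinstance(text_or_doc, str):
        # A str has no .sents, which is what caused the AttributeError.
        return nlp(text_or_doc)
    if not hasattr(text_or_doc, "sents"):
        raise TypeError(
            "expected a str or a parsed document with .sents, got "
            + type(text_or_doc).__name__
        )
    return text_or_doc


# Intended usage with the code from the question (not executed here):
# nlp = spacy.load("en")
# doc = ensure_doc(text, nlp)
# engine = PipelineEngine(pipes_structure, Context(doc), [0, 1, 2])
```

With this in place, text can be either the raw review string or an already parsed document, and a wrong type fails early with a clearer message than the one raised inside context.py.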