Python 2.7: ValueError in spaCy when using pytextrank (a Python implementation of TextRank)


I am trying to extract keywords with pytextrank. I installed pytextrank and spaCy with the following commands:

pip install pytextrank
pip install -U spacy
python -m spacy download en

Here is my code:

import pytextrank
import sys

path_stage0 = jsonPath
path_stage1 = "data/json/temp/o1.json"

with open(path_stage1, 'w') as f:
    for graf in pytextrank.parse_doc(pytextrank.json_iter(path_stage0)):
        f.write("%s\n" % pytextrank.pretty_print(graf._asdict()))
        # to view output in this notebook
        print(pytextrank.pretty_print(graf))

When I try to run it, I get the following error:

ValueError                                Traceback (most recent call last)
<ipython-input-12-07819fc6acea> in <module>()
      6 
      7 with open(path_stage1, 'w') as f:
----> 8     for graf in pytextrank.parse_doc(pytextrank.json_iter(path_stage0)):
      9         f.write("%s\n" % pytextrank.pretty_print(graf._asdict()))
     10         # to view output in this notebook

/home/sameera/anaconda2/lib/python2.7/site-packages/pytextrank/pytextrank.pyc in parse_doc(json_iter)
259                 print("graf_text:", graf_text)
260 
--> 261             grafs, new_base_idx = parse_graf(meta["id"], graf_text, base_idx)
262             base_idx = new_base_idx
263 

/home/sameera/anaconda2/lib/python2.7/site-packages/pytextrank/pytextrank.pyc in parse_graf(doc_id, graf_text, base_idx, spacy_nlp)
193     doc = spacy_nlp(graf_text, parse=True)
194 
--> 195     for span in doc.sents:
196         graf = []
197         digest = hashlib.sha1()

/home/sameera/anaconda2/lib/python2.7/site-packages/spacy/tokens/doc.pyx in __get__ (spacy/tokens/doc.cpp:9664)()
432 
433             if not self.is_parsed:
--> 434                 raise ValueError(
435                     "sentence boundary detection requires the dependency parse, which "
436                     "requires data to be installed. If you haven't done so, run: "

ValueError: sentence boundary detection requires the dependency parse, which 
requires data to be installed. If you haven't done so, run: 
python -m spacy download en
to install the data

I am using Python 2.7, Anaconda 4.3, Jupyter Notebook and Ubuntu 14.04.

This may just be a mistake introduced when you copied the code to StackOverflow, but if not:

Make sure everything below the "with" statement is indented, including the for loop.

Basically:

with open(path_stage1, 'w') as f:
    for graf in pytextrank.parse_doc(pytextrank.json_iter(path_stage0)):
        f.write("%s\n" % pytextrank.pretty_print(graf._asdict()))
        print(pytextrank.pretty_print(graf))
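
If the indentation is already correct, the traceback points at the parser data rather than at the loop. As a minimal sketch (not part of pytextrank or spaCy), the hypothetical helper can_split_sentences below checks whether a pipeline can iterate doc.sents, which is the call that fails in the traceback. It assumes only that the pipeline is callable on text and that the resulting document exposes sents.

```python
def can_split_sentences(nlp, sample=u"This is one sentence. This is another."):
    """Return True if iterating doc.sents works for the given pipeline.

    `nlp` is any callable that turns text into a document object with a
    `sents` attribute (hypothetical helper, for diagnosis only).
    """
    doc = nlp(sample)
    try:
        # Force evaluation: the ValueError in the traceback is raised
        # when the sentences are accessed or iterated.
        list(doc.sents)
    except ValueError:
        return False
    return True


# Usage (assumes spaCy and the "en" model from the question are installed):
#   import spacy
#   nlp = spacy.load("en")
#   print(can_split_sentences(nlp))
```

If this returns False for the pipeline loaded in the notebook, the model data is probably not visible to that Python environment, which is worth ruling out before changing the loop.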

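It is better to install from the requirements.txt in the pytextrank package rather than running pip install -U spacy, because spaCy changes quickly and -U installs the latest version. Those updates are not always backward compatible.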
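
To see which versions actually ended up installed before comparing them with requirements.txt, a small check like the following may help. This is only a sketch: installed_version is a hypothetical helper, and it assumes the packages expose a __version__ attribute, which not every release does.

```python
import importlib


def installed_version(package_name):
    """Return the package's __version__, "unknown" if it has none,
    or None if the package cannot be imported at all."""
    try:
        module = importlib.import_module(package_name)
    except Exception:
        # Any import-time failure means the package is not usable here.
        return None
    return getattr(module, "__version__", "unknown")


for name in ("spacy", "pytextrank"):
    print("%s: %s" % (name, installed_version(name)))
```

Run it in the same Jupyter kernel that raises the error, since the notebook may be using a different environment from the shell where pip was run.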

Also, you can post issues for pytextrank on its GitHub repo.

Glad to hear about the usage :)