Python: cannot evaluate my summaries with pyrouge


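I want to evaluate my summaries with pyrouge, a Python wrapper for the ROUGE summarization evaluation package. I ran the following commands, in order, successfully: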

git clone https://github.com/bheinzerling/pyrouge
cd pyrouge
python setup.py install
pyrouge_set_rouge_path /absolute/path/to/ROUGE-1.5.5/directory
python -m pyrouge.test
<pre>
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-8-b3bc5a66e7f0> in <module>()
      6 r.model_filename_pattern = 'sum.[A-Z].#ID#.txt'
      7 
----> 8 output = r.convert_and_evaluate()
      9 print(output)
     10 output_dict = r.output_to_dict(output)

/home/afsharizadeh/anaconda3/lib/python3.6/site-packages/pyrouge/Rouge155.py in convert_and_evaluate(self, system_id, split_sentences, rouge_args)
    358         if split_sentences:
    359             self.split_sentences()
--> 360         self.__write_summaries()
    361         rouge_output = self.evaluate(system_id, rouge_args)
    362         return rouge_output

/home/afsharizadeh/anaconda3/lib/python3.6/site-packages/pyrouge/Rouge155.py in __write_summaries(self)
    487     def __write_summaries(self):
    488         self.log.info("Writing summaries.")
--> 489         self.__process_summaries(self.convert_summaries_to_rouge_format)
    490 
    491     @staticmethod

/home/afsharizadeh/anaconda3/lib/python3.6/site-packages/pyrouge/Rouge155.py in __process_summaries(self, process_func)
    481             "model files to {}.".format(new_system_dir, new_model_dir))
    482         process_func(self._system_dir, new_system_dir)
--> 483         process_func(self._model_dir, new_model_dir)
    484         self._system_dir = new_system_dir
    485         self._model_dir = new_model_dir

/home/afsharizadeh/anaconda3/lib/python3.6/site-packages/pyrouge/Rouge155.py in convert_summaries_to_rouge_format(input_dir, output_dir)
    200         """
    201         DirectoryProcessor.process(
--> 202             input_dir, output_dir, Rouge155.convert_text_to_rouge_format)
    203 
    204     @staticmethod

/home/afsharizadeh/anaconda3/lib/python3.6/site-packages/pyrouge/utils/file_utils.py in process(input_dir, output_dir, function)
     27             input_file = os.path.join(input_dir, input_file_name)
     28             with codecs.open(input_file, "r", encoding="UTF-8") as f:
---> 29                 input_string = f.read()
     30             output_string = function(input_string)
     31             output_file = os.path.join(output_dir, input_file_name)

/home/afsharizadeh/anaconda3/lib/python3.6/codecs.py in read(self, size)
    696     def read(self, size=-1):
    697 
--> 698         return self.reader.read(size)
    699 
    700     def readline(self, size=None):

/home/afsharizadeh/anaconda3/lib/python3.6/codecs.py in read(self, size, chars, firstline)
    499                 break
    500             try:
--> 501                 newchars, decodedbytes = self.decode(data, self.errors)
    502             except UnicodeDecodeError as exc:
    503                 if firstline:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 947: invalid continuation byte
</pre>
My problem appears when I try to use pyrouge to evaluate my summaries. I wrote the following commands:

from pyrouge import Rouge155
r = Rouge155()
r.system_dir = "/home/afsharizadeh/Desktop/summarization/summarization_dataset/DUC_2007/2007/all_sum/system_sum/"
r.model_dir = "/home/afsharizadeh/Desktop/summarization/summarization_dataset/DUC_2007/2007/all_sum/ref_sum/"
r.system_filename_pattern = 'sum.(\d+).txt'
r.model_filename_pattern = 'sum.[A-Z].#ID#.txt'

output = r.convert_and_evaluate()
print(output)
output_dict = r.output_to_dict(output)
But I received the `UnicodeDecodeError` shown in the traceback above.

What should I do?

The pyrouge source code opens files with `encoding="UTF-8"` in many places. Check whether your files are actually UTF-8, and convert them to UTF-8 if they are not. There are plenty of conversion tools available online; a quick search is a good first step.
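The check and conversion can also be done in Python itself. The following is a minimal sketch, not part of pyrouge: the helper names `find_non_utf8_files` and `convert_to_utf8` are hypothetical, and the default source encoding `latin-1` is an assumption. Byte `0xe9` is `é` in Latin-1 and Windows-1252, which is consistent with the error message but does not prove that the files use that encoding. Note that decoding as Latin-1 never fails, so a wrong guess silently produces garbled text rather than an error.

```python
import os


def find_non_utf8_files(directory):
    """Return the names of regular files in `directory` that are not valid UTF-8."""
    bad = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        with open(path, "rb") as f:
            data = f.read()
        try:
            data.decode("utf-8")
        except UnicodeDecodeError:
            bad.append(name)
    return bad


def convert_to_utf8(directory, source_encoding="latin-1"):
    """Rewrite, in place, every non-UTF-8 file in `directory` as UTF-8.

    `source_encoding` is a guess (assumption); adjust it if the
    converted text looks wrong. Returns the names of converted files.
    """
    converted = find_non_utf8_files(directory)
    for name in converted:
        path = os.path.join(directory, name)
        with open(path, "rb") as f:
            text = f.read().decode(source_encoding)
        # newline="" keeps the original line endings untouched.
        with open(path, "w", encoding="utf-8", newline="") as f:
            f.write(text)
    return converted
```

Since the files are overwritten in place, it is safer to work on a copy of the data. In the traceback, the failure occurs while processing `self._model_dir`, so the offending file is probably in the directory assigned to `r.model_dir`. Running `find_non_utf8_files` on that directory first, and then `convert_to_utf8` on both `r.system_dir` and `r.model_dir` before calling `r.convert_and_evaluate()`, should show whether encoding was the only problem.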