How to define the ratio of a summary with the Hugging Face Transformers pipeline?


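I summarized an article with the Hugging Face Transformers pipeline, using this code:

from transformers import pipeline
summarizer = pipeline(task="summarization")
summary = summarizer(text)
print(summary[0]["summary_text"])

How can I define the ratio between the summary and the original article? For example, 20% of the original article.

Edit 1: I implemented the solution you suggested, but I get the following error. This is the code I used: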

summarizer(text, min_length = int(0.1 * len(text)), max_length = int(0.2 * len(text)))
print(summary[0]['summary_text'])

The error I get is:

RuntimeError                              Traceback (most recent call last)
<ipython-input-9-bc11c5d8eb66> in <module>()
----> 1 summarizer(text, min_length = int(0.1 * len(text)), max_length = int(0.2 * len(text)))
      2 print(summary[0]['summary_text'])

13 frames
/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
   1482         # remove once script supports set_grad_enabled
   1483         _no_grad_embedding_renorm_(weight, input, max_norm, norm_type)
-> 1484     return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
   1485 
   1486 

RuntimeError: index out of range: Tried to access index 1026 out of table with 1025 rows. at /pytorch/aten/src/TH/generic/THTensorEvenMoreMath.cpp:418


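(Note that this answer is based on the documentation for transformers version 2.6.)

The documentation on the pipeline feature still seems to be quite shallow, which is why we have to dig a bit deeper. When a Python object is called, it internally invokes its own __call__ attribute, and that is where we can look for the summarization pipeline.

Note that it allows us (similar to the underlying model) to specify min_length and max_length, which is why we can simply call

summarizer(text, min_length = int(0.1 * len(text)), max_length = int(0.2 * len(text)))

This will give you a summary of roughly 10-20% of the original length, but you can of course change that to your liking. Note that the default value of max_length for BartForConditionalGeneration is 20 (min_length is undocumented so far, but defaults to 0), whereas the summarization pipeline uses min_length=21 and max_length=142.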
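One caveat about the call above: len(text) counts characters, whereas min_length and max_length are measured in tokens, so the resulting ratio is only approximate. A minimal sketch of computing the bounds from a token count instead; ratio_bounds and its hard_cap parameter are hypothetical names introduced here for illustration, not part of transformers:

```python
def ratio_bounds(num_input_tokens, min_ratio=0.1, max_ratio=0.2, hard_cap=None):
    """Hypothetical helper: convert ratios of the input length (in tokens)
    into integer (min_length, max_length) bounds for the summarizer call."""
    if not 0 <= min_ratio <= max_ratio:
        raise ValueError("expected 0 <= min_ratio <= max_ratio")
    min_len = int(round(min_ratio * num_input_tokens))
    # Keep max_length strictly above min_length so the bounds stay consistent.
    max_len = max(int(round(max_ratio * num_input_tokens)), min_len + 1)
    if hard_cap is not None:
        # Optional upper limit chosen by the caller (an assumption of this sketch).
        max_len = min(max_len, hard_cap)
        min_len = min(min_len, max_len - 1)
    return max(min_len, 0), max_len


bounds = ratio_bounds(800)
print(bounds)

# Possible use with the pipeline (assumes the pipeline exposes its tokenizer
# as `summarizer.tokenizer`; check this against your installed version):
#   n = len(summarizer.tokenizer.encode(text))
#   lo, hi = ratio_bounds(n)
#   summary = summarizer(text, min_length=lo, max_length=hi)
```

Passing integers matters here: the length arguments are expected to be whole numbers of tokens, which is why the helper rounds before returning.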

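Thanks for your answer! Please check the edit, @Denninger.

Can you specify the value of text? And which version of transformers are you running?

Here is the text. I am using transformers==2.6.0.

The pretrained Bart only supports input lengths of up to 1024 tokens, so the problem is evidently that the text you mention is too long. (For future reference, since the pastebin link has expired: the text spanned several paragraphs.) At this point, your best option is to split the text into several parts and summarize each part separately.

Ah. What is the definition of one token in BERT?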
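The split-and-summarize suggestion could be sketched roughly as follows. This is only an illustration under stated assumptions: chunk_paragraphs and count_tokens are hypothetical names, paragraphs are assumed to be separated by blank lines, and the demo counts whitespace-separated words as a stand-in for the model's real tokenizer, which splits text into subword pieces and will generally report a different (usually larger) count.

```python
def chunk_paragraphs(text, max_tokens, count_tokens):
    """Hypothetical helper: group blank-line-separated paragraphs into chunks
    whose summed token count stays within max_tokens.

    A single paragraph longer than max_tokens is still emitted as its own
    (oversized) chunk; splitting inside a paragraph is left out of this sketch.
    """
    chunks, current, current_len = [], [], 0
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        n = count_tokens(para)
        # Start a new chunk when adding this paragraph would exceed the budget.
        if current and current_len + n > max_tokens:
            chunks.append("\n\n".join(current))
            current, current_len = [], 0
        current.append(para)
        current_len += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks


# Demo with a whitespace word count standing in for a real tokenizer.
demo = "one two three\n\nfour five\n\nsix seven eight nine"
chunks = chunk_paragraphs(demo, 5, lambda s: len(s.split()))
print(chunks)

# Possible use with the pipeline (assumes `summarizer.tokenizer` is available
# and leaves some headroom below the 1024-token limit for special tokens):
#   parts = chunk_paragraphs(text, 1000,
#                            lambda s: len(summarizer.tokenizer.encode(s)))
#   summary = " ".join(summarizer(p)[0]["summary_text"] for p in parts)
```

Summarizing each part on its own loses context across chunk boundaries, so the joined result may read less coherently than a summary of a text that fits in one pass.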