Python 3.x: "string index out of range" error when programming a Discord bot in Python

Tags: python-3.x, nltk, markov-chains, discord

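I'm making a Discord bot that posts a randomly generated sentence in the chat every few seconds. I'm trying to use the nltk module to make the sentences more coherent, but I'm running into an error that I can't make sense of.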

import asyncio
import random
import discord.ext.commands
import markovify
import nltk
import re

class POSifiedText(markovify.Text):
    def word_split(self, sentence):
        words = re.split(self.word_split_pattern, sentence)
        words = ["::".join(tag) for tag in nltk.pos_tag(words) ]
        return words

    def word_join(self, words):
        sentence = " ".join(word.split("::")[0] for word in words)
        return sentence

with open("/root/sample.txt") as f:
    text = f.read()

text_model = POSifiedText(text, state_size=1)

client = discord.Client()
async def background_loop():
    await client.wait_until_ready()
    while not client.is_closed:
        channel = client.get_channel('channelid')
        messages = [(text_model.make_sentence(tries=8, max_overlap_total=10,default_max_overlap_ratio=0.5))]
        await client.send_message(channel, random.choice(messages))
        await asyncio.sleep(10)

client.loop.create_task(background_loop())
client.run("token")

Here is the error from the output log:

Traceback (most recent call last):
  File "/root/untitled/Loop.py", line 21, in <module>
    text_model = POSifiedText(text, state_size=1)
  File "/usr/local/lib/python3.5/dist-packages/markovify/text.py", line 24, in __init__
    runs = list(self.generate_corpus(input_text))
  File "/root/untitled/Loop.py", line 11, in word_split
    words = [": :".join(tag) for tag in nltk.pos_tag(words) ]
  File "/usr/local/lib/python3.5/dist-packages/nltk/tag/__init__.py", line 129, in pos_tag
    return _pos_tag(tokens, tagset, tagger)    
  File "/usr/local/lib/python3.5/dist-packages/nltk/tag/__init__.py", line 97, in _pos_tag
    tagged_tokens = tagger.tag(tokens)
  File "/usr/local/lib/python3.5/dist-packages/nltk/tag/perceptron.py", line 152, in tag
    context = self.START + [self.normalize(w) for w in tokens] + self.END
  File "/usr/local/lib/python3.5/dist-packages/nltk/tag/perceptron.py", line 152, in <listcomp>
    context = self.START + [self.normalize(w) for w in tokens] + self.END
  File "/usr/local/lib/python3.5/dist-packages/nltk/tag/perceptron.py", line 227, in normalize
    elif word[0].isdigit():
IndexError: string index out of range

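The fact that word[0].isdigit() throws the error means that word is an empty string. The most likely cause is that your regex split sometimes produces empty strings. The fix is to add, right after the line

words = re.split(self.word_split_pattern, sentence)

the line

words = [w for w in words if len(w) > 0]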
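
To illustrate why the split can yield empty strings, here is a minimal sketch that runs without nltk or markovify. It assumes the split pattern is a plain whitespace regex; word_split_pattern and split_words below are stand-in names for illustration, not the library's own code.

```python
import re

# Assumption: a whitespace pattern standing in for self.word_split_pattern.
word_split_pattern = re.compile(r"\s+")


def split_words(sentence):
    """Split on whitespace and drop the empty strings re.split can produce."""
    words = re.split(word_split_pattern, sentence)
    return [w for w in words if len(w) > 0]


# Leading or trailing whitespace makes re.split return empty strings,
# and indexing word[0] on an empty string raises IndexError.
raw = re.split(word_split_pattern, " hello  world ")
print(raw)                            # ['', 'hello', 'world', '']
print(split_words(" hello  world "))  # ['hello', 'world']
```

With the empty strings filtered out, every token handed to the tagger has at least one character, so the word[0] lookup in the traceback can no longer fail on this input.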

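If word[0].isdigit() throws that error, then word is sometimes an empty string.

To add to my previous comment: there are a lot of function calls in the traceback. If the empty word were introduced somewhere downstream (in code that you merely call), debugging could be difficult. On the other hand, the fix may be as simple as filtering the empty strings out of the result of re.split(self.word_split_pattern, sentence) with words = [w for w in words if len(w) > 0] before passing it to nltk.pos_tag(). I'm only guessing, but there seems to be no harm in trying.

This fixed the error, but now the generated sentences have no spaces.

Fixed the problem. I added some spaces in the words = ["::".join(tag) for tag in nltk.pos_tag(words)] line.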
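
As a way to check the spacing issue from the last comments in isolation, the following sketch round-trips a sentence through the split and join logic from the question. It is only an illustration under stated assumptions: fake_pos_tag is a hypothetical stub that replaces nltk.pos_tag so the code runs without the tagger data, and the whitespace pattern is assumed.

```python
import re


def fake_pos_tag(words):
    """Hypothetical stand-in for nltk.pos_tag: tags every word as 'NN'."""
    return [(w, "NN") for w in words]


def word_split(sentence):
    # Split on whitespace (assumed pattern) and drop empty strings.
    words = [w for w in re.split(r"\s+", sentence) if len(w) > 0]
    # Attach the tag to each word, as in the question's word_split.
    return ["::".join(tag) for tag in fake_pos_tag(words)]


def word_join(words):
    # Strip the tag again and rejoin the bare words with single spaces.
    return " ".join(word.split("::")[0] for word in words)


tokens = word_split(" the cat  sat ")
print(tokens)             # ['the::NN', 'cat::NN', 'sat::NN']
print(word_join(tokens))  # the cat sat
```

If a round trip like this keeps the spaces but the bot's output does not, comparing the separator used in word_split with the one used in word_join is a reasonable first thing to check.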