How to split sentences at sentence boundaries using an iterator in Python

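Hi, I have to parse a string so that I can split it at punctuation and write each sentence on its own line. In some cases a punctuation mark is not a sentence boundary, so I don't split there (for debugging, I print a message whenever one of those cases occurs).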

Here is my code:

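line: the string I'm reading
punctuation_list: a predefined list (not that important)
sentence_boundary: the boolean I'm trying to use to know when to split a sentence
i, prev and c: the current character, the next one, and the one after that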

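Because I'm working backwards, the code looks for all the conditions that are NOT sentence boundaries. It checks several cases and uses an iterator to look at the next characters. Since I'm using an iterator, I decided to use recursion and pass a smaller string on each call, so that I can search through the whole string. The function works.

However, the goal is to split the string at the points where a punctuation mark really is a sentence boundary (i.e., when none of the other cases apply). Because the function is recursive, I've run into a problem: I can't keep track of the index I'm at in the string, so I don't know where to split. I've considered using a helper function somehow, but I don't know how to keep track of the index.

Any help modifying this code would be greatly appreciated. I know my approach is somewhat backwards (instead of looking for where to split sentences, I'm looking for where not to split them), but if possible I'd still like to keep this code.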

```python
def parse(line):  # function
    sentence_boundary = True
    if len(line) == 3:
        return
    t = iter(line)
    i = next(t)
    prev = next(t)
    c = next(t)
    # periods followed by a digit with no intervening whitespace are not sentence boundaries
    if i == "." and prev.isdigit():
        print("This is a digit")
        sentence_boundary = False
    # periods followed by certain kinds of punctuation are probably not sentence boundaries
    for j in punctuation_list:
        if i == "." and prev == j:
            print("Found a punctuation")
            sentence_boundary = False
    # periods followed by a whitespace followed by a lower case letter are not sentence boundaries
    if i == "." and prev == " " and c.islower():
        print("This is a lower letter")
        sentence_boundary = False
    # periods internal to a sequence of letters with no adjacent whitespace are not sentence boundaries
    if i == "." and prev.islower() and c.islower():
        print("This is a period within a sentence")
        sentence_boundary = False
    # periods followed by a whitespace and then an uppercase letter, but preceded by
    # any of a short list of titles are not sentence boundaries
    if c == "." and prev.islower() and i.isupper():
        print("This is a title")
        sentence_boundary = False
    index = line.index(i)
    parse(line[index + 1:])


if __name__ == "__main__":
    parse(line)
```
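One possible alternative, sketched under assumptions: if the three-character window (current, next, next-but-one) is built with zip and enumerate instead of next() calls plus recursion, the index comes for free and each sentence can be sliced out at its boundary. PUNCTUATION, TITLES and TERMINATORS below are hypothetical stand-ins for the question's predefined punctuation_list and title list; the title rule is rewritten to look at the word before the period, which is only possible because the index is known. The remaining rules mirror the question's checks.

```python
# Hypothetical stand-ins for the question's predefined lists (assumptions).
PUNCTUATION = {",", ";", ":", ")", '"', "'"}
TITLES = {"Mr", "Mrs", "Ms", "Dr"}
TERMINATORS = ".!?"


def is_boundary(text, idx, cur, nxt, nxt2):
    """Return True if text[idx] (== cur) ends a sentence."""
    if cur not in TERMINATORS:
        return False
    if cur != ".":
        return True  # the question's rules only deal with periods
    if nxt.isdigit():                       # "3.50"
        return False
    if nxt in PUNCTUATION:                  # ".," or '."'
        return False
    if nxt == " " and nxt2.islower():       # "early. then"
        return False
    if nxt.islower() and nxt2.islower():    # "example.com"
        return False
    words = text[:idx].split()
    if words and words[-1] in TITLES:       # "Dr. Smith"
        return False
    return True


def split_sentences(text):
    """Split text at sentence boundaries, tracking the index with enumerate."""
    sentences = []
    start = 0  # index where the current sentence begins
    padded = text + "  "  # pad so the last characters still get a 3-char window
    for idx, (cur, nxt, nxt2) in enumerate(zip(text, padded[1:], padded[2:])):
        if is_boundary(text, idx, cur, nxt, nxt2):
            sentences.append(text[start:idx + 1].strip())
            start = idx + 1
    rest = text[start:].strip()  # trailing text without a final terminator
    if rest:
        sentences.append(rest)
    return sentences


if __name__ == "__main__":
    sample = "Dr. Smith paid 3.50 dollars. He left early. then he came back! Did he?"
    for sentence in split_sentences(sample):
        print(sentence)  # one sentence per line
```

The debugging print calls from the question can go back into is_boundary next to each return False if you still want to see which rule fired.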

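I think your code is hard to understand. prev is usually short for "previous", so using it together with "next" (here it holds the next character) makes no sense to me.

The usual way to keep extra state (such as an index) between recursive calls is to pass it as an extra argument. You can give it a default value of 0: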

```python
def parse(line, index=0):  # function
    ...
    parse(line, index + 1)
```
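A minimal sketch of how that suggestion could be filled in, assuming the goal is to return the list of sentences: the recursion walks the original string one position at a time instead of slicing it. The extra start and sentences parameters are hypothetical additions (start marks where the current sentence began, sentences accumulates the results), and only three of the question's period rules are included to keep it short.

```python
def parse(line, index=0, start=0, sentences=None):
    """Recursively scan line, carrying the position in parameters instead of slicing."""
    if sentences is None:  # avoid a shared mutable default argument
        sentences = []
    if index >= len(line):  # reached the end: keep any unterminated tail
        rest = line[start:].strip()
        if rest:
            sentences.append(rest)
        return sentences
    cur = line[index]
    # look ahead two characters, padding with spaces past the end of the string
    nxt = line[index + 1] if index + 1 < len(line) else " "
    nxt2 = line[index + 2] if index + 2 < len(line) else " "
    boundary = cur == "." and not (
        nxt.isdigit()                           # "3.50"
        or (nxt == " " and nxt2.islower())      # "now. ok"
        or (nxt.islower() and nxt2.islower())   # "example.com"
    )
    if boundary:
        sentences.append(line[start:index + 1].strip())
        start = index + 1
    return parse(line, index + 1, start, sentences)


if __name__ == "__main__":
    for sentence in parse("It costs 3.50 now. ok. Next one."):
        print(sentence)
```

CPython does not optimize tail calls, so this makes one nested call per character and will raise RecursionError on texts longer than roughly the default recursion limit (1000); a plain for loop over the indices avoids that.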