Python: splitting a string based on semi-consistent features


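I have a text file that contains a transcript. I need a way to split its contents so that I end up with a list of strings, one for each thing a person says. So this,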

mystr = '''Bob: Hello there, how are you? 

           Alice: I am fine how are you?'''
becomes this

mylist= ['Bob: Hello there, how are you?','Alice: I am fine how are you?']
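I'm not familiar with regular expressions, but I recognize that they may be a viable route. The catch is that I want this to work when the names differ (for example John, Paul, George, Ringo, etc.). What stays consistent is that a single word (the speaker) appears, followed by a colon, followed by a space.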
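The rule stated above (one word, a colon, a space) can be sketched directly as a split on the position just before each speaker label. This is only a minimal sketch: the helper name `split_turns` is hypothetical, it assumes single-word speaker names and that no utterance itself contains a "word: " sequence, and it relies on `re.split` accepting patterns that can match an empty string (Python 3.7 or later).

```python
import re

# Zero-width lookahead: split just before each "Word: " label, and swallow
# any whitespace that precedes the label.
SPEAKER_BOUNDARY = re.compile(r"\s*(?=\b\w+: )")

def split_turns(text):
    """Return one stripped string per speaker turn (hypothetical helper)."""
    parts = SPEAKER_BOUNDARY.split(text)
    # The split before the very first label yields an empty string; drop it.
    return [part.strip() for part in parts if part.strip()]

mystr = "Bob: Hello there, how are you? \n\n           Alice: I am fine how are you?"
mylist = split_turns(mystr)
```

Because the pattern keys on the label rather than on line breaks, an utterance that wraps onto several lines should stay in a single list item.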

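import re
# \S starts at the first non-space character, [^:]+ consumes the speaker name, .* takes the rest of the line
re.findall(r"\S[^:]+.*", mystr)
#-> ['Bob: Hello there, how are you? ', 'Alice: I am fine how are you?']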

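If a line may be missing its colon, prefer the following regex over the previous one, because it only starts a match at a word that is immediately followed by a colon: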

mystr = '''Bob Hello there, how are you? 

           Alice: I am fine how are you?'''
# Bob's line has no "name:" label, so only Alice's line is matched
[m.group(0).strip() for m in re.finditer(r"\w+:+.*", mystr)]
#-> ['Alice: I am fine how are you?']
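
To see why the colon-anchored pattern is the safer choice here, the sketch below runs both patterns on the colon-less input. Since `[^:]` also matches newlines, the first pattern runs from `Bob` across the blank line until it reaches Alice's colon, so the whole text comes back as a single item. The variable names `loose` and `strict` are illustrative only.

```python
import re

mystr = "Bob Hello there, how are you? \n\n           Alice: I am fine how are you?"

# First pattern: [^:]+ crosses line breaks, so both lines merge into one match.
loose = re.findall(r"\S[^:]+.*", mystr)

# Second pattern: a match can only begin at a word directly followed by ':'.
strict = [m.group(0).strip() for m in re.finditer(r"\w+:+.*", mystr)]

print(len(loose))  # 1
print(strict)      # ['Alice: I am fine how are you?']
```

Note that the second pattern silently drops Bob's unlabeled line rather than attaching it to anyone, which may or may not be what a given transcript needs.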