Python: split a string every three words
I've been looking around for a while, but I can't seem to find an answer to this small problem. I have a piece of code that is supposed to split a string after every three words:
import re

def splitTextToTriplet(Text):
    x = re.split(r'^((?:\S+\s+){2}\S+).*', Text)
    return x

print(splitTextToTriplet("Do you know how to sing"))
Currently, the output is:
['', 'Do you know', '']
But I was actually expecting this:
['Do you know', 'how to sing']
If I call print(splitTextToTriplet("Do you know how to")), it should also output:
['Do you know', 'how to']
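How can I change the regex so that it produces the expected output?

I think re.split may not be the best approach here, because a look-behind cannot take a variable-length pattern. Instead, you can use str.split and then join the words back together: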
def splitTextToTriplet(string):
    words = string.split()
    # Join every run of up to three consecutive words with a single space.
    grouped_words = [' '.join(words[i: i + 3]) for i in range(0, len(words), 3)]
    return grouped_words

splitTextToTriplet("Do you know how to sing")
# ['Do you know', 'how to sing']
splitTextToTriplet("Do you know how to")
# ['Do you know', 'how to']
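To see why the original re.split call returns ['', 'Do you know', ''], it may help to look at what the pattern actually matches. The snippet below is only a sketch of that reasoning (the variable names are mine, not from the question): because of the trailing .* the pattern consumes the whole string, so re.split finds nothing before or after the match and only reports the captured group in between.

```python
import re

text = "Do you know how to sing"
pattern = r'^((?:\S+\s+){2}\S+).*'

match = re.match(pattern, text)
whole = match.group(0)      # everything the pattern consumed
captured = match.group(1)   # only the first three words

# re.split returns: text before the match, the captured group, text after the match.
parts = re.split(pattern, text)

print(whole)
print(captured)
print(parts)
```

Since the separator is the entire string, there is nothing left over to split, which is why a matching function such as re.findall is a better fit than re.split here.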
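Although the solution above is the recommended one, if some of your whitespace consists of newlines, that information is lost in the process.

I used re.findall to get your expected output. To get a more general split function, I replaced splitTextToTriplet with splitTextonWords, which takes numberOfWords as a parameter: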
import re

def splitTextonWords(Text, numberOfWords=1):
    if numberOfWords > 1:
        text = Text.lstrip()
        # One word, then up to numberOfWords - 1 more words, each preceded by
        # whitespace. Whitespace inside a group is kept exactly as written.
        pattern = r'\S+(?:\s+\S+){0,' + str(numberOfWords - 1) + '}'
        x = re.findall(pattern, text)
    elif numberOfWords == 1:
        x = Text.split()
    else:
        x = None
    return x

print(splitTextonWords("Do you know how to sing", 3))
print(splitTextonWords("Do you know how to", 3))
print(splitTextonWords("Do you know how to sing how to dance how to", 3))
print(splitTextonWords("A sentence this code will fail at", 3))
print(splitTextonWords("A sentence this code will fail at ", 3))
print(splitTextonWords(" A sentence this code will fail at s", 3))
print(splitTextonWords(" A sentence this code will fail at s", 4))
print(splitTextonWords(" A sentence this code will fail at s", 2))
print(splitTextonWords(" A sentence this code will fail at s", 1))
print(splitTextonWords(" A sentence this code will fail at s", 0))
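Output:
['Do you know', 'how to sing']
['Do you know', 'how to']
['Do you know', 'how to sing', 'how to dance', 'how to']
['A sentence this', 'code will fail', 'at']
['A sentence this', 'code will fail', 'at']
['A sentence this', 'code will fail', 'at s']
['A sentence this code', 'will fail at s']
['A sentence', 'this code', 'will fail', 'at s']
['A', 'sentence', 'this', 'code', 'will', 'fail', 'at', 's']
None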
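The point about newlines can be made concrete with a small comparison. This is a rough sketch, not code from either answer: the sample text and the names joined and found are made up for illustration, and the regex is one possible way to write "up to three whitespace-separated words".

```python
import re

# Hypothetical sample: a newline and a tab sit between some of the words.
text = "Do you\nknow how to\tsing"

# str.split + join: every separator is replaced by a single space.
words = text.split()
joined = [' '.join(words[i:i + 3]) for i in range(0, len(words), 3)]

# re.findall: each group keeps the original whitespace between its words.
found = re.findall(r'\S+(?:\s+\S+){0,2}', text)

print(joined)
print(found)
```

Which behaviour you want depends on whether the original line breaks carry meaning; if they don't, the str.split version is simpler.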
Using grouper:
See also the third-party library that implements this recipe for you.
Code
def split_text_to_triplet(s):
    """Return strings of three words."""
    return [" ".join(c) for c in grouper(3, s.split())]

split_text_to_triplet("Do you know how to sing")
# ['Do you know', 'how to sing']
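The snippet above calls grouper without defining it, so here is a self-contained sketch of how it could be filled in. The grouper below is my own assumption, written with the argument order (n, iterable) to match the call in this answer; as far as I know the recipe in the current itertools documentation takes the iterable first. It also skips the padding values, since joining a padded group directly would fail when the word count is not a multiple of three.

```python
from itertools import zip_longest

def grouper(n, iterable, fillvalue=None):
    """Collect items into fixed-length chunks, padding the last one (assumed helper)."""
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

def split_text_to_triplet(s):
    """Return strings of up to three words."""
    # Skip the None padding that zip_longest adds to a short final chunk.
    return [" ".join(w for w in chunk if w is not None)
            for chunk in grouper(3, s.split())]

print(split_text_to_triplet("Do you know how to sing"))
print(split_text_to_triplet("Do you know how to"))
```

If you use a third-party implementation instead, check its argument order and how it handles an incomplete final group before relying on it.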
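Does the solution have to be a regex? Is there any logic needed other than splitting the string in two after the third word?
I agree with @thesilkworm. It is probably easier to do this without a regex. Is regex a requirement?
No. But if you have other suggestions, I'm fine with that too. XD
Use re.findall.
I solved the problem. I would like to accept both answers, but I can only press that button once. I think I'll edit the title so that everyone can find it.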