Matching the longest possible run of characters in Python
I have a string made up of the characters 0123456789AB, and I have this regexp:
([^1368A]+|[^2479B]+|[^0358A]+|[^1469B]+|[^0257A]+|[^1368B]+|[^02479]+|[^1358A]+|[^2469B]+|[^0357A]+|[^1468B]+|[^02579]+)
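The problem is that it returns the first alternative that matches, not the longest one. How can I get the longest match in Python? I don't expect this to be possible with a regexp alone.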
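To illustrate the behaviour described above: Python's re engine tries the alternatives of | from left to right and takes the first one that matches at a given position, not the longest one. The sketch below uses the string A34B32A43 as an illustrative input and compares the combined pattern with trying each alternative separately at the same position. longest_alternative_at is a hypothetical helper, not part of re.

```python
import re

# The twelve alternatives from the question, in the same order.
PATTERNS = [
    r'[^1368A]+', r'[^2479B]+', r'[^0358A]+', r'[^1469B]+',
    r'[^0257A]+', r'[^1368B]+', r'[^02479]+', r'[^1358A]+',
    r'[^2469B]+', r'[^0357A]+', r'[^1468B]+', r'[^02579]+',
]
COMBINED = re.compile('|'.join(PATTERNS))


def longest_alternative_at(string, pos=0):
    """Hypothetical helper: try every alternative anchored at pos, keep the longest.

    Returns (pattern_index, matched_text), or None if no alternative matches.
    Ties go to the lowest pattern index because the comparison is strict.
    """
    best = None
    for index, pattern in enumerate(PATTERNS):
        m = re.compile(pattern).match(string, pos)
        if m and (best is None or len(m.group(0)) > len(best[1])):
            best = (index, m.group(0))
    return best


text = 'A34B32A43'
print(COMBINED.match(text).group(0))  # prints A3 (first alternative that matches)
print(longest_alternative_at(text))   # prints (11, 'A34B3')
```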
Edit: I need to find all matches, preferably together with the index of the pattern that succeeded.
Example input:
66666A00666160666106606610666610A60661606661606066160660616A00666160666160606610666610A60661606661066066160660616A00666160666160606616066610A60661606661606066106660616A00666106666160606616066610A60661066661606066160660616A0000000000000666606A100666160666160606616066616060661606661606066106666106606610666616060661606661606066106666160606610666610660661066661606066106666160606610666616060661606661606066106666160606616066616060661606661066066160666160606616066610660661606661066066160666106606616066616060661606661066066160666106606616066616060661606661606066160666160606616066610660661066661606066106666160606610666616060661066661606066160666160606616066616060666066616060666066616060666066616060666066616060666066660606666A
Another example:
027027240270272402702724027027240270272402702724027027240270272402702724027027240270272402702724027027240270272402702724027027240270272402702724027027240270272402702724027027240270272402702724027027240270272402702724027027240270272402702724027027240270272402702724027027240270272402702724027027240270272402702724027027240270272427BB232B0738310A5320738310A53202735A8310A53202735A8310A53202735A8310A53202735A8310A532249A540249A540249A540249A540792A54002402702724792A540
Example output:
'470470574704705747047057570570574704705727027B5747047057570570574704705727027B5747047057570570574704705727027B5747047057570570574704705727027B5747047057570570574704705727027B5747047057570570574704705727027B5747047057570570574704705727027B5747047057570570574704705727027B5747047057570570574704705727027B57470470574704705747047057B2727875377AA0577AA0577AA0577AA0577AA0577AA059959959959952257777225'
('1368A','470470574704705747047057570570574704705727027B5747047057570570574704705727027B5747047057570570574704705727027B5747047057570570574704705727027B5747047057570570574704705727027B5747047057570570574704705727027B5747047057570570574704705727027B5747047057570570574704705727027B5747047057570570574704705727027B57470470574704705747047057B2727'),('','8'),('1468B','75377AA0577AA0577AA0577AA0577AA0577AA059959959959952257777225')
Update: this is the code I am currently using:
import sys, re
from midplay import MidiFile, NoteOn
from collections import deque

notes = ("C", "C#", "D", "Eb", "E", "F", "F#", "G", "G#", "A", "Bb", "B")
noteshex = ('0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'A', 'B')
# Pitch classes of the major / natural minor scale on tonic x
major = lambda x: ((x) % 12, (x+2) % 12, (x+4) % 12, (x+5) % 12, (x+7) % 12, (x+9) % 12, (x+11) % 12,)
minor = lambda x: ((x) % 12, (x+2) % 12, (x+3) % 12, (x+5) % 12, (x+7) % 12, (x+8) % 12, (x+10) % 12,)
# Pitch classes that are NOT in the major / minor scale on tonic x
nomajor = lambda x: {(x+1) % 12, (x+3) % 12, (x+6) % 12, (x+8) % 12, (x+10) % 12}
nominor = lambda x: {(x+1) % 12, (x+4) % 12, (x+6) % 12, (x+9) % 12, (x+11) % 12}
# One regexp per major key: runs of notes that avoid the 5 notes outside that key
nomajortonelist = [re.compile('([^' + ''.join([noteshex[note] for note in nomajor(tonality)]) + ']+)') for tonality in range(12)]
# A minor key uses the same notes as the major key a minor third above it
nominortonelist = nomajortonelist[3:] + nomajortonelist[:3]

if len(sys.argv) != 2:
    sys.exit(r'usage: py tonalitydetect.py [C:\path]filename.mid')
midi = MidiFile(sys.argv[1])
for num, track in enumerate(midi):
    print('Track:', num, 'messages:', len(track))
    # One note string and one result queue per MIDI channel (16 channels)
    channelnotes = [''] * 16
    channeltonality = [deque() for _ in range(16)]
    for msg in track:
        if isinstance(msg, NoteOn):
            channelnotes[msg.channel] += noteshex[msg.note % 12]
    for chnum, channel in enumerate(channelnotes):
        # Repeatedly take the longest match over all patterns and split it out
        tomatch = [channel]
        matches = []
        while ''.join(tomatch) != '':
            curchanmaxmatch = deque()
            for string in tomatch:
                for exp in nomajortonelist:
                    curchanmaxmatch.append((exp, max(exp.findall(string) + [''], key=len)))
            matches.append(max(curchanmaxmatch + deque([('', '',)]), key=lambda x: len(x[1])))
            newmatch = []
            found = 0
            for x in tomatch:
                if not found:
                    match = x.split(matches[-1][1], 1)
                    if len(match) > 1:
                        found = 1
                    newmatch.extend(match)
                else:
                    newmatch.append(x)
            tomatch = [x for x in newmatch if x != '']
        # Reorder the matches by where they occur in the channel string
        matches = sorted(matches, key=lambda x: len(x[1]))
        toseek = channel
        while len(matches):
            for num, match in enumerate(matches):
                if not toseek.find(match[1]):
                    channeltonality[chnum].append(match)
                    toseek = toseek[len(match[1]):]
                    del matches[num]
                    break
    for chnum, channel in enumerate(channeltonality):
        print('Channel', chnum, ':', [notes[nomajortonelist.index(x[0])] + ' major, ' + notes[nominortonelist.index(x[0])] + ' minor' for x in channel])
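A side note on the code above: the twelve character classes in the regexp at the top of the question are exactly the sets nomajor produces for tonalities 0 to 11, written in ascending order. The sketch below assumes that correspondence is intentional and regenerates the classes with sorted(), so the pattern text no longer depends on set iteration order. Pattern i then stands for notes[i] major and, following the nominortonelist rotation, for its relative minor notes[(i + 9) % 12]. key_names is a hypothetical helper.

```python
notes = ("C", "C#", "D", "Eb", "E", "F", "F#", "G", "G#", "A", "Bb", "B")
noteshex = "0123456789AB"


def nomajor(x):
    # Pitch classes that are NOT in the major scale whose tonic is x.
    return {(x + 1) % 12, (x + 3) % 12, (x + 6) % 12, (x + 8) % 12, (x + 10) % 12}


# One character class per tonic; sorted() makes the pattern text deterministic.
patterns = ['[^' + ''.join(noteshex[n] for n in sorted(nomajor(t))) + ']+'
            for t in range(12)]


def key_names(index):
    # Major key for this pattern and its relative minor (a minor third below).
    return notes[index] + ' major', notes[(index + 9) % 12] + ' minor'


for i, p in enumerate(patterns):
    print(i, p, key_names(i))
```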
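Edit: see below for a solution that also shows the position of the longest match.

The closest built-in tool for your problem is re.findall: "Return all non-overlapping matches of pattern in string, as a list of strings." One problem in your case is that different matches can overlap, but findall only returns non-overlapping matches. For example, the input string 2B001AA contains two different matches: 2B00 and 001AA. The re.findall function finds and returns the first match, 2B00, and then continues from where it stopped, so it returns only 1AA as the next match.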
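A minimal check of the overlap problem, assuming the combined pattern from the question:

```python
import re

combined = re.compile(
    r'[^1368A]+|[^2479B]+|[^0358A]+|[^1469B]+|[^0257A]+|[^1368B]+|'
    r'[^02479]+|[^1358A]+|[^2469B]+|[^0357A]+|[^1468B]+|[^02579]+'
)

s = '2B001AA'
# findall resumes scanning right after each match, so 001AA is never reported.
print(combined.findall(s))                  # prints ['2B00', '1AA']
# Searching with the second alternative on its own does find it.
print(re.search(r'[^2479B]+', s).group(0))  # prints 001AA
```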
You can work around this by splitting the regexp into its individual alternatives and matching them one at a time:
import re

patterns = [
    r'[^1368A]+', r'[^2479B]+', r'[^0358A]+', r'[^1469B]+',
    r'[^0257A]+', r'[^1368B]+', r'[^02479]+', r'[^1358A]+',
    r'[^2469B]+', r'[^0357A]+', r'[^1468B]+', r'[^02579]+'
]

def match_patterns(string):
    for pattern in patterns:
        for match in re.findall(pattern, string):
            yield match
The function match_patterns returns all matches (though not necessarily in order of position). In Python 3 you can write it more concisely:
def match_patterns(string):
    for pattern in patterns:
        yield from re.findall(pattern, string)
Either way, the longest match can be extracted with the built-in function max:
def find_longest_match(string):
    return max(match_patterns(string), key=len)

print(find_longest_match('12A34B32A43'))  # prints: A34B3
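If you also need the position of the longest match, use re.finditer instead: "Return an iterator yielding match objects over all non-overlapping matches for the RE pattern in string." For each returned match object, match.start() gives the start position and match.group(0) gives the matched text: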
import re

patterns = [
    r'[^1368A]+', r'[^2479B]+', r'[^0358A]+', r'[^1469B]+',
    r'[^0257A]+', r'[^1368B]+', r'[^02479]+', r'[^1358A]+',
    r'[^2469B]+', r'[^0357A]+', r'[^1468B]+', r'[^02579]+'
]

def match_patterns(string):
    # Yield match objects rather than strings so the position is kept
    for pattern in patterns:
        yield from re.finditer(pattern, string)

def find_longest_match(string):
    # default=None avoids a ValueError when nothing matches at all
    match = max(match_patterns(string), key=lambda m: len(m.group(0)), default=None)
    if match is not None:
        return match.start(), match.group(0)
    else:
        return None

print(find_longest_match('12A34B32A43'))  # prints: (2, 'A34B3')
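The question also asks for all matches and, preferably, the index of the pattern that succeeded. One way to get both, sketched here as an extension of the finditer version above (all_matches and longest_with_index are hypothetical names), is to keep each pattern's index next to its match objects:

```python
import re

patterns = [
    r'[^1368A]+', r'[^2479B]+', r'[^0358A]+', r'[^1469B]+',
    r'[^0257A]+', r'[^1368B]+', r'[^02479]+', r'[^1358A]+',
    r'[^2469B]+', r'[^0357A]+', r'[^1468B]+', r'[^02579]+',
]
compiled = [re.compile(p) for p in patterns]


def all_matches(string):
    """Return (start, end, pattern_index, text) for every match of every pattern,
    ordered by start position, then by decreasing length, then by pattern index.

    Matches from different patterns may overlap; matches of the same pattern
    never do, because finditer is non-overlapping.
    """
    result = [
        (m.start(), m.end(), index, m.group(0))
        for index, rx in enumerate(compiled)
        for m in rx.finditer(string)
    ]
    result.sort(key=lambda t: (t[0], -(t[1] - t[0]), t[2]))
    return result


def longest_with_index(string):
    """Longest match as (start, pattern_index, text), or None if nothing matches."""
    best = max(all_matches(string), key=lambda t: t[1] - t[0], default=None)
    if best is None:
        return None
    start, _end, index, text = best
    return start, index, text


print(longest_with_index('12A34B32A43'))  # prints (2, 11, 'A34B3')
```

Ties between equally long matches go to the earliest start and then the lowest pattern index, because max returns the first maximal item of the sorted list.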
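What counts as "longest" in your case? It is not clear. Please provide an example; in this case it should explicitly include the input and the expected output.
Of course it isn't clear! That's why I'm asking.
Your description isn't clear either, which is why I commented. Can you provide the expected output for these examples?
Does that mean some characters may be left out?
I was thinking of checking the string against each pattern, finding the longest match, splitting the string at that match, finding the longest match in the resulting pieces, and so on until no characters overlap, and then finding each character.
I have added an explanation of the overlap problem to the answer.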
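The approach described in the comments (take the longest match over all patterns, then repeat on the pieces to its left and right) can be sketched recursively. This assumes ties are broken by the earliest start and then the lowest pattern index, and it labels each piece with its excluded characters, as in the example output in the question. segment and longest_match are hypothetical names, and the sketch is not guaranteed to reproduce that example output exactly.

```python
import re

patterns = [
    r'[^1368A]+', r'[^2479B]+', r'[^0358A]+', r'[^1469B]+',
    r'[^0257A]+', r'[^1368B]+', r'[^02479]+', r'[^1358A]+',
    r'[^2469B]+', r'[^0357A]+', r'[^1468B]+', r'[^02579]+',
]
compiled = [re.compile(p) for p in patterns]


def longest_match(string):
    """(start, end, pattern_index) of the longest match, or None for an empty string.

    Ties: earliest start wins, then the lowest pattern index.
    """
    candidates = [
        (m.start(), m.end(), index)
        for index, rx in enumerate(compiled)
        for m in rx.finditer(string)
    ]
    return max(candidates, key=lambda c: (c[1] - c[0], -c[0], -c[2]), default=None)


def segment(string):
    """Greedily split string into (excluded_chars, text) pieces, longest first."""
    if not string:
        return []
    start, end, index = longest_match(string)
    label = patterns[index][2:-2]  # e.g. '[^1368A]+' -> '1368A'
    # Recurse on whatever is left on each side of the chosen match.
    return segment(string[:start]) + [(label, string[start:end])] + segment(string[end:])


print(segment('2B001AA'))  # prints [('1368A', '2B'), ('2479B', '001AA')]
```

Every character in 0123456789AB is accepted by 7 of the 12 classes, so every non-empty piece yields a match and each call works on a strictly shorter string. For very long inputs that split into many pieces, an explicit stack instead of recursion would avoid Python's recursion limit.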