Python regex: matching a group after the first occurrence of a character
This is my first time using Python regular expressions, and I just need a hint about matching a string. I have a URL like this:
url = "https://www.youtube.com/api/timedtext?xorp=True&xoaf=1&v=UloIw7dhnlQ&signature=C2AF3C2887A37043353A86AAAACFA796659B56CB.E736B7146447843F2D3311234744DC0D9937AF7B&asr_langs=fr%2Cru%2Ces%2Cnl%2Cit%2Cde%2Cko%2Cen%2Cpt%2Cja&sparams=asr_langs%2Ccaps%2Cv%2Cxoaf%2Cxorp%2Cexpire&expire=1541769991&key=yttt1hl=&encaps=asrlang=enfmt=srv3"
I am trying to match everything except the part starting with expire=1541769991 (near the end of the URL). This is what I came up with:
matchObj = re.match(r'(.*)expire=(.*)&(.*)', url)
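The problem is that the third group includes the text after the last occurrence of &. I want the text after the first occurrence of & following expire=. I tried adding a ? after the & to make it non-greedy. How can I do this?

You can do it like this: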
import re
url = "https://www.youtube.com/api/timedtext?xorp=True&xoaf=1&v=UloIw7dhnlQ&signature=C2AF3C2887A37043353A86AAAACFA796659B56CB.E736B7146447843F2D3311234744DC0D9937AF7B&asr_langs=fr%2Cru%2Ces%2Cnl%2Cit%2Cde%2Cko%2Cen%2Cpt%2Cja&sparams=asr_langs%2Ccaps%2Cv%2Cxoaf%2Cxorp%2Cexpire&expire=1541769991&key=yttt1hl=&encaps=asrlang=enfmt=srv3"
match = re.match("(.+?)(expire=.+?&)(.+$)", url)
print(match.group(1) + match.group(3))
Output:
https://www.youtube.com/api/timedtext?xorp=True&xoaf=1&v=UloIw7dhnlQ&signature=C2AF3C2887A37043353A86AAAACFA796659B56CB.E736B7146447843F2D3311234744DC0D9937AF7B&asr_langs=fr%2Cru%2Ces%2Cnl%2Cit%2Cde%2Cko%2Cen%2Cpt%2Cja&sparams=asr_langs%2Ccaps%2Cv%2Cxoaf%2Cxorp%2Cexpire&key=yttt1hl=&encaps=asrlang=enfmt=srv3
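To see why the non-greedy quantifier matters here, the following sketch compares the greedy pattern from the question with a lazy variant. The `short_url` value is a made-up, shortened URL used only for illustration; it is not taken from the question.

```python
import re

# Hypothetical shortened URL, used only for illustration.
short_url = "https://example.com/api?v=abc&expire=123&key=k1&lang=en"

# Greedy: the second (.*) grabs as much as it can, so the literal '&'
# ends up matching the LAST '&' in the string.
greedy = re.match(r"(.*)expire=(.*)&(.*)", short_url)

# Lazy: (.*?) grabs as little as it can, so the literal '&'
# matches the FIRST '&' after 'expire='.
lazy = re.match(r"(.*)expire=(.*?)&(.*)", short_url)

print(greedy.groups())
print(lazy.groups())
```

The `?` has to follow the quantifier (`.*?`), not the `&`: written as `&?` it only makes the ampersand optional and does not change how much `.*` consumes.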
Alternatively, if you just want the text without the expire= parameter, you can remove it:
result = re.sub(r"expire=\d+&", "", url)
Note that this assumes the value of expire consists only of digits.

Try this regular expression:
matchObj = re.match( r"(.*)expire=[^&]*(&.*)", url)
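The negated character class `[^&]*` cannot cross an `&`, so it stops at the first one without needing a lazy quantifier. A minimal sketch of how the two groups come out, again using a made-up `short_url` rather than the URL from the question:

```python
import re

# Hypothetical shortened URL, used only for illustration.
short_url = "https://example.com/api?v=abc&expire=123&key=k1&lang=en"

# [^&]* matches the expire value up to (not including) the next '&'.
m = re.match(r"(.*)expire=[^&]*(&.*)", short_url)
before, after = m.group(1), m.group(2)
print(before)  # ends with '&'
print(after)   # starts with '&'

# Variant (an assumption, not part of the answer above): consume the '&'
# that follows the value, so joining the groups does not produce '&&'.
m2 = re.match(r"(.*)expire=[^&]*&(.*)", short_url)
cleaned = m2.group(1) + m2.group(2)
print(cleaned)
```

The variant assumes that expire is followed by at least one more parameter; if it were the last parameter in the URL, there would be no trailing `&` to match.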