Search and return values using regular expressions in Python

python regex

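I'm trying to write a program that scans a video, finds which languages are available for audio and subtitles, and then uses those results as input.

Currently, I'm generating the output with the following: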

with open('output.txt', 'wt') as output_f:
    p = subprocess.Popen(command, stdout=output_f, stderr=output_f)

This is the text I need to scan:

  + audio tracks:
    + 1, Japanese (aac) (2.0 ch) (iso639-2: jpn)
  + subtitle tracks:
    + 1, English (iso639-2: eng) (Text)(SSA)

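So I need to find the number in front of Japanese, but only when it appears after "audio tracks".

Likewise, I need to find the number in front of English, but only when it appears after "subtitle tracks".

I'm fairly sure I need a regular expression for this, but I don't know where to start.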

This will work (used with .findall()):

(?

You can do this:
>>> import re
>>> audio_regex = re.compile(r'\+ audio tracks:\n\s*\+ (?P<number>\d+), (?P<lang>\w+)')
>>> subtitle_regex = re.compile(r'\+ subtitle tracks:\n\s*\+ (?P<number>\d+), (?P<lang>\w+)')
>>> text = '''
...   + audio tracks:
...     + 1, Japanese (aac) (2.0 ch) (iso639-2: jpn)
...   + subtitle tracks:
...     + 1, English (iso639-2: eng) (Text)(SSA)
... '''
>>> match = audio_regex.search(text)  #find the first match
>>> match.group('number')
'1'
>>> match.group('lang')
'Japanese'
>>> audio_regex.findall(text)   #find all matches
[('1', 'Japanese')]
>>> subtitle_regex.findall(text)
[('1', 'English')]

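Depending on the format of your file, adjust the regexes above to make them more flexible (for example, if there can be several spaces where the pattern has a single space, replace those spaces with \s+ to match one or more whitespace characters).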
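As a rough illustration of that adjustment, here is a sketch of a more tolerant audio pattern. The exact placement of \s* and \s+ is an assumption about where the spacing might vary, and the sample text with extra spaces is made up for the demonstration.

```python
import re

# Assumed variant of the audio pattern: \s* and \s+ absorb variable
# indentation and spacing instead of requiring exactly one space.
audio_regex = re.compile(
    r'\+\s*audio tracks:\s*\n\s*\+\s*(?P<number>\d+),\s+(?P<lang>\w+)'
)

# Invented sample with irregular spacing, for illustration only.
text = '''
  +   audio tracks:
      +  1,   Japanese (aac) (2.0 ch) (iso639-2: jpn)
'''

match = audio_regex.search(text)
print(match.group('number'), match.group('lang'))
```

The same substitution should apply to the subtitle pattern.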
You don't really need a regex here; to me, using one seems too complicated anyway.
Here is some plain parsing:
with open('output.txt', 'wt') as output_f:
    parseTracks = False
    lines = tuple(output_f)
    for line in lines:
        if 'audio tracks' in line:
            parseTracks = True
        if parseTracks:
            if 'Japanese' in line:
                theNumber = int(''.join([char for char in line if char in '1234567890']))

The same goes for subtitles.
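The snippet above opens output.txt in write mode and joins every digit on the line, so it needs some changes before it runs. The following is one possible reworking of the same line-by-line idea, not the answerer's own code: find_track_number is a hypothetical helper name, and it assumes track lines always look like "+ N, Language ...".

```python
def find_track_number(lines, section, language):
    """Return the track number of `language` listed under `section`, or None."""
    in_section = False
    for line in lines:
        stripped = line.strip()
        if stripped.endswith('tracks:'):
            # A header such as "+ audio tracks:" starts or ends our section.
            in_section = section in stripped
        elif in_section and language in stripped:
            # Take only the text between the leading "+" and the first comma,
            # so digits later in the line (e.g. "2.0 ch") are ignored.
            number = stripped.lstrip('+').split(',', 1)[0].strip()
            if number.isdigit():
                return int(number)
    return None


sample = """\
  + audio tracks:
    + 1, Japanese (aac) (2.0 ch) (iso639-2: jpn)
  + subtitle tracks:
    + 1, English (iso639-2: eng) (Text)(SSA)
""".splitlines()

audio_number = find_track_number(sample, 'audio tracks', 'Japanese')
subtitle_number = find_track_number(sample, 'subtitle tracks', 'English')
print(audio_number, subtitle_number)
```

To use it on the real output, open the file for reading (for example with open('output.txt') as f: and pass f as lines), after the subprocess has finished writing.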
Why are you calling subprocess?

You need to do this in two steps: use a regex to pick out the part of the text that shows the audio/video tracks, then make a second pass over that smaller piece of text to extract the information.

Japanese and English are just examples, right? You actually want to find the number in front of the language, but after "audio tracks:" and "subtitle tracks:". That shouldn't be a problem; you can just look for "audio tracks" or "subtitle tracks", or use some groups.

subprocess is called because of the way I execute the command. No, I need Japanese for the audio (or sometimes undefined, depending on the case) and I need English subtitles. The problem stems from some videos having dual audio and multiple subtitles.

Replace char in '123456789' with char.isdigit()

Also, you'll pick up too many digits, so it's still wrong.

So when I run this code, I get the following error: lines = tuple(output_f) io.UnsupportedOperation: not readable
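The two-step idea from the comments could look roughly like the sketch below, which also covers the dual-audio and multi-subtitle case. The names tracks and track_number are hypothetical, the sample text with two tracks per section is invented for illustration, and the patterns assume every track line has the form "+ N, Language ...".

```python
import re

# Step 1: capture the indented "+ N, ..." lines that follow a section header.
SECTION_RE = r'\+ {name} tracks:\n((?:[ \t]+\+ \d+,.*\n?)*)'
# Step 2: pull (number, language) out of each track line in that block.
TRACK_RE = re.compile(r'\+ (\d+), (\w+)')


def tracks(text, name):
    """Return [(number, language), ...] for the 'audio' or 'subtitle' section."""
    section = re.search(SECTION_RE.format(name=name), text)
    if section is None:
        return []
    return [(int(num), lang) for num, lang in TRACK_RE.findall(section.group(1))]


def track_number(text, name, language):
    """Return the first track number for `language` in that section, or None."""
    return next((num for num, lang in tracks(text, name) if lang == language), None)


# Invented sample with two tracks per section.
sample = '''
  + audio tracks:
    + 1, English (aac) (2.0 ch) (iso639-2: eng)
    + 2, Japanese (aac) (2.0 ch) (iso639-2: jpn)
  + subtitle tracks:
    + 1, English (iso639-2: eng) (Text)(SSA)
    + 2, Spanish (iso639-2: spa) (Text)(SSA)
'''

print(tracks(sample, 'audio'))
print(track_number(sample, 'audio', 'Japanese'))
print(track_number(sample, 'subtitle', 'English'))
```

Restricting the second pass to the captured block is what keeps an English audio track from being mistaken for the English subtitle track.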