Python 3.x: Concatenating two regular expressions

python-3.x regex

I have exported the following text file:

14:00:01 type1 "xyz" has no relationships... ಠ_ಠ
14:00:01 type2 "xyza" has no relationships... ಠ_ಠ
14:00:01 type2 "aaaa" has no relationships... ಠ_ಠ
14:00:01 type3 "asdg" has no relationships... ಠ_ಠ
14:00:01 type4 "dhj" has no relationships... ಠ_ಠ

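I am trying to retrieve two pieces of information from this file:

The type (in this example, the element after the time and before the double quotes)

Whatever is inside the double quotes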

Expected output:

type1 xyz

type2 xyza

type2 aaaa

type3 asdg

type4 dhj

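With my current code I can get the content inside the double quotes, but I don't know how to get the type and combine that with my regular expression: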

import os, yaml
import argparse
import re
with open('stackoverflow.txt') as f:
    content = f.readlines()
    matches=re.findall(r'\"(.+?)\"',str(content))#get the content within the double quote
for x in matches:
    print(x)

Current output:

xyz

xyza

aaaa

asdg

dhj

Use re.findall with two capture groups:

import re
with open('stackoverflow.txt') as f:
    content = f.read()  # re.findall needs a string, not the list returned by readlines()
    matches = re.findall(r'\b\d{2}:\d{2}:\d{2} (\S+) "(.*?)"', content)
    print(matches)

For the data you provided above, matches will contain:
[('type1', 'xyz'), ('type2', 'xyza'), ('type2', 'aaaa'), ('type3', 'asdg'), ('type4', 'dhj')]
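
To get from those tuples to the "type value" lines the question asks for, each pair can be joined with a space. This is a minimal sketch rather than part of the original answer: the name sample is made up for illustration and stands in for the string returned by f.read(), shortened to three lines of the data.

```python
import re

# Hypothetical stand-in for the contents of stackoverflow.txt (what f.read() would return).
sample = '''14:00:01 type1 "xyz" has no relationships... ಠ_ಠ
14:00:01 type2 "xyza" has no relationships... ಠ_ಠ
14:00:01 type4 "dhj" has no relationships... ಠ_ಠ
'''

# Two capture groups: the token after the timestamp, and the quoted text.
pattern = re.compile(r'\b\d{2}:\d{2}:\d{2} (\S+) "(.*?)"')

# With two groups, findall returns one (type, value) tuple per match.
lines = [f'{kind} {value}' for kind, value in pattern.findall(sample)]

for line in lines:
    print(line)
```

Because the pattern has exactly two groups, unpacking each tuple into kind and value is safe here; with a different number of groups the unpacking would need to change.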

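Just add a second group to your regular expression:
import re
with open('stackoverflow.txt') as f:
    content = f.read()  # read the file as one string so re.findall can search it
    matches = re.findall(r' (\S+) "(.+?)"', content)
for x in matches:
    print(x[0], x[1])  # x is a (type, quoted text) tuple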

Note: consider using re.finditer and iterating directly over f, instead of re.findall and f.readlines(), because they use iterators instead of lists.
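
The iterator-based variant mentioned in the note might look like the sketch below. It is an illustration under assumptions, not the answerer's code: io.StringIO stands in for the opened file so the example runs on its own, and the names fake_file and results are invented here. The results are collected in a list only so they can be printed afterwards.

```python
import io
import re

# io.StringIO stands in for open('stackoverflow.txt'); iterating it yields one line at a time.
fake_file = io.StringIO(
    '14:00:01 type1 "xyz" has no relationships... ಠ_ಠ\n'
    '14:00:01 type2 "xyza" has no relationships... ಠ_ಠ\n'
)

pattern = re.compile(r' (\S+) "(.+?)"')

results = []
for line in fake_file:                # lazy: no list of all lines is built
    for m in pattern.finditer(line):  # lazy: match objects are produced one at a time
        results.append((m.group(1), m.group(2)))

for kind, value in results:
    print(kind, value)
```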
If your TXT file always has this structure, I would simply do:
with open('stackoverflow.txt') as f:
    matches = [' '.join(line.split(' ')[1:3]) for line in f.readlines()]

for x in matches:
    print(x)

Output:
type1 "xyz"
type2 "xyza"
type2 "aaaa"
type3 "asdg"
type4 "dhj"
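
The split-based output above still carries the double quotes. If they should be dropped to match the expected output in the question, one option is to strip them from the second field. This is a sketch under two assumptions: sample_lines is a made-up stand-in for the lines of the file, and the quoted value never contains a space (otherwise split(' ') would cut it apart).

```python
# Hypothetical stand-in for the lines read from stackoverflow.txt.
sample_lines = [
    '14:00:01 type1 "xyz" has no relationships... ಠ_ಠ\n',
    '14:00:01 type4 "dhj" has no relationships... ಠ_ಠ\n',
]

matches = []
for line in sample_lines:
    parts = line.split(' ')
    kind, quoted = parts[1], parts[2]  # second and third space-separated fields
    matches.append(f'{kind} {quoted.strip(chr(34))}')  # chr(34) is the double quote character

for x in matches:
    print(x)
```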

You can use (\w+)\s+"([^"]*)" to get a match on each line, and then format the output as needed by grabbing the two groups from each match (if one is found):
import re
matches = []
rx = re.compile(r'(\w+)\s+"([^"]*)"')
with open('stackoverflow.txt') as f:
    for line in f:
        m = rx.search(line)  # first match on the line, or None
        if m:
            matches.append(f'{m.group(1)} {m.group(2)}')

See:
import re
file = r'''14:00:01 type1 "xyz" has no relationships... ಠ_ಠ
14:00:01 type2 "xyza" has no relationships... ಠ_ಠ
14:00:01 type2 "aaaa" has no relationships... ಠ_ಠ
14:00:01 type3 "asdg" has no relationships... ಠ_ಠ
14:00:01 type4 "dhj" has no relationships... ಠ_ಠ'''
matches = []
rx = re.compile(r'(\w+)\s+"([^"]*)"')
for line in file.splitlines():
    m = rx.search(line)
    if m:
        matches.append(f'{m.group(1)} {m.group(2)}')
print(matches)
# => ['type1 xyz', 'type2 xyza', 'type2 aaaa', 'type3 asdg', 'type4 dhj']
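Did I understand correctly? You only have one match per line? And it is always the first match on the line?
@WiktorStribiżew I see, I can only get what is inside the double quotes.
So, my guess was correct.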