Retrieving text from PATTERN1 to PATTERN2 in Python

Tags: python, regex, python-2.7

I have an input file like the following:

PATTERN1 PTR1 blah blah blah
needThis  blah blah blah
thisOneAsWell  blah blah blah
PATTERN2

PATTERN1 PTR2 blah blah blah
needThis  blah blah blah
thisOneAsWell  blah blah blah
PATTERN2 

............................
............................

PATTERN1  PTRN blah blah
needThis  blah blah blah
thisOneAsWell blah blah blah
PATTERN2
I need my function to return only the first-column entries from PATTERN1 to PATTERN2, like this:

PTR1
needThis thisOneAsWell

PTR2
needThis thisOneAsWell

......................
......................
PTRN
needThis thisOneAsWell
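PTR1, PTR2, ..., PTRN are each different text. PATTERN1 and PATTERN2 differ from each other, but both are always present in the file.

How can I do this in Python?

I am still a Python beginner. I am trying to use re.findall() for this, but I am not getting the desired output: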

import re

def retrieve():
    # This only finds the literal marker, not the text that follows it
    with open("fileName", "r") as f:
        string = re.findall(r"PATTERN1", f.read())
    print string

You can nest two regular expressions:

txt='''\
PATTERN1 PTR1 blah blah blah
needThis1  blah blah blah
thisOneAsWell1  blah blah blah
PATTERN2

PATTERN1 PTR2 blah blah blah
needThis2  blah blah blah
thisOneAsWell2  blah blah blah
PATTERN2 

............................
............................

PATTERN1  PTRN blah blah
needThisN  blah blah blah
thisOneAsWellN blah blah blah
PATTERN2'''

import re

for m in re.finditer(r'^PATTERN1\s*(.*?)(?=^PATTERN2)', txt, re.M | re.S):
    print re.findall(r'(^\w+)', m.group(1), re.M)
Prints:

['PTR1', 'needThis1', 'thisOneAsWell1']
['PTR2', 'needThis2', 'thisOneAsWell2']
['PTRN', 'needThisN', 'thisOneAsWellN']

Edit 1

If the file you are working with fits easily in memory:

with open(fn) as f:
    txt=f.read()
    for m in re.finditer(r'^PATTERN1\s*(.*?)(?=^PATTERN2)', txt, re.M | re.S):
        print re.findall(r'(^\w+)', m.group(1), re.M)
For larger files that do not fit easily in memory, reading the whole file into one string is not a good option.
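One possible way to handle such files is to scan them line by line and keep only the current block in memory. The sketch below is not the regex approach from the answer: it swaps in plain string methods, and `iter_blocks` is a hypothetical helper name. It assumes both markers appear at the very start of a line, and it takes the first whitespace-separated token of each line rather than matching `\w+`.

```python
def iter_blocks(lines, start="PATTERN1", end="PATTERN2"):
    """Yield a list of first-column words for each start..end block."""
    block = None                      # None means "outside a block"
    for line in lines:
        if line.startswith(start):
            # Drop the marker itself; keep the first word that follows it
            rest = line[len(start):].split()
            block = rest[:1]
        elif line.startswith(end):
            if block is not None:
                yield block
            block = None
        elif block is not None:
            words = line.split()
            if words:                 # skip blank lines inside a block
                block.append(words[0])

demo = [
    "PATTERN1 PTR1 blah blah blah\n",
    "needThis1  blah blah blah\n",
    "thisOneAsWell1  blah blah blah\n",
    "PATTERN2\n",
    "\n",
    "PATTERN1  PTRN blah blah\n",
    "needThisN  blah blah blah\n",
    "PATTERN2\n",
]
print(list(iter_blocks(demo)))
```

Because a file object iterates over its lines lazily, the same generator can be fed an open file directly, for example `for block in iter_blocks(f)` inside a `with open(fn) as f:` block.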


Edit 2

Join each block's matches into a string, then simply append that string to a list:

with open(fn) as f:
    results=[]
    txt=f.read()
    for m in re.finditer(r'^PATTERN1\s*(.*?)(?=^PATTERN2)', txt, re.M | re.S):
        results.append('\n'.join(re.findall(r'(^\w+)', m.group(1), re.M)))
    print '\n===\n'.join(results)

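Thanks, but your function returns all the text between PATTERN1 and PATTERN2.

Thanks, but my input text may vary, so I have to use file = open().

You can do the same with an opened file. Just read the file contents into a string. I only used the txt string as an example.

Thanks, it worked! One last question: I want to return the final output. What is the best way to return the matched expressions, as a list or as a string?

Could you clarify what you mean by returning the final output?

I need to return it to the caller instead of calling print re.findall().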
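To hand the result back to the caller rather than print it, the loop can be placed in a function that returns a list. The following is a minimal sketch under the assumption that the file fits in memory; the function name `retrieve` is borrowed from the question, and the `path` parameter is an illustrative addition.

```python
import re

# Same two patterns as in the answer, compiled once at import time
BLOCK_RE = re.compile(r'^PATTERN1\s*(.*?)(?=^PATTERN2)', re.M | re.S)
WORD_RE = re.compile(r'^\w+', re.M)

def retrieve(path):
    """Return one list of first-column words per PATTERN1..PATTERN2 block."""
    with open(path) as f:
        txt = f.read()
    return [WORD_RE.findall(m.group(1)) for m in BLOCK_RE.finditer(txt)]
```

The caller can then decide on the final shape: `retrieve("fileName")` gives a list of lists, and `'\n===\n'.join('\n'.join(b) for b in retrieve("fileName"))` produces a single string like the one built in Edit 2. Returning the list is usually the more flexible choice, since a string can always be built from it later.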