How can I move on to the next section once I've printed part of the searched text in Python?

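I want to search a text file and print a line together with the 3 lines that follow it, if a keyword is found in that line and a different keyword is found in the following 3 lines.

My code currently prints too much. Once a section has been printed, is there a way to skip ahead to the next section?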

text = """

here is some text 1
I want to print out this line and the following 3 lines only once keyword 2
print this line since it has a keyword2 3
print this line keyword 4
print this line 5
I don't want to print this line but I want to start looking for more text starting at this line 6
Don't print this line 7
Not this line either 8
I want to print out this line again and the following 3 lines only once keyword 9
please print this line keyword 10
please print this line it has the keyword2 11
please print this line 12
Don't print this line 13
Start again searching here 14
etc.
"""

text2 = open("tmp.txt","w")
text2.write(text)
text2.close()

searchlines = open("tmp.txt").readlines()

data = []

for m, line in enumerate(searchlines):
    line = line.lower()
    if "keyword" in line and any("keyword2" in l.lower() for l in searchlines[m:m+4]):
        for line2 in searchlines[m:m+4]:
            data.append(line2)
print(''.join(data))

The current output is:

I want to print out this line and the following 3 lines only once keyword 2
print this line since it has a keyword2 3
print this line keyword 4
print this line 5
print this line since it has a keyword2 3
print this line keyword 4
print this line 5
I don't want to print this line but I want to start looking for more text starting at this line 6
I want to print out this line again and the following 3 lines only once keyword 9
please print this line keyword 10
please print this line it has the keyword2 11
please print this line 12
please print this line keyword 10
please print this line it has the keyword2 11
please print this line 12
Don't print this line 13
please print this line it has the keyword2 11
please print this line 12
Don't print this line 13
Start again searching here 14

I only want it to print:

I want to print out this line and the following 3 lines only once keyword 2
print this line since it has a keyword2 3
print this line keyword 4
print this line 5
I want to print out this line again and the following 3 lines only once keyword 9
please print this line keyword 10
please print this line it has the keyword2 11
please print this line 12

So you want to print every block of 4 lines that contains two or more keywords?

Anyway, this is what I just came up with. Maybe you can use it:

text = """

here is some text 1
I want to print out this line and the following 3 lines only once keyword 2
print this line since it has a keyword2 3
print this line keyword 4
print this line 5
I don't want to print this line but I want to start looking for more text starting at this line 6
Don't print this line 7
Not this line either 8
I want to print out this line again and the following 3 lines only once keyword 9
please print this line keyword 10
please print this line it has the keyword2 11
please print this line 12
Don't print this line 13
Start again searching here 14
etc.
""".splitlines()

keywords = ['keyword', 'keyword2']

buffer, kw = [], set()
for line in text:
    if len(buffer) == 0:                 # first line of a block
        found = [k for k in keywords if k in line]
        if found:                        # start a block only on a keyword line
            kw.update(found)
            buffer.append(line)          # appended once, even if several keywords match
    else:                                # continuation lines
        buffer.append(line)
        for k in keywords:
            if k in line:
                kw.add(k)
        if len(buffer) > 3:              # block of 4 lines is complete
            if len(kw) >= 2:             # just print blocks with enough keywords
                print('\n'.join(buffer))
            buffer, kw = [], set()

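Your keywords overlap: "keyword" is a substring of "keyword2".

Also, your sample data implies that you don't want to see line 13, but according to the problem statement it should be printed.

I changed your first keyword from "keyword" to "firstkey", as shown below, and with that your code works (apart from line 13).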

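$ diff /tmp/q /tmp/q2
4c4
< I want to print out this line and the following 3 lines only once keyword 2
---
> I want to print out this line and the following 3 lines only once firstkey 2
6c6
< print this line keyword 4
---
> print this line firstkey 4
11,12c11,12
< I want to print out this line again and the following 3 lines only once keyword 9
< please print this line keyword 10
---
> I want to print out this line again and the following 3 lines only once firstkey 9
> please print this line firstkey 10
30c30
<     if "keyword" in line and any("keyword2" in l.lower() for l in searchlines[m:m+4]):
---
>     if "firstkey" in line and any("keyword2" in l.lower() for l in searchlines[m:m+4]):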
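
The overlap is easy to check directly. The short sketch below (written for Python 3; the variable names are only illustrative) compares a plain substring test with a whole-word test on one of the sample lines. Renaming the first keyword is one way around the overlap; a word-boundary regular expression is another.

```python
import re

line = "print this line since it has a keyword2 3"

# Plain substring test: "keyword" is found inside "keyword2".
substring_hit = "keyword" in line

# Whole-word test: \b needs a word boundary on both sides,
# so "keyword2" does not count as an occurrence of "keyword".
whole_word_hit = re.search(r"\bkeyword\b", line) is not None

print(substring_hit, whole_word_hit)
```

With the substring test, every line that contains "keyword2" also counts as a "keyword" line, which is why the original loop starts a new block on those lines.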

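So, as others have pointed out, your first keyword "keyword" is a substring of your second keyword "keyword2". I therefore implemented this with regexp objects, which lets you use the word-boundary anchor \b.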

import re
try:
    from StringIO import StringIO  # Python 2
except ImportError:
    from io import StringIO        # Python 3

text = """

here is some text 1
I want to print out this line and the following 3 lines only once keyword 2
print this line since it has a keyword2 3
print this line keyword 4
print this line 5
I don't want to print this line but I want to start looking for more text starting at this line 6
Don't print this line 7
Not this line either 8
I want to print out this line again and the following 3 lines only once keyword 9
please print this line keyword 10
please print this line it has the keyword2 11
please print this line 12
Don't print this line 13
Start again searching here 14
etc.
"""


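def my_scan(data, search1, search2):
    """Yield each 4-line block whose first line matches search1 and
    whose following 3 lines match search2."""
    buffer = []
    for line in data:
        buffer.append(line)
        if len(buffer) > 4:
            buffer.pop(0)                # keep a sliding window of 4 lines
        if len(buffer) == 4:             # valid search block
            if search1.search(buffer[0]) and search2.search("\n".join(buffer[1:4])):
                for item in buffer:
                    yield item
                buffer = []              # start over after a yielded block

# Search terms, anchored on word boundaries
s1 = re.compile(r'\bkeyword\b')
s2 = re.compile(r'\bkeyword2\b')

for row in my_scan(StringIO(text), s1, s2):
    print(row.rstrip())

This produces: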

I want to print out this line and the following 3 lines only once keyword 2
print this line since it has a keyword2 3
print this line keyword 4
print this line 5
I want to print out this line again and the following 3 lines only once keyword 9
please print this line keyword 10
please print this line it has the keyword2 11
please print this line 12
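
As a variation on the sliding window above, the buffer could also be kept in a collections.deque with maxlen, which drops the oldest line automatically. This is only a sketch for Python 3: scan_blocks, its parameters and the shortened sample lines are made up for illustration, and like my_scan it ignores a trailing block shorter than 4 lines.

```python
import re
from collections import deque

def scan_blocks(lines, first_re, second_re, size=4):
    """Yield non-overlapping blocks of `size` lines (hypothetical helper)."""
    window = deque(maxlen=size)          # oldest line is dropped automatically
    for line in lines:
        window.append(line)
        if len(window) < size:
            continue                     # not enough lines collected yet
        head, rest = window[0], list(window)[1:]
        if first_re.search(head) and any(second_re.search(l) for l in rest):
            yield list(window)
            window.clear()               # resume with the line after the block

lines = [
    "intro", "a keyword 1", "b keyword2 2", "c 3", "d 4",
    "e 5", "f keyword 6", "g 7", "h keyword2 8", "i 9",
]
found = list(scan_blocks(lines,
                         re.compile(r"\bkeyword\b"),
                         re.compile(r"\bkeyword2\b")))
for block in found:
    print("\n".join(block))
```

Clearing the window after a hit is what prevents lines inside an already reported block from starting a new one.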

First, you can correct your code like this:

text = """
0//
1// here is some text 1
A2// I want to print out this line and the following 3 lines only once keyword 2
b3// print this line since it has a keyword2 3
b4// print this line keyword 4
b5// print this line 5
6// I don't want to print this line but I want to start looking for more text starting at this line 6
7// Don't print this line 7
8// Not this line either 8
A9// I want to print out this line again and the following 3 lines only once keyword 9
b10// please print this line keyword 10
b11// please print this line it has the keyword2 11
b12// please print this line 12
13// Don't print this line 13
14// Start again searching here 14
15// etc.
"""
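lines = text.splitlines(True)
# splitlines(True) keeps the newline at the end of each line
searchlines = [l.lower() for l in lines]   # lowercased copy, used only for matching

data, again = [], -1   # again = index of the last line already collected

for m, line in enumerate(searchlines):
    if m > again and "keyword" in line and "keyword2" in ''.join(searchlines[m:m+4]):
        data.extend(lines[m:m+4])
        again = m + 3   # resume searching on the line right after the block

print(''.join(data))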
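
The same skip-ahead idea can also be written with an explicit index, which makes the jump to the next section visible. The following is a minimal sketch for Python 3, not a drop-in replacement: find_blocks, its parameters and the shortened sample text are assumptions made for this illustration, and it checks the second keyword only in the lines after the first one.

```python
import re

def find_blocks(lines, first, second, size=4):
    """Return non-overlapping blocks of `size` lines (hypothetical helper)."""
    first_re = re.compile(r"\b%s\b" % re.escape(first), re.IGNORECASE)
    second_re = re.compile(r"\b%s\b" % re.escape(second), re.IGNORECASE)
    blocks = []
    i = 0
    while i < len(lines):
        block = lines[i:i + size]
        # First keyword on the current line, second keyword in the lines after it.
        if first_re.search(lines[i]) and any(second_re.search(l) for l in block[1:]):
            blocks.append(block)
            i += size        # jump past the block that was just collected
        else:
            i += 1           # no match: move on by a single line
    return blocks

sample = """\
here is some text 1
start keyword 2
has keyword2 3
keyword 4
plain 5
restart here 6
start again keyword 7
keyword 8
has keyword2 9
plain 10
not printed 11
""".splitlines()

blocks = find_blocks(sample, "keyword", "keyword2")
for block in blocks:
    print("\n".join(block))
```

Because the index advances by the whole block size after a hit, the search resumes on the first line after the printed block, as asked in the question.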

Second, here is a short regex solution:

text = """
0//
1// here is some text 1
A2// I want to print out this line and the following 3 lines only once keyword 2
b3// print this line since it has a keyword2 3
b4// print this line keyword 4
b5// print this line 5
6// I don't want to print this line but I want to start looking for more text starting at this line 6
7// Don't print this line 7
8// Not this line either 8
A9// I want to print out this line again and the following 3 lines only once keyword 9
b10// please print this line keyword 10
b11// please print this line it has the keyword2 11
b12// please print this line 12
13// Don't print this line 13
14// Start again searching here 14
15// etc.
"""

import re

# Groups 2 and 3 record whether keyword2 was found on the 2nd or 3rd line;
# the conditional on the 4th line requires keyword2 there only if neither matched.
regx = re.compile('(^.*?(?<=[ \t]){0}(?=[ \t]).*\r?\n'
                  '.*?((?<=[ \t]){1}(?=[ \t]))?.*\r?\n'
                  '.*?((?<=[ \t]){1}(?=[ \t]))?.*\r?\n'
                  '.*?(?(2)|(?(3)|{1})).*)'.\
                  format('keyword','keyword2'),re.MULTILINE|re.IGNORECASE)

print('\n'.join(m.group(1) for m in regx.finditer(text)))

This prints:

A2// I want to print out this line and the following 3 lines only once keyword 2
b3// print this line since it has a keyword2 3
b4// print this line keyword 4
b5// print this line 5
A9// I want to print out this line again and the following 3 lines only once keyword 9
b10// please print this line keyword 10
b11// please print this line it has the keyword2 11
b12// please print this line 12

I also assumed that, within the three-line block that is matched against the second search term, you are not looking for a potential "first-line match".