Python: adding a new line after a regular expression match

Tags: python, regex, python-3.x

Every time my program finds a match for a regular expression, I want to add a new line. I want to keep the matched text and only have a new line start after it. The text is read from a .txt file. I am able to find the match, but when I try to add the new line, I get the result shown under "Actual output" below. I have been trying to solve this for hours and would appreciate some help.

Here is a simple example:

Input:

STLB 1234 444 text text text
STLB 8796 567 text text text

Edit:

STLB 1234 444text text text

STLB 8796 567text text text

Desired output:

STLB 1234 444
text text text
STLB 8796 567
text text text

Actual output:

(STLB.*\d\d\d) 

(STLB.*\d\d\d) 

This is my code:

stlb_match = re.compile('|'.join(['STLB.*\d\d\d']))

with open(in_file5, 'r', encoding='utf-8') as fin5, open(out_file5, 'w', encoding='utf-8') as fout5:
    lines = fin5.read().splitlines()

    for i, line in enumerate(lines):
        matchObj1 = re.match(start_rx, line)

        if not matchObj1:
            first_two_word = (" ".join(line.split()[:2]))

            if re.match(stlb_match,line):
                line =re.sub(r'(STLB.*\d\d\d)', r'(STLB.*\d\d\d)'+' \n', line)
            elif re.match(first_two_word, line):
                line = line.replace(first_two_word, "\n" + first_two_word)

        fout5.write(line)
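
Your replacement part is wrong; you cannot put a regex in it. The second argument of re.sub is a replacement string, not a pattern, so refer to the captured group with a backreference instead. Change it to:

import re

line = 'STLB 1234 444 text text text'
# \1 inserts the text captured by the first group, followed by a newline
line = re.sub(r'(STLB.*\d\d\d)', r"\1\n", line)
print(line)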

Output:

STLB 1234 444
 text text text

Or, if you want to remove the space at the beginning of the second line, the pattern also has to consume that space.
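
The snippet for this variant did not survive in the thread as shown here, so the following is only a minimal sketch of one way to do it, not necessarily the answerer's exact code. It assumes the unwanted character is ordinary whitespace directly after the match and places `\s*` outside the capture group.

```python
import re

line = 'STLB 1234 444 text text text'

# The capture group ends before \s*, so the whitespace that follows the
# match is consumed by the pattern but not copied into the replacement.
result = re.sub(r'(STLB.*\d\d\d)\s*', r'\1\n', line)
print(result)
```

Because the space is matched but not captured, the second line starts directly with the text.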

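Assuming the lines always have this STLB format, this will do the job.

Note the \s* at the end of the regex: the capture group ends before it, so the trailing whitespace is not part of the group and is dropped from the result. The lines are processed with a list comprehension and written out with writelines.

Let me know if this works for you.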
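
The code for this answer is missing from the thread as shown here. The sketch below is one possible reading of the description (a pattern ending in `\s*`, a list comprehension, and `writelines`). The names `STLB_RX`, `split_after_match` and `convert` are hypothetical, and the sample lines stand in for the contents of the input file.

```python
import re
import sys

# Hypothetical pattern: the group ends before \s*, so trailing whitespace
# after the match is consumed but not kept.
STLB_RX = re.compile(r'(STLB.*\d\d\d)\s*')


def split_after_match(line):
    """Insert a newline after the first STLB match in a single line."""
    return STLB_RX.sub(r'\1\n', line, count=1)


def convert(lines):
    # List comprehension over the input lines; every result is terminated
    # with a newline so it can be passed straight to writelines().
    return [split_after_match(line.rstrip('\n')) + '\n' for line in lines]


if __name__ == '__main__':
    sample = ['STLB 1234 444 text text text\n',
              'STLB 8796 567 text text text\n']
    # With real files, call writelines() on the output file object instead.
    sys.stdout.writelines(convert(sample))
```

Lines without a match pass through unchanged. A line that ends right after the match would gain an extra blank line, which may or may not be wanted.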

.* after STLB is dangerous. Use \s instead, something like (STLB\s\d*) or (STLB\d*).

You need to change this line

line = re.sub(r'(STLB.*\d\d\d)', r'(STLB.*\d\d\d)'+' \n', line)

to

line = re.sub(r'(STLB.*\d\d\d)', r'\1\n', line)

When replacing with the captured content, you need to use \1.

line=re.sub(STLB.*\d\d\d.*",r"\1",line)+'\n'

.* may be too greedy, for example if there are digits in "text text text".

@EricDuminil: That's right, but using .*? would insert the newline after the first 3 digits (i.e. between 123 and 4). I am using the OP's regex.

@Mady Please remember to mark my answer as accepted (green check mark) if it worked for you. Thanks in advance!
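
The greedy versus lazy point raised in the comments can be checked with a short sketch. The sample line with extra digits in the text, and the explicit pattern with two numeric fields, are assumptions made for illustration; they are not taken from the original posts.

```python
import re

# Assumed sample: the free text itself contains a three-digit number.
line = 'STLB 1234 444 text 999 text'

# Greedy .* runs to the LAST three digits, so the break lands after 999.
greedy = re.sub(r'(STLB.*\d\d\d)', r'\1\n', line)

# Lazy .*? stops at the FIRST three digits, so the break splits 1234.
lazy = re.sub(r'(STLB.*?\d\d\d)', r'\1\n', line)

# Spelling out the fields (assumed here: two whitespace-separated numbers)
# avoids both problems.
explicit = re.sub(r'(STLB\s+\d+\s+\d+)\s*', r'\1\n', line)

for name, value in (('greedy', greedy), ('lazy', lazy), ('explicit', explicit)):
    print(name, repr(value))
```

If the real data has a different number of fields after STLB, the explicit pattern would need to be adjusted accordingly.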