Python: how do I remove a substring that contains a specific string but whose length varies from line to line?


I have a log file from which I want to remove some specific parts. Part of the log file is shown below:

I0216 10:18:04.720626 31559 solver.cpp:273] Solving 
I0216 10:18:04.720630 31559 solver.cpp:274] Learning Rate Policy: step
I0216 10:18:05.242708 31559 solver.cpp:219] Iteration 0 (0 iter/s, 0.522037s/50 iters), loss = 1.60944
I0216 10:18:05.242750 31559 solver.cpp:238]     Train net output #0: accuracy = 0
I0216 10:18:05.242763 31559 solver.cpp:238]     Train net output #1: loss = 1.60944 (* 1 = 1.60944 loss)
I0216 10:18:05.242785 31559 sgd_solver.cpp:105] Iteration 0, lr = 1e-10
I0216 10:18:22.386440 31559 solver.cpp:219] Iteration 50 (2.91648 iter/s, 17.144s/50 iters), loss = 1.60944
I0216 10:18:22.386497 31559 solver.cpp:238]     Train net output #0: accuracy = 0.643982
I0216 10:18:22.386509 31559 solver.cpp:238]     Train net output #1: loss = 1.60944 (* 1 = 1.60944 loss)
I0216 10:18:22.386515 31559 sgd_solver.cpp:105] Iteration 50, lr = 1e-10
I0216 10:18:39.549926 31559 solver.cpp:219] Iteration 100 (2.91313 iter/s, 17.1637s/50 iters), loss = 1.60944
I0216 10:18:39.550071 31559 solver.cpp:238]     Train net output #0: accuracy = 1
I0216 10:18:39.550087 31559 solver.cpp:238]     Train net output #1: loss = 1.60944 (* 1 = 1.60944 loss)
I0216 10:18:39.550093 31559 sgd_solver.cpp:105] Iteration 100, lr = 1e-10
I0216 10:18:56.714752 31559 solver.cpp:219] Iteration 150 (2.91292 iter/s, 17.1649s/50 iters), loss = 1.60944
I0216 10:18:56.714824 31559 solver.cpp:238]     Train net output #0: accuracy = 0.624222
I0216 10:18:56.714838 31559 solver.cpp:238]     Train net output #1: loss = 1.60944 (* 1 = 1.60944 loss)
I0216 10:18:56.714845 31559 sgd_solver.cpp:105] Iteration 150, lr = 1e-10
I0216 10:19:13.893241 31559 solver.cpp:219] Iteration 200 (2.91059 iter/s, 17.1787s/50 iters), loss = 1.60944
I0216 10:19:13.893450 31559 solver.cpp:238]     Train net output #0: accuracy = 1
I0216 10:19:13.893467 31559 solver.cpp:238]     Train net output #1: loss = 1.60944 (* 1 = 1.60944 loss)
I0216 10:19:13.893473 31559 sgd_solver.cpp:105] Iteration 200, lr = 1e-10
I0216 10:19:31.094591 31559 solver.cpp:219] Iteration 250 (2.90674 iter/s, 17.2014s/50 iters), loss = 1.60944
I0216 10:19:31.094650 31559 solver.cpp:238]     Train net output #0: accuracy = 0.61937
I0216 10:19:31.094662 31559 solver.cpp:238]     Train net output #1: loss = 1.60944 (* 1 = 1.60944 loss)
I0216 10:19:31.094667 31559 sgd_solver.cpp:105] Iteration 250, lr = 1e-10
I0216 10:19:48.290045 31559 solver.cpp:219] Iteration 300 (2.90772 iter/s, 17.1956s/50 iters), loss = 1.60944
I0216 10:19:48.290187 31559 solver.cpp:238]     Train net output #0: accuracy = 0.959229
I0216 10:19:48.290205 31559 solver.cpp:238]     Train net output #1: loss = 1.60944 (* 1 = 1.60944 loss)
I0216 10:19:48.290210 31559 sgd_solver.cpp:105] Iteration 300, lr = 1e-10
I0216 10:20:05.504201 31559 solver.cpp:219] Iteration 350 (2.90457 iter/s, 17.2142s/50 iters), loss = 1.60944
I0216 10:20:05.504257 31559 solver.cpp:238]     Train net output #0: accuracy = 0.772217
I0216 10:20:05.504271 31559 solver.cpp:238]     Train net output #1: loss = 1.60944 (* 1 = 1.60944 loss)
As you can see, some lines begin (after the timestamp) with 31559 solver.cpp:219] Iteration

I want to change only these lines, without changing any other line of the file. For example, this line:

   ... solver.cpp:219] Iteration 14750 (2.9004 iter/s, 17.239s/50 iters), loss = 1.60934

In other words, from every line like the one above I want to remove the substring (2.9004 iter/s, 17.239s/50 iters), while leaving all other lines unchanged. Thanks.

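I want to remove the part of a line that looks like (2.8995 iter/s, 17.2444s/50 iters), and the length of this part can differ from line to line. It starts with ( followed by a number (which may differ from line to line), then iter/s, and then another number.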

As @delca85 pointed out, the pattern is:

p = r"(\(\d*[.]?\d* iter/s\,\s\d*[.]?\d*)(s/[0-9]+)?(\siters\))"

Does anyone have a suggestion? Thanks in advance.

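I made an additional assumption about the second part of the string, namely that it is a number followed by s/number. I hope I'm not wrong; if I am, let me know and I'll be happy to find another solution for you.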

Here is my suggestion:

import re

string = "I0216 11:42:50.047427 31559 solver.cpp:219] Iteration 14750 (2.9004 iter/s, 17.239s/50 iters), loss = 1.60934 I0216 11:42:50.047472 31559 solver.cpp:238]     Train net output #0: accuracy = 1"

# Raw string so the backslashes reach the regex engine unchanged
p = r"\(\d*[.]?\d* iter/s\, \d*[.]?\d*s/[0-9]+ iters\)"
pattern = re.compile(p)
# Print every "(... iter/s, ...s/N iters)" substring found
for l in pattern.findall(string):
    print(l)
I hope this helps.
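findall only lists the matches. To actually delete them, the same pattern can be passed to re.sub, which leaves lines without a match untouched. This is a small sketch using the pattern above on three lines copied from the question's log:

```python
import re

# Pattern from the answer above, written as a raw string.
pattern = re.compile(r"\(\d*[.]?\d* iter/s\, \d*[.]?\d*s/[0-9]+ iters\)")

lines = [
    "I0216 10:18:05.242708 31559 solver.cpp:219] Iteration 0 (0 iter/s, 0.522037s/50 iters), loss = 1.60944",
    "I0216 10:18:22.386440 31559 solver.cpp:219] Iteration 50 (2.91648 iter/s, 17.144s/50 iters), loss = 1.60944",
    "I0216 10:18:05.242785 31559 sgd_solver.cpp:105] Iteration 0, lr = 1e-10",
]

# re.sub deletes every match; lines without a match come back unchanged.
cleaned = [pattern.sub("", line) for line in lines]
for line in cleaned:
    print(line)
```

Note that the space before the parenthesis stays behind, so the result reads "Iteration 0 , loss = 1.60944".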

Optional s/50
If the s/50 in the second part of the string is optional, you can use this solution:

import re

string = "I0216 11:42:50.047427 31559 solver.cpp:219] Iteration 14750 (2.9004 iter/s, 17.239s/50 iters), loss = 1.60934 I0216 11:42:50.047472 31559 solver.cpp:238]     Train net output #0: accuracy = 1 "
# Second sample line, this time without the "s/50" part
string = string + "I0216 11:42:50.047427 31559 solver.cpp:219] Iteration 14750 (2.9004 iter/s, 17.239 iters), loss = 1.60934 I0216 11:42:50.047472 31559 solver.cpp:238]     Train net output #0: accuracy = 1 "
# Group 2, (s/[0-9]+), is optional
p = r"(\(\d*[.]?\d* iter/s\,\s\d*[.]?\d*)(s/[0-9]+)?(\siters\))"
pattern = re.compile(p)
for l in pattern.findall(string):
    # findall returns a tuple of groups; joining them gives the whole match
    print(''.join(l))
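Because this pattern has three capturing groups, findall returns one tuple per match, and the optional group comes back as an empty string when s/50 is absent. That is why ''.join(l) is needed. The sketch below shows both forms and, as an alternative, finditer with group(0), which returns the whole match directly:

```python
import re

p = r"(\(\d*[.]?\d* iter/s\,\s\d*[.]?\d*)(s/[0-9]+)?(\siters\))"
pattern = re.compile(p)

text = ("Iteration 14750 (2.9004 iter/s, 17.239s/50 iters), loss = 1.60934\n"
        "Iteration 14750 (2.9004 iter/s, 17.239 iters), loss = 1.60934\n")

# One tuple per match, one entry per group; the optional
# (s/[0-9]+) group is '' when it does not participate.
groups = pattern.findall(text)

# finditer + group(0) yields the complete matched text, no join needed.
whole = [m.group(0) for m in pattern.finditer(text)]

print(groups)
print(whole)
```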
Open the file, read the lines, match the pattern and replace the matches in the file

import fileinput
import re
import sys

p = r"(\(\d*[.]?\d* iter/s\,\s\d*[.]?\d*)(s/[0-9]+)?(\siters\))"
pattern = re.compile(p)
# inplace=True redirects stdout into file.txt, so each written line replaces the original one
for line in fileinput.input("file.txt", inplace=True):
    for m in pattern.findall(line):
        string = ''.join(m)
        if string in line:
            line = line.replace(string, "")
    sys.stdout.write(line)
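If you would rather keep the original log and write a cleaned copy, a sketch along these lines should work. It is an assumption-laden variant, not the answer's code: it uses re.sub instead of findall plus replace, adds an optional leading \s? so no stray space is left before the comma, and only touches lines containing "solver.cpp:219] Iteration" (mirroring the filter the asker mentions in the comments). The function names clean_line and clean_log are hypothetical.

```python
import re

# Question's pattern, minus the capture groups, plus an optional leading
# whitespace so "Iteration 14750 (...), loss" becomes "Iteration 14750, loss".
ITER_RATE = re.compile(r"\s?\(\d*[.]?\d* iter/s,\s\d*[.]?\d*(?:s/\d+)?\siters\)")


def clean_line(line):
    """Remove the '(... iter/s, ... iters)' part, but only from solver.cpp:219 lines."""
    if "solver.cpp:219] Iteration" in line:
        return ITER_RATE.sub("", line)
    return line


def clean_log(src_path, dst_path):
    """Copy src_path to dst_path line by line; non-matching lines are copied unchanged."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(clean_line(line))
```

Usage would be something like clean_log("file.txt", "file_clean.txt"), where the output name is only a placeholder.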

You can use the regular expression module (called "re") for this; it lets you isolate the substring quickly.

Here is the code:

import re

# Read the whole log file (replace the placeholder with the real path)
with open('your_file_with_correct_path') as file:
    file_content = file.read()

#The string you provided
#No need to do the below string definition as you will use the file_content
#str = '   I0216 11:42:50.047427 31559 solver.cpp:219] Iteration 14750 (2.9004 iter/s, 17.239s/50 iters), loss = 1.60934 I0216 11:42:50.047472 31559 solver.cpp:238] Train net output #0: accuracy = 1'

# Every "(<digits>...)" substring in the file
sub_string = re.findall(r'\(\d+.*\)', file_content)

# Add each element to the file you want, then save it
# ('matches.txt' is a placeholder name)
with open('matches.txt', 'w') as out:
    for element in sub_string:
        out.write(element + '\n')
sub_string is a list of all the substrings that match the pattern passed as the first argument of the findall method.
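To see what that list looks like, here is a small sketch that runs the same pattern over a few lines copied from the question's log, joined into one string the way file.read() would return them. Since . does not match a newline by default, a match never spans two lines, and "(* 1 = ..." is skipped because \d+ needs a digit right after the parenthesis:

```python
import re

# A few lines from the log in the question, as file.read() would return them.
file_content = (
    "I0216 10:18:04.720630 31559 solver.cpp:274] Learning Rate Policy: step\n"
    "I0216 10:18:05.242708 31559 solver.cpp:219] Iteration 0 (0 iter/s, 0.522037s/50 iters), loss = 1.60944\n"
    "I0216 10:18:05.242763 31559 solver.cpp:238]     Train net output #1: loss = 1.60944 (* 1 = 1.60944 loss)\n"
    "I0216 10:18:22.386440 31559 solver.cpp:219] Iteration 50 (2.91648 iter/s, 17.144s/50 iters), loss = 1.60944\n"
)

# No capturing groups, so findall returns the matched strings themselves.
sub_string = re.findall(r"\(\d+.*\)", file_content)
print(sub_string)
```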

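I suggest you look at the various special characters used in regular expressions, because they are very useful for cleaning up strings.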
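One special character worth knowing here is the quantifier in .*: it is greedy and runs to the last ) on the line, while .*? is lazy and stops at the first one. Spelling out the expected content is stricter still. The line below is made up for illustration (it is not from the log) and contains two parenthesised parts:

```python
import re

# Made-up line with two parenthesised parts, to show the difference.
line = "Iteration 0 (0 iter/s, 0.5s/50 iters), loss = 1.6 (1 = 1.6 loss)"

greedy = re.search(r"\(\d+.*\)", line).group(0)   # .* runs to the LAST ')'
lazy = re.search(r"\(\d+.*?\)", line).group(0)    # .*? stops at the FIRST ')'

# An explicit pattern only matches the rate/time part,
# never an arbitrary "(digits ...)" group.
specific = re.findall(r"\(\d+(?:\.\d+)? iter/s, \d+(?:\.\d+)?s/\d+ iters\)", line)

print(greedy)
print(lazy)
print(specific)
```

On the real log lines shown in the question the greedy version happens to work, because the solver.cpp:219 lines contain only one closing parenthesis.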


Thanks.

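Thanks for your reply. How do I open the file, find p = "\([0-9\.]+ iter/s\, [0-9\.]+s/[0-9]+ iters\)" and remove the string from the file? Shouldn't the program read the lines?

Thanks @S.EB, I have added the lines that replace the text in the file. I hope this finally helps you, and that you will accept and upvote my answer.

Thanks for the answer. Unfortunately it does not work, because the string you use here is not the same as the string in the lines: the iteration number keeps changing.

@S.EB: I edited the code so that only the regex match is replaced in the line.

Thank you very much. Can we replace the string even though string is not always the same, since there is more than one such string in the file? It does include them, though, because I added this line: if 'solver.cpp:219] Iteration' in line and 'loss =' in line

Thanks for your reply. This str is only one type of line in the log file. How can we change the program so that it reads a line, checks whether it contains something like (2.9004 iter/s, 17.239s/50 iters), and if so removes that part from the line and saves it?

You can read the whole log file, so in your case str would be str = log_file.read(). Then you can create the sub_string variable I added in the previous code. This gives you a list of all the matched patterns (i.e. your (... iters)) in your log file. To save them, just iterate over the sub_string list, add each element to the document you want, and save the document at the end of the process.

@S.EB I have modified my answer so that you can see the whole process behind my previous comment. This should help you get what you need.

Thanks a lot for your help, but I'm not very familiar with Python. I have edited my question.

Just saw your edit to the question. So what you really want is the log file without the iter part, right?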