How to create a regular expression in Python for a specific type of alphanumeric word
I'm looking for advice on building a regex-based search in Python. My server log file contains string values like the following:

2017-03-18 13:24:05,791 INFO [STDOUT] Sub Request Status :Resubmitted INBIOS_ABZ824
2017-03-12 13:24:05,796 INFO [STDOUT] Sub Submit Status :Resubmitted INDROS_MSR656
2017-04-12 13:24:05,991 INFO [STDOUT] Sub Request Status :Resubmitted p_GSN848

I need to search the log and extract values like these:

2017-03-18 13:24:05,791 INBIOS_ABZ824
2017-03-12 13:24:05,796 INDROS_MSR656
2017-04-12 13:24:05,991 p_GSN848

I'm using the code below, but it extracts the whole line in which a string like INBIOS_ABZ824 appears. How can I extract only the values shown above from the log? Please share your ideas.
import os
import re

# Regex used to match relevant log lines (in this case, request IDs such as INBIOS_ABZ824)
line_regex = re.compile(r"[A-Z]+IOS_[A-Z]+[0-9]+", re.IGNORECASE)

# Output file, where the matched log lines will be copied to
output_filename = os.path.normpath("output.log")

# Overwrites the file, ensure we're starting out with a blank file
with open(output_filename, "w") as out_file:
    out_file.write("")

# Open output file in 'append' mode
with open(output_filename, "a") as out_file:
    # Open input file in 'read' mode
    with open("ServerError.txt", "r") as in_file:
        # Loop over each log line
        for line in in_file:
            # If log line matches our regex, print to console, and output file
            if line_regex.search(line):
                print(line)
                out_file.write(line)
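The loop above prints and writes `line`, which is always the full log line. `re.search` returns a match object, and calling `group()` on it gives back only the text the pattern matched. A minimal sketch, with two of the sample lines kept in a list instead of being read from ServerError.txt (the helper name `extract_ids` is hypothetical):

```python
import re

# The question's original pattern
line_regex = re.compile(r"[A-Z]+IOS_[A-Z]+[0-9]+", re.IGNORECASE)


def extract_ids(lines):
    """Return only the matched substring of each line, not the whole line."""
    ids = []
    for line in lines:
        match = line_regex.search(line)
        if match:
            # group() with no argument returns the entire matched text
            ids.append(match.group())
    return ids


sample = [
    "2017-03-18 13:24:05,791 INFO [STDOUT] Sub Request Status :Resubmitted INBIOS_ABZ824",
    "2017-03-12 13:24:05,796 INFO [STDOUT] Sub Submit Status :Resubmitted INDROS_MSR656",
]
print(extract_ids(sample))  # ['INBIOS_ABZ824']
```

Only the first line produces a result: `IOS_` does not occur in INDROS_MSR656, so this particular pattern skips the second line.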
A single regexp will do it. The common thread seems to be a run of uppercase letters, followed by `TEC.`, then more uppercase letters and a number, so:

[A-Z]+TEC.[A-Z]+[0-9]+
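The unescaped `.` after `TEC` in this pattern matches any single character (other than a newline), so it accepts `TEC_`, `TEC-` and so on; a literal dot would be written `\.`. A quick check, using made-up strings rather than lines from the log:

```python
import re

# '.' matches any one character (except a newline)
wildcard = re.compile(r"[A-Z]+TEC.[A-Z]+[0-9]+")
# '\.' matches only a literal dot
literal_dot = re.compile(r"[A-Z]+TEC\.[A-Z]+[0-9]+")

for s in ["ABTEC_XY12", "ABTEC-XY12", "ABTEC.XY12"]:
    print(s, bool(wildcard.fullmatch(s)), bool(literal_dot.fullmatch(s)))

# ABTEC_XY12 True False
# ABTEC-XY12 True False
# ABTEC.XY12 True True
```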
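You could match one or more uppercase characters `[A-Z]+`, an underscore `_`, then zero or more uppercase characters `[A-Z]*`, followed by one or more digits `[0-9]+`.

You might use word boundaries `\b` so that the match is not part of a longer word:

\b[A-Z]+_[A-Z]*[0-9]+\b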
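Applied to the question's sample lines, this pattern picks out just the request ID. The sketch below also shows what `\b` does: a candidate glued to other word characters (the string `xINBIOS_ABZ824` is made up for illustration) is not matched. The pattern has no `re.IGNORECASE`, so only uppercase IDs are found.

```python
import re

# Uppercase letters, an underscore, optional uppercase letters, then digits,
# with word boundaries on both sides
id_regex = re.compile(r"\b[A-Z]+_[A-Z]*[0-9]+\b")

lines = [
    "2017-03-18 13:24:05,791 INFO [STDOUT] Sub Request Status :Resubmitted INBIOS_ABZ824",
    "2017-03-12 13:24:05,796 INFO [STDOUT] Sub Submit Status :Resubmitted INDROS_MSR656",
]
for line in lines:
    m = id_regex.search(line)
    print(m.group() if m else None)
# INBIOS_ABZ824
# INDROS_MSR656

# Without \b the pattern would still find "INBIOS_ABZ824" inside this token;
# with \b there is no boundary between "x" and "I", so nothing matches
print(id_regex.search("xINBIOS_ABZ824"))  # None
```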
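We finally found a solution that works. It extracts only the required string and skips other values that merely contain the pattern. Here I use a second re.match() call to refine the search result, and then write the result to the output file.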
import os
import re

# Regex used to match relevant log lines (here: request IDs such as INBIOS_ABZ824)
line_regex = re.compile(r"[A-Z]+OS_[A-Z]+[0-9]+", re.IGNORECASE)

# Output file, where the extracted values will be written
output_filename = os.path.normpath("output.log")

# Opening in 'w' mode truncates the file, so we start with a blank output file
with open(output_filename, "w") as out_file:
    # Open input file in 'read' mode
    with open("ServerError.txt", "r") as in_file:
        # Loop over each log line
        for line in in_file:
            # If the pattern occurs anywhere in the line, check it more closely
            if line_regex.search(line):
                # Get index of last space
                last_ndx = line.rfind(' ')
                # line[:23]: the timestamp (first 23 characters)
                # line[last_ndx:]: the last space and the characters after it
                # The request ID is the last token on the line, so re.match()
                # anchors the pattern at the start of that token; lines where
                # the pattern only appears elsewhere are skipped
                matchObj = re.match(line_regex, line[last_ndx + 1:])
                if matchObj:
                    # Strip the trailing newline so print() does not add a blank line
                    result = line[:23] + line[last_ndx:].rstrip("\n")
                    print(result)
                    out_file.write(result + "\n")
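An alternative to slicing the first 23 characters and re-matching the last token is one pattern with two capture groups: one for the timestamp and one for the trailing ID. This is a sketch of a different technique, not the approach above. It assumes every relevant line starts with a `YYYY-MM-DD HH:MM:SS,mmm` timestamp and ends with the ID; it reuses the uppercase-only ID pattern from the earlier answer; the function name `extract` is hypothetical, and the third sample line is made up to show a line that is skipped.

```python
import re

# Group 1: timestamp at the start of the line, e.g. "2017-03-18 13:24:05,791"
# Group 2: the request ID as the last token on the line, e.g. "INBIOS_ABZ824"
# (?:.*\s)? lets any text sit between the two; \s*$ allows a trailing newline
LOG_REGEX = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s(?:.*\s)?([A-Z]+_[A-Z]*[0-9]+)\s*$"
)


def extract(lines):
    """Yield 'timestamp ID' for each line whose last token looks like a request ID."""
    for line in lines:
        m = LOG_REGEX.match(line)
        if m:
            yield f"{m.group(1)} {m.group(2)}"


sample = [
    "2017-03-18 13:24:05,791 INFO [STDOUT] Sub Request Status :Resubmitted INBIOS_ABZ824\n",
    "2017-03-12 13:24:05,796 INFO [STDOUT] Sub Submit Status :Resubmitted INDROS_MSR656\n",
    "2017-03-12 13:24:06,001 INFO [STDOUT] unrelated line\n",
]
for result in extract(sample):
    print(result)
# 2017-03-18 13:24:05,791 INBIOS_ABZ824
# 2017-03-12 13:24:05,796 INDROS_MSR656
```

Because iterating over an open file yields its lines, the same generator can be fed the file object for ServerError.txt directly.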