How to create a regular expression in Python for a specific type of alphanumeric word


python regex


I am looking for advice on building a regex-based search in Python. My server log file contains string values of the following kind:

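2017-03-18 13:24:05,791 INFO [STDOUT] SUB Request Status: Resubmitted INBIOS_ABZ824
2017-03-12 13:24:05,796 INFO [STDOUT] SUB Submit Status: Resubmitted INDROS_MSR656
2017-04-12 13:24:05,991 INFO [STDOUT] SUB Request Status: Resubmitted INHP_GSN848

I need to search the log and extract values like the following:

2017-03-18 13:24:05,791 INBIOS_ABZ824
2017-03-12 13:24:05,796 INDROS_MSR656
2017-04-12 13:24:05,991 INHP_GSN848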

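I am using the code below, but it extracts the complete line in which a matching string (such as INBIOS_ABZ824) appears. How can I extract only the specified values from the log above? Please share your thoughts.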

import os
import re

# Regex used to match relevant loglines (in this case)

line_regex = re.compile(r"[A-Z]+IOS_[A-Z]+[0-9]+", re.IGNORECASE)


# Output file, where the matched loglines will be copied to
output_filename = os.path.normpath("output.log")
# Overwrites the file, ensure we're starting out with a blank file
with open(output_filename, "w") as out_file:
    out_file.write("")

# Open output file in 'append' mode
with open(output_filename, "a") as out_file:
    # Open input file in 'read' mode
    with open("ServerError.txt", "r") as in_file:
        # Loop over each log line
        for line in in_file:
            # If log line matches our regex, print to console, and output file
            if (line_regex.search(line)):
                print(line)
                out_file.write(line)

A single regexp will do it. The common thread seems to be a run of uppercase letters, followed by an underscore (_), more letters, and then a number, so:

[A-Z]+_[A-Z]+[0-9]+
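As a minimal sketch of how such a pattern returns only the ID rather than the whole line: re.search() gives back a match object, and its group(0) is just the matched text. The pattern here assumes the separator is an underscore, as in the sample IDs; the helper name extract_id and the sample line (including the comma before the milliseconds) are assumptions made for illustration.

```python
import re

# Letters, an underscore, more letters, then digits (separator assumed to be "_")
id_regex = re.compile(r"[A-Z]+_[A-Z]+[0-9]+")

def extract_id(line):
    """Return the first ID-like token in line, or None if there is none."""
    match = id_regex.search(line)
    # group(0) is only the matched text, not the line that contains it
    return match.group(0) if match else None

# Sample line modelled on the question's log format (layout is an assumption)
sample = "2017-03-18 13:24:05,791 INFO [STDOUT] SUB Request Status: Resubmitted INBIOS_ABZ824"
print(extract_id(sample))  # INBIOS_ABZ824
```

Writing line to the output file copies the full log line; writing the value returned by extract_id(line) copies only the ID.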

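You could match one or more uppercase characters ([A-Z]+), an underscore (_), then zero or more uppercase characters ([A-Z]*), followed by one or more digits ([0-9]+). You might also use a word boundary (\b) on each side so that the match is not part of a longer word:

\b[A-Z]+_[A-Z]*[0-9]+\b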
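One possible way to get both the timestamp and the ID in a single pass is to put each in its own capture group. The sketch below assumes the separator in the ID is an underscore and that every line starts with a timestamp of the form YYYY-MM-DD HH:MM:SS,mmm; both the timestamp layout and the names record_regex and extract_record are assumptions, not part of the original answer.

```python
import re

# Group 1: leading timestamp (layout with a comma before milliseconds is assumed)
# Group 2: the ID, with \b on both sides so it cannot be part of a longer word
record_regex = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}).*?\b([A-Z]+_[A-Z]*[0-9]+)\b"
)

def extract_record(line):
    """Return 'timestamp ID' for a matching log line, or None otherwise."""
    match = record_regex.search(line)
    if match is None:
        return None
    timestamp, request_id = match.groups()
    return f"{timestamp} {request_id}"

good = "2017-03-12 13:24:05,796 INFO [STDOUT] SUB Submit Status: Resubmitted INDROS_MSR656"
# The ID is glued to other word characters here, so \b rejects it
glued = "2017-03-12 13:24:05,796 INFO fooINBIOS_ABZ824bar"

print(extract_record(good))   # 2017-03-12 13:24:05,796 INDROS_MSR656
print(extract_record(glued))  # None
```

The second call shows what the word boundaries buy: without them, the ID embedded in fooINBIOS_ABZ824bar would still be picked up.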

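We finally arrived at an answer that works. It extracts only the required string and eliminates other values associated with the pattern.

Here I use a second re.match() call to refine the search results before writing them to the output file.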

import os
import re

# Regex used to match the request IDs (e.g. INBIOS_ABZ824).
# Note: as written it only matches IDs whose prefix ends in "OS_";
# use r"[A-Z]+_[A-Z]+[0-9]+" to accept any alphabetic prefix.
line_regex = re.compile(r"[A-Z]+OS_[A-Z]+[0-9]+", re.IGNORECASE)


# Output file, where the matched loglines will be copied to
output_filename = os.path.normpath("output.log")
# Overwrites the file, ensure we're starting out with a blank file
with open(output_filename, "w") as out_file:
    out_file.write("")

# Open output file in 'append' mode
with open(output_filename, "a") as out_file:
    # Open input file in 'read' mode
    with open("ServerError.txt", "r") as in_file:
        # Loop over each log line
        for line in in_file:
            # If log line matches our regex, print to console, and output file
            if (line_regex.search(line)):

                # Get index of last space
                last_ndx = line.rfind(' ')
                # line[:23]: The time stamp (first 23 characters)
                # line[last_ndx:]: Last space and following characters
                # Match against the last token only, to drop lines where the
                # pattern appears elsewhere in the message rather than as the
                # trailing request ID
                matchObj = line_regex.match(line[last_ndx+1:])
                # Only write lines whose last token is a request ID
                if matchObj:
                    print(line[:23] + line[last_ndx:])
                    out_file.write(line[:23] + line[last_ndx:])
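The same idea (keep a line only when its last token is a request ID) can also be sketched as a small function that works on any file-like objects, which makes it easy to try without touching ServerError.txt. This is an illustrative variant under stated assumptions, not the code above: the function name filter_log, the stamp_len parameter, the generalised underscore pattern, and the second sample line are all hypothetical.

```python
import io
import re

# Generalised ID pattern (assumption): letters, underscore, letters, digits
id_regex = re.compile(r"[A-Z]+_[A-Z]+[0-9]+")

def filter_log(in_file, out_file, stamp_len=23):
    """Write 'timestamp ID' for each line whose last token is an ID.

    stamp_len is the assumed width of the leading timestamp.
    Returns the number of lines written.
    """
    written = 0
    for line in in_file:
        tokens = line.split()
        # fullmatch requires the whole last token to be an ID
        if tokens and id_regex.fullmatch(tokens[-1]):
            out_file.write(f"{line[:stamp_len]} {tokens[-1]}\n")
            written += 1
    return written

# In-memory stand-ins for the input and output files; the second line is
# made up to show a line where the ID is not the last token
log = io.StringIO(
    "2017-03-18 13:24:05,791 INFO [STDOUT] SUB Request Status: Resubmitted INBIOS_ABZ824\n"
    "2017-03-18 13:24:06,102 INFO [STDOUT] INBIOS_ABZ824 queued for retry\n"
)
out = io.StringIO()
filter_log(log, out)
print(out.getvalue(), end="")  # 2017-03-18 13:24:05,791 INBIOS_ABZ824
```

With real files, opening the output once with open("output.log", "w") already truncates it, so a separate step that writes an empty string first is not needed.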