(Python) Finding matching log lines


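I have a list of tuples containing logon/logoff details. I am trying to match up corresponding logon and logoff lines. I want to first match a line containing "LOGON", retrieve the username, and then search for the next line that matches "LOGOFF" and that username.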

log_lines =[('2014-01-28 16:54:58', 'LOGON', 'jane', 'machinename'),
('2014-01-28 17:50:18', 'LOGOFF', 'jane', 'machinename'),
('2014-01-28 19:53:02', 'LOGON', 'skip', 'machinename'),
('2014-01-28 19:54:12', 'LOGOFF', 'skip', 'machinename'),
('2014-01-29 09:41:52', 'LOGON', 'jim', 'machinename'),
('2014-01-29 09:42:45', 'LOGOFF', 'jim', 'machinename'),
('2014-01-29 11:59:20', 'LOGON', 'skip', 'machinename'),
('2014-01-29 12:00:52', 'LOGOFF', 'skip', 'machinename')]

for logon in log_lines:
    if logon[1] == 'LOGON':
        name = logon[2]
        print(name)
        print(logon)
        for logoff in log_lines:
            if logoff[1] == 'LOGOFF' and logoff[2] == name:
                print(logoff)

I'm not sure whether the nested if statements are correct.

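First of all, logon[0] returns the date. You need logon[1] to get LOGON or LOGOFF, and the username is at logon[2] (logon[3] is the machine name). Your algorithm isn't bad. You can trim it down a little by using indexes. For example: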

for i in range(len(log_lines)):
    if log_lines[i][1] == 'LOGON':
        name = log_lines[i][2]
        # Only look at the lines after this LOGON.
        for j in range(i + 1, len(log_lines)):
            if log_lines[j][1] == 'LOGOFF' and log_lines[j][2] == name:
                print(log_lines[j])

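On average this cuts the algorithm's running time roughly in half. Note that the inner loop starts at the next line rather than at the first line of the list.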
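
To make the "roughly half" claim concrete, here is a small sketch (written for Python 3) that counts how many lines the inner loop visits for the sample data, once when it rescans the whole list for every LOGON and once when it starts at the line after the LOGON. The helper name count_inner_steps is made up for this illustration.

```python
log_lines = [
    ('2014-01-28 16:54:58', 'LOGON', 'jane', 'machinename'),
    ('2014-01-28 17:50:18', 'LOGOFF', 'jane', 'machinename'),
    ('2014-01-28 19:53:02', 'LOGON', 'skip', 'machinename'),
    ('2014-01-28 19:54:12', 'LOGOFF', 'skip', 'machinename'),
    ('2014-01-29 09:41:52', 'LOGON', 'jim', 'machinename'),
    ('2014-01-29 09:42:45', 'LOGOFF', 'jim', 'machinename'),
    ('2014-01-29 11:59:20', 'LOGON', 'skip', 'machinename'),
    ('2014-01-29 12:00:52', 'LOGOFF', 'skip', 'machinename'),
]


def count_inner_steps(lines, start_after_logon):
    """Count the lines visited by the inner loop (hypothetical helper)."""
    steps = 0
    for i, line in enumerate(lines):
        if line[1] == 'LOGON':
            # Either rescan from the top or begin just past the LOGON line.
            start = i + 1 if start_after_logon else 0
            steps += len(lines) - start
    return steps


full_scan = count_inner_steps(log_lines, start_after_logon=False)
from_next = count_inner_steps(log_lines, start_after_logon=True)
print(full_scan, from_next)  # 32 16
```

Both variants are still quadratic in the worst case; the saving is a constant factor.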

Try using next together with a slice of log_lines that starts at the next line:

for i, line in enumerate(log_lines):
    if line[1] == 'LOGON':
        found = next(j for j,search in enumerate(log_lines[i+1:],i+1) 
            if search[1] == 'LOGOFF' and line[2] == search[2])
        print('found {} logoff match at index {}'.format(line[2],found))
Output:

found jane logoff match at index 1
found skip logoff match at index 3
found jim logoff match at index 5
found skip logoff match at index 7
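
This effectively starts the search at the next line instead of iterating over the whole list looking for a LOGOFF, and it stops as soon as a match is found. next also gives you some flexibility, because you can pass it a default value in case the generator expression is exhausted without finding a match. That is, if you pass None as the default and we reach the end of the list without the user having logged off, we get None back instead of an error (a StopIteration exception).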
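
A minimal sketch of that default-value form of next, written for Python 3. The trimmed log_lines sample (in which skip never logs off) and the helper name find_logoff are made up for illustration.

```python
log_lines = [
    ('2014-01-28 16:54:58', 'LOGON', 'jane', 'machinename'),
    ('2014-01-28 17:50:18', 'LOGOFF', 'jane', 'machinename'),
    ('2014-01-28 19:53:02', 'LOGON', 'skip', 'machinename'),  # never logs off
]


def find_logoff(lines, i):
    """Return the index of the first LOGOFF after lines[i] for the same user, or None."""
    user = lines[i][2]
    return next(
        (j for j, row in enumerate(lines[i + 1:], i + 1)
         if row[1] == 'LOGOFF' and row[2] == user),
        None,  # default returned when the generator finds no match
    )


results = [(line[2], find_logoff(log_lines, i))
           for i, line in enumerate(log_lines) if line[1] == 'LOGON']
print(results)  # [('jane', 1), ('skip', None)]
```

Note the extra parentheses around the generator expression: they are required once next is given a second argument.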

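Note that this approach copes with the same user logging on and off multiple times. Your algorithm does not handle that well, because its inner loop scans the whole list and prints every LOGOFF for that user.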

Using slicing:

for i, l in enumerate(log_lines):
    if l[1] == 'LOGON':
        # Slice from the line after this LOGON to the end of the list.
        for item in log_lines[i+1:]:
            if (l[2]==item[2]) and (item[1]=='LOGOFF'):
                print('{} found log on and log off'.format(l[2]))
Output:

jane found log on and log off
skip found log on and log off
skip found log on and log off
jim found log on and log off
skip found log on and log off
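
Note that the output above lists skip three times for only two sessions: the first skip LOGON matches both later skip LOGOFF lines, because the inner loop keeps going after its first hit. One possible variation, sketched here for Python 3, stops at the first matching LOGOFF with break. The pairs list is just an illustrative name.

```python
log_lines = [
    ('2014-01-28 16:54:58', 'LOGON', 'jane', 'machinename'),
    ('2014-01-28 17:50:18', 'LOGOFF', 'jane', 'machinename'),
    ('2014-01-28 19:53:02', 'LOGON', 'skip', 'machinename'),
    ('2014-01-28 19:54:12', 'LOGOFF', 'skip', 'machinename'),
    ('2014-01-29 09:41:52', 'LOGON', 'jim', 'machinename'),
    ('2014-01-29 09:42:45', 'LOGOFF', 'jim', 'machinename'),
    ('2014-01-29 11:59:20', 'LOGON', 'skip', 'machinename'),
    ('2014-01-29 12:00:52', 'LOGOFF', 'skip', 'machinename'),
]

pairs = []  # (logon time, logoff time, user)
for i, line in enumerate(log_lines):
    if line[1] == 'LOGON':
        for item in log_lines[i + 1:]:
            if item[1] == 'LOGOFF' and item[2] == line[2]:
                pairs.append((line[0], item[0], line[2]))
                break  # stop at the first LOGOFF for this user

for logon_time, logoff_time, user in pairs:
    print('{} logged on at {} and off at {}'.format(user, logon_time, logoff_time))
```

With the break in place, each LOGON is paired with at most one LOGOFF, so the sample data yields four pairs.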

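The nested-loop approach makes the algorithm O(N^2), even when the inner start index is made more efficient. Here is an example of an approach that is O(N) on average and uses no nested loops.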

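It also tries to handle some cases of unmatched transactions, on the assumption that a user who has logged on must log off before logging on again.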

log_lines =[('2014-01-28 16:54:58', 'LOGON', 'jane', 'machinename'),
('2014-01-28 17:50:18', 'LOGOFF', 'jane', 'machinename'),
('2014-01-28 19:53:02', 'LOGON', 'skip', 'machinename'),
('2014-01-28 19:54:12', 'LOGOFF', 'skip', 'machinename'),
('2014-01-29 09:41:52', 'LOGON', 'jim', 'machinename'),
('2014-01-29 09:42:45', 'LOGOFF', 'jim', 'machinename'),
('2014-01-29 11:59:20', 'LOGON', 'skip', 'machinename'),
('2014-01-29 12:00:52', 'LOGOFF', 'skip', 'machinename'),
# Following are made up, weird logs
('2014-01-29 12:00:52', 'LOGOFF', 'dooz', 'machinename'),
('2014-01-29 12:00:52', 'LOGOFF', 'booz', 'machinename'),
('2014-01-29 12:00:52', 'LOGON', 'fooz', 'machinename'),]

from pprint import pprint

logged_in = {}
transactions_matched = []
transactions_weird = []
for line in log_lines:
    action = line[1]
    user = line[2]
    if action == 'LOGON':
        if user not in logged_in:
            logged_in[user] = line
        else: # Abnormal case 1: LOGON again while the user is already logged on
            transactions_weird.append(logged_in.pop(user))
            logged_in[user] = line
    elif action == 'LOGOFF':
        if user in logged_in:
            transactions_matched.append((logged_in.pop(user), line))
        else: # Abnormal case 2: LOGOFF for a user who has not logged on
            transactions_weird.append(line)

# Dangling log-in actions, considered as abnormal
transactions_weird.extend(logged_in.values())          

print('Matched:')
pprint(transactions_matched)
print('Weird:')
pprint(transactions_weird)
Output:

Matched:
[(('2014-01-28 16:54:58', 'LOGON', 'jane', 'machinename'),
  ('2014-01-28 17:50:18', 'LOGOFF', 'jane', 'machinename')),
 (('2014-01-28 19:53:02', 'LOGON', 'skip', 'machinename'),
  ('2014-01-28 19:54:12', 'LOGOFF', 'skip', 'machinename')),
 (('2014-01-29 09:41:52', 'LOGON', 'jim', 'machinename'),
  ('2014-01-29 09:42:45', 'LOGOFF', 'jim', 'machinename')),
 (('2014-01-29 11:59:20', 'LOGON', 'skip', 'machinename'),
  ('2014-01-29 12:00:52', 'LOGOFF', 'skip', 'machinename'))]
Weird:
[('2014-01-29 12:00:52', 'LOGOFF', 'dooz', 'machinename'),
 ('2014-01-29 12:00:52', 'LOGOFF', 'booz', 'machinename'),
 ('2014-01-29 12:00:52', 'LOGON', 'fooz', 'machinename')]

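Do you have to handle unmatched transactions, such as a logoff without a logon, or vice versa?

For now, unmatched transactions can be discarded. I'll save that feature for version 0.2 of the script. ;-)

Yes, you're right. I'll adjust the question accordingly.

Most of the solutions provided work well. I chose this one because it produces output I can use to keep developing the script. Very useful for someone like me who is learning Python. Cheers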