Python: how to find a sequence of elements in a (sort of) ordered list


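I have a list of log entries that looks something like this:

a = [
    {'log': 'abc', 'time': 0},
    {'log': '123', 'time': 1},
    {'log': 'def', 'time': 2},
    {'log': 'abc', 'time': 2},
    {'log': 'ghi', 'time': 3},
    {'log': 'def', 'time': 3}
]

The times are accurate to the second, but events stamped with the same second may have occurred in any order relative to each other. For example, in the list above, a[5] may have occurred chronologically before a[4].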
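To see which entries share a timestamp, and therefore have no reliable order relative to each other, the list can be grouped by `time`. This is only a minimal sketch: it assumes `a` is already sorted by `time`, as in the example, and the name `same_second` is made up for illustration.

```python
from itertools import groupby

a = [
    {'log': 'abc', 'time': 0},
    {'log': '123', 'time': 1},
    {'log': 'def', 'time': 2},
    {'log': 'abc', 'time': 2},
    {'log': 'ghi', 'time': 3},
    {'log': 'def', 'time': 3},
]

# groupby only merges adjacent items, so `a` must already be sorted by 'time'
same_second = {
    t: [entry['log'] for entry in group]
    for t, group in groupby(a, key=lambda entry: entry['time'])
}

# Any bucket with more than one log text holds entries whose true order is unknown
ambiguous = {t: logs for t, logs in same_second.items() if len(logs) > 1}
print(ambiguous)
```

For the sample data the buckets for seconds 2 and 3 each hold two entries, which is why a[4] and a[5] could be swapped.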

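Now suppose there is a sequence of logs that I want to match against a:

b = [
    {'log': 'abc', 'time': 0},
    {'log': 'def', 'time': 1},
    {'log': 'ghi', 'time': 2}
]

I want to find the ordered subset of a in which subset[0]['time'] is as close as possible to subset[-1]['time'] (in other words, the subset spans as short a time as possible).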

Edit to clarify further:

Suppose the subsets of a that match b are:

# a[0], a[4], a[5]
a1 = [
    {'log': 'abc', 'time': 0},
    {'log': 'ghi', 'time': 3},
    {'log': 'def', 'time': 3}
]

# a[3], a[4], a[5]
a2 = [
    {'log': 'abc', 'time': 2},
    {'log': 'ghi', 'time': 3},
    {'log': 'def', 'time': 3}
]

Then the entries in a1 occur within 3 seconds, while the entries in a2 occur within 1 second. Since a2 spans a shorter duration than a1, I want a2 to be returned.
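The comparison between a1 and a2 can be sketched in a few lines. The helper name `time_span` is hypothetical and not part of the question; it simply measures the gap between the earliest and latest entry of a candidate subset.

```python
def time_span(entries):
    """Seconds between the earliest and the latest entry in a candidate subset."""
    times = [entry['time'] for entry in entries]
    return max(times) - min(times)

# a[0], a[4], a[5]
a1 = [
    {'log': 'abc', 'time': 0},
    {'log': 'ghi', 'time': 3},
    {'log': 'def', 'time': 3},
]

# a[3], a[4], a[5]
a2 = [
    {'log': 'abc', 'time': 2},
    {'log': 'ghi', 'time': 3},
    {'log': 'def', 'time': 3},
]

# The preferred candidate is the one with the smallest span
best = min([a1, a2], key=time_span)
print(time_span(a1), time_span(a2))  # 3 1
```

Here `best` is a2, matching the result asked for in the question.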

If I understand the question correctly, this solution works for the sample data provided.

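The overall approach is:

  • Find the matching entries

  • Look for duplicates

  • Check whether putting each duplicate back into the matches, one at a time, shortens the elapsed time

  • Repeat until no duplicates remain and the elapsed time is no longer than that of the original list of matches

This is a bit convoluted and probably not efficient for larger problems, but hopefully the comments will point you in the right direction.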

    # -*- coding: UTF-8 -*-
    # Python 3: every print is a function call


    from collections import Counter
    import copy


    def drop_repeated_logs(list_of_dicts):
        """
        Drop the repeated text in logs and compute a new time range.
        If the new elapsed time range is lower, return that list of dictionaries.
        Expects list_of_dicts to be sorted by 'time'.
        """
        only_logs = [d['log'] for d in list_of_dicts]
        original_range = list_of_dicts[-1]['time'] - list_of_dicts[0]['time']
        counts = Counter(only_logs)
        original_len = len(list_of_dicts)

        print(counts)
        for log_txt in only_logs:
            num_occ = counts[log_txt]
            if num_occ > 1:
                # matching entries with every copy of the repeated log text removed
                new_d = [entry for entry in list_of_dicts if entry['log'] != log_txt]
                print(new_d)
                # the repeated entries, to be tried one at a time
                entries_to_try = [entry for entry in list_of_dicts if entry['log'] == log_txt]
                print(entries_to_try)
                for repeat in entries_to_try:
                    temp_d_list = copy.copy(new_d)
                    # add one of the repeated entries back to the matches
                    temp_d_list.append(repeat)
                    newly_sorted = sorted(temp_d_list, key=lambda k: k["time"])
                    # compute the new elapsed time
                    new_range = newly_sorted[-1]['time'] - newly_sorted[0]['time']
                    print("Newly computed range of {}: {}\n".format(newly_sorted, new_range))
                    new_len = len(newly_sorted)

                    # return early if the new elapsed time improves on the original
                    if new_range < original_range:
                        print("Found a smaller range, returning: {}".format(new_range))
                        return (new_range, newly_sorted)
                # otherwise still return if the range is equal but a duplicate was removed
                if new_range == original_range and new_len < original_len:
                    print("The range is unchanged, but got rid of a duplicate log text")
                    return (new_range, newly_sorted)

        return original_range, list_of_dicts


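  • Get only the log text from the data provided

  • Find the intersection of the two lists (the log texts that appear in both a and b)

  • Find all entries in a that match an entry in b

  • Sort the matches by their time key

  • Find the time range (this is the parameter to minimize)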

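Is one of the "a[5]" a typo?

Fixed, thanks. To those voting to close, would you mind giving feedback on how I could ask a better question?

Can you clearly define "in the shortest time"? I don't see how you arrived at your result. That is a close vote for being unclear. Consider explaining more of the rules that allow the desired output to be derived from the two input arrays. Also, what is an "ordered list"? How does "find elements" tie into the desired output? How likely is reordering within the same one-second timestamp? What relevance do the time values in b have? In short, even if you can't write the code, you need to explain the problem in more detail.

Hmm, I believe the question is clear. One thing I'm missing: do the logs in a have to be contiguous? (For example, could there be a log xyz between ghi and def in a?)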
    
    # Uses Counter and drop_repeated_logs() from the block above
    b = [
        {'log': 'abc', 'time': 0},
        {'log': 'def', 'time': 1},
        {'log': 'ghi', 'time': 2}
    ]

    a = [
        {'log': 'abc', 'time': 0},
        {'log': '123', 'time': 1},
        {'log': 'def', 'time': 2},
        {'log': 'abc', 'time': 2},
        {'log': 'ghi', 'time': 3},
        {'log': 'def', 'time': 3}
    ]

    # get only the log text from the provided data
    a_logs = [d['log'] for d in a]
    b_logs = [d['log'] for d in b]

    def intersection(a, b):
        return list(set(a) & set(b))

    # log texts that appear in both a and b
    logs_of_interest = intersection(a_logs, b_logs)

    # all entries in a whose log text also appears in b
    matches_in_a = [entry for entry in a if entry['log'] in logs_of_interest]

    # order the matches by their time key
    sorted_matches = sorted(matches_in_a, key=lambda k: k['time'])
    print(sorted_matches)

    # time range (the parameter to minimize)
    rnge = sorted_matches[-1]['time'] - sorted_matches[0]['time']

    sorted_logs = [d['log'] for d in sorted_matches]

    log_counts = Counter(sorted_logs)
    max_count = max(log_counts.values())
    print("max count: {}".format(max_count))

    # initialize with a larger range to get the while loop going
    lower_range = rnge + 1

    # keep going until no log text is repeated and the range is not worse than the start
    while lower_range > rnge or max_count > 1:
        lower_range, sorted_matches = drop_repeated_logs(sorted_matches)
        sorted_logs = [d['log'] for d in sorted_matches]
        log_counts = Counter(sorted_logs)
        print("log counts: {}".format(log_counts))
        max_count = max(log_counts.values())
        print("MAX COUNT: {}".format(max_count))
        print("NEW LOWER RANGE: {}".format(lower_range))


    print("FINAL ANSWER: range: {}; {}".format(lower_range, sorted_matches))

Output (the final list of matches):

    > [{'log': 'abc', 'time': 2}, {'log': 'ghi', 'time': 3}, {'log': 'def', 'time': 3}]
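Since the answer notes that its approach may not scale to larger inputs, here is a different technique for comparison: a two-pointer sliding window over the time-sorted candidates. This is not the method used above, only a sketch under stated assumptions: entries are matched by `log` text alone, the `time` values in b are ignored, each log text in b is distinct, and no ordering beyond the minimal span is enforced (the question leaves those rules open). The function name `shortest_match` is hypothetical.

```python
from collections import Counter

def shortest_match(a, b):
    """Return (span, entries) for the tightest time window in `a` that holds
    every log text from `b` at least once, or None if no such window exists."""
    wanted = {entry['log'] for entry in b}
    # Keep only the relevant entries, ordered by time
    candidates = sorted(
        (e for e in a if e['log'] in wanted), key=lambda e: e['time']
    )
    seen = Counter()   # log text -> occurrences inside the current window
    best = None        # (span, left index, right index)
    left = 0
    for right, entry in enumerate(candidates):
        seen[entry['log']] += 1
        # Shrink from the left while the window still covers every wanted log
        while len(seen) == len(wanted):
            span = entry['time'] - candidates[left]['time']
            if best is None or span < best[0]:
                best = (span, left, right)
            dropped = candidates[left]['log']
            seen[dropped] -= 1
            if seen[dropped] == 0:
                del seen[dropped]
            left += 1
    if best is None:
        return None
    span, lo, hi = best
    # Keep one entry per log text: the latest occurrence inside the best window
    latest = {}
    for e in candidates[lo:hi + 1]:
        latest[e['log']] = e
    return span, sorted(latest.values(), key=lambda e: e['time'])

a = [
    {'log': 'abc', 'time': 0},
    {'log': '123', 'time': 1},
    {'log': 'def', 'time': 2},
    {'log': 'abc', 'time': 2},
    {'log': 'ghi', 'time': 3},
    {'log': 'def', 'time': 3},
]
b = [
    {'log': 'abc', 'time': 0},
    {'log': 'def', 'time': 1},
    {'log': 'ghi', 'time': 2},
]

result = shortest_match(a, b)
print(result)
```

For the sample data this also finds a span of 1 second. When several subsets tie on span, it may pick a different one than the answer's code does (for example the `def` entry at time 2 instead of the one at time 3), so an extra tie-breaking rule would be needed if a specific subset is required.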