Python: compare and merge data in a list

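I want to compare multiple rows in a list, where each row contains a source IP, a destination IP, a packet time, and a size. I want to merge the data across all rows that have the same source IP and destination IP. For example, if two or more rows share the same source and destination IP, how do I combine all their data? I don't want to compare only the first and second rows; I want to match every row in the list that has the same 172.217.2.161 (source) and 10.247.15.39 (destination), then extract the first and last timestamps into a new list.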

def combine_data(source, dest, time, length):
    CombinePacket = [(source[i], dest[i], time[i], length[i]) for i in range(len(source))]
    NewData = []
    TotalSize = 0

    for i, j in zip(CombinePacket, CombinePacket[1:]):
        if(i[0:2] == j[0:2]):
            TotalSize = TotalSize + int(i[3])+int(j[3])
            data = i[0], i[1], i[2], j[2], TotalSize
            NewData.append(data)
The list contains

[(['172.217.2.161'], ['10.247.15.39'], '13:25:31.044180', 46)]
[(['172.217.2.161'], ['10.247.15.39'], '13:25:31.044190', 29)]
[(['172.217.2.161'], ['10.247.15.39'], '13:25:31.044200', 50)]
The output should be

[['172.217.2.161'], ['10.247.15.39'],'13:25:31.044180', '13:25:31.044200', 125]

Keep a dictionary and update its values as you go, then convert them into a list. Suppose you have a list like this:

data = [[(['172.217.2.161'], ['10.247.15.39'], '13:25:31.044180', 46)],
 [(['172.217.2.161'], ['10.247.15.39'], '13:25:31.044190', 29)],
 [(['172.217.2.161'], ['10.247.15.39'], '13:25:31.044200', 50)]]
Then:
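
One way the dictionary approach could look is sketched below. This is a minimal sketch rather than the answerer's exact code: the function name merge_packets and the layout of the dict values are assumptions made for illustration, and it relies on the timestamps being fixed-width HH:MM:SS.ffffff strings so that plain string comparison orders them correctly. It uses only loops and a dict, with no imports.

```python
def merge_packets(rows):
    """Merge rows that share (source, dest); keep first/last time and total size.

    merge_packets is a hypothetical name used for this sketch.
    """
    merged = {}  # (source_ip, dest_ip) -> [first_time, last_time, total_size]
    for row in rows:
        # Each row is a one-element list holding a (source, dest, time, size) tuple.
        src, dst, ts, size = row[0]
        key = (src[0], dst[0])  # unwrap the singleton IP lists so the key is hashable
        if key not in merged:
            merged[key] = [ts, ts, int(size)]
        else:
            entry = merged[key]
            entry[0] = min(entry[0], ts)  # earliest timestamp seen so far
            entry[1] = max(entry[1], ts)  # latest timestamp seen so far
            entry[2] += int(size)         # running total of packet sizes
    # Convert the dict back into the list layout requested in the question.
    return [[[src], [dst], first, last, total]
            for (src, dst), (first, last, total) in merged.items()]


data = [[(['172.217.2.161'], ['10.247.15.39'], '13:25:31.044180', 46)],
        [(['172.217.2.161'], ['10.247.15.39'], '13:25:31.044190', 29)],
        [(['172.217.2.161'], ['10.247.15.39'], '13:25:31.044200', 50)]]

print(merge_packets(data))
# [[['172.217.2.161'], ['10.247.15.39'], '13:25:31.044180', '13:25:31.044200', 125]]
```
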

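Of course, to avoid the intermediate list you could build this dict in the first place instead of the list. Also, unless you need them, I would recommend not wrapping the IPs in singleton lists, since that only leads to indexing confusion.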

My idea is as follows:

data = [
(['172.217.2.161'], ['10.247.15.39'], '13:25:31.044180', 46),
(['172.217.2.161'], ['10.247.15.39'], '13:25:31.044190', 29),
(['172.217.2.161'], ['10.247.15.39'], '13:25:31.044200', 50)
]
source = [d[0] for d in data]
dest = [d[1] for d in data]
time = [d[2] for d in data]
length = [d[3] for d in data]

from collections import defaultdict

def combine_data(source, dest, time, length):
    # Group (time, length) pairs under their (source IP, destination IP) key.
    # Each IP arrives wrapped in a one-element list, hence the [0].
    data = defaultdict(list)
    for s, d, t, n in zip(source, dest, time, length):
        data[(s[0], d[0])].append((t, n))

    result = []
    for key, value in data.items():
        # Sort by timestamp so the first and last entries are the earliest and latest.
        value = sorted(value, key=lambda x: x[0])
        first_time = value[0][0]
        last_time = value[-1][0]
        sum_length = sum(v[1] for v in value)
        result.append([key[0], key[1], first_time, last_time, sum_length])

    return result

Store the data in a dict keyed by (source, dest), then sort each group by time to get the first and last timestamps; the total size is the sum of all sizes stored under that key.
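
Sorting the timestamps as strings works here because they are all fixed-width. If that cannot be guaranteed (for example, an hour written as 9 instead of 09), one option is to parse them before comparing. The sketch below assumes the HH:MM:SS.ffffff format shown in the question; time_span is a hypothetical helper name, not part of the answer above.

```python
from datetime import datetime


def time_span(timestamps):
    """Return the (earliest, latest) timestamp strings, compared as times.

    time_span is a hypothetical helper introduced for this sketch.
    """
    # Pair each parsed time with its original string so the original text is returned.
    parsed = [(datetime.strptime(ts, "%H:%M:%S.%f"), ts) for ts in timestamps]
    first = min(parsed)[1]
    last = max(parsed)[1]
    return first, last


print(time_span(["13:25:31.044190", "13:25:31.044180", "13:25:31.044200"]))
# ('13:25:31.044180', '13:25:31.044200')
```
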

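You can use itertools.groupby (the code is listed at the end of this page).

After that, you can use the result however you like (even wrapping source and dest in their own lists).

The output will be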

[[['a'], ['b'], '1', '5', 74], [['a'], ['c'], '3', '4', 106]]

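When I try destIP = data[0][1][0], it says the list index is out of range. Is this the same as just getting the second element of the list with destIP = data[1]?

That is probably it. Looking at your code, the way you append data in the function differs from how you represented it in the second code block. You should be able to drop the first [0] for all fields.

Since it loops over my list, which contains the elements 0: source, 1: dest, 2: time, 3: length, how do minTs = dat[2] and maxTs = dat[3] get the first and last timestamps? It seems to just grab the timestamp for minTs and the length for maxTs.

Thanks for your help, but my teacher requires us to use what was taught in class, such as loops, converting lists to dictionaries, and so on. Apart from regular expressions, he doesn't want us to import anything.
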
from __future__ import print_function

import itertools


def key(packet):
    return packet[0], packet[1]  # source and destination


def do_combine_data(sources, destinations, times, lengths):
    packets = zip(sources, destinations, times, lengths)

    for (packet_source, packet_dest), group in itertools.groupby(
            sorted(packets, key=key), key=key):
        group = list(group)
        packet_sizes = [packet_size for (_, _, _, packet_size) in group]
        packet_times = [at for (_, _, at, _) in group]

        start_time, end_time = [func(packet_times) for func in (min, max)]
        total_size = sum(packet_sizes)

        yield packet_source, packet_dest, start_time, end_time, total_size


def combine_data(source, dest, time, length):
    return [
        ([[s], [d], b, e, t])
        for s, d, b, e, t in do_combine_data(source, dest, time, length)]


def main():
    sources = ["a", "a", "a", "a", "a"]
    destinations = ["b", "b", "b", "c", "c"]
    times = ["1", "2", "5", "3", "4"]
    lengths = [12, 11, 51, 89, 17]
    print(combine_data(sources, destinations, times, lengths))


if __name__ == '__main__':
    main()
[[['a'], ['b'], '1', '5', 74], [['a'], ['c'], '3', '4', 106]]