Pandas: group by multiple columns and format the result

Here is my text file:

    No.,Time,Source,Destination,Protocol,Length,Info,SrcPort,DstPort,src_dst_pair
    1401,0.397114,145.95.225.186,210.218.218.164,UDP,100,Source port: hsrp  Destination port: hsrp,hsrp,1985,"('145.95.225.186', '210.218.218.164')"
    8999,3.229111,145.95.225.186,210.218.218.164,UDP,100,Source port: hsrp  Destination port: hsrp,hsrp,1985,"('145.95.225.186', '210.218.218.164')"
    18504,5.877098,145.95.225.186,210.218.218.164,UDP,100,Source port: hsrp  Destination port: hsrp,hsrp,1985,"('145.95.225.186', '210.218.218.164')"
    23755,8.695843,145.95.225.186,210.218.218.164,UDP,100,Source port: hsrp  Destination port: hsrp,hsrp,1985,"('145.95.225.186', '210.218.218.164')"
    28027,11.24121,145.95.225.186,210.218.218.164,UDP,100,Source port: hsrp  Destination port: hsrp,hsrp,1985,"('145.95.225.186', '210.218.218.164')"
    33304,14.117213,145.95.225.186,210.218.218.164,UDP,100,Source port: hsrp  Destination port: hsrp,hsrp,1985,"('145.95.225.186', '210.218.218.164')"
    700443,222.305789,145.95.41.251,145.95.81.118,UDP,50,Source port: 36477  Destination port: snmp,36477,161,"('145.95.41.251', '145.95.81.118')"
    700495,222.351933,145.95.41.251,145.95.81.118,UDP,50,Source port: 36477  Destination port: snmp,36477,161,"('145.95.41.251', '145.95.81.118')"
    700496,222.352372,145.95.41.251,145.95.81.118,UDP,50,Source port: 36477  Destination port: snmp,36477,161,"('145.95.41.251', '145.95.81.118')"
    708982,225.913385,145.95.41.251,145.95.81.118,UDP,50,Source port: 36477  Destination port: snmp,36477,161,"('145.95.41.251', '145.95.81.118')"
    709797,226.130847,145.95.41.251,145.95.81.118,UDP,50,Source port: 36477  Destination port: snmp,36477,161,"('145.95.41.251', '145.95.81.118')"
    710340,226.372421,145.95.41.251,145.95.81.118,UDP,50,Source port: 36477  Destination port: snmp,36477,161,"('145.95.41.251', '145.95.81.118')"

I want to group the data by Source and Destination, and then:

  • sum the Length column within each group

  • find the difference between the maximum and minimum Time within each group

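I get the results, but I need to format them as shown in the expected output. I would also like to know whether there is a better way to do this.

Here is my attempt: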

    import pandas as pd
    
    data = pd.read_csv('simple_udp.csv')
    # getting the accumulated sum for the group
    length = data.groupby(['Source','Destination']).Length.sum()
    # getting the difference in time between the max and min in the group
    time  = data.groupby(['Source','Destination']).Time.max() - data.groupby(['Source','Destination']).Time.min()
    # This is where I have a problem. How can I format the result so that
    # I get the expected output (shown below)?
    print(length, time)
    
Expected output:

    Source          Destination       Length  Time
    145.95.225.186  210.218.218.164    600    13.720099
    145.95.41.251   145.95.81.118      300     4.066632
    
Use agg:

    data.groupby(['Source','Destination']).agg({'Length': 'sum', 'Time': lambda x: x.max() - x.min()})
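
A minimal sketch of the same agg call that also produces the flat layout from the expected output: reset_index() moves the Source and Destination group keys out of the MultiIndex and back into ordinary columns. To stay self-contained it builds the DataFrame inline from the Source, Destination, Length and Time values in the question's file instead of reading simple_udp.csv; the names data and result are just illustrative.

```python
import pandas as pd

# Source, Destination, Length and Time values from the sample file in the
# question (the other columns are omitted because they are not aggregated).
data = pd.DataFrame({
    'Source': ['145.95.225.186'] * 6 + ['145.95.41.251'] * 6,
    'Destination': ['210.218.218.164'] * 6 + ['145.95.81.118'] * 6,
    'Length': [100] * 6 + [50] * 6,
    'Time': [0.397114, 3.229111, 5.877098, 8.695843, 11.24121, 14.117213,
             222.305789, 222.351933, 222.352372, 225.913385, 226.130847, 226.372421],
})

result = (
    data.groupby(['Source', 'Destination'])
        .agg({'Length': 'sum', 'Time': lambda x: x.max() - x.min()})
        # Turn the (Source, Destination) MultiIndex back into regular columns
        .reset_index()
)

# index=False hides the 0, 1, ... row labels that reset_index() adds
print(result.to_string(index=False))
```

Whether to call reset_index() is mostly a matter of taste: keeping the MultiIndex is convenient for lookups by pair, while flat columns are easier to export with to_csv.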
    

My first guess would be:

    import pandas as pd
    data = pd.read_csv('simple_udp.csv')
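    # Create a DataFrameGroupBy object to reuse for both aggregations
    group = data.groupby(['Source','Destination'])
    df_length = group['Length'].sum()
    df_time   = group['Time'].max() - group['Time'].min()
    # A dict of Series becomes columns aligned on the (Source, Destination) index
    df = pd.DataFrame({'Length': df_length, 'Time': df_time})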
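
When combining the two per-group Series, note that pd.DataFrame treats a list of Series and a dict of Series differently: a list stacks each Series as a row (so the group keys become columns), while a dict or pd.concat(..., axis=1) places each Series in its own column aligned on the shared index. A small sketch with hypothetical toy Series s_len and s_time standing in for df_length and df_time:

```python
import pandas as pd

# Two hypothetical Series sharing one index, standing in for df_length and df_time.
idx = pd.Index(['a', 'b'], name='key')
s_len = pd.Series([600, 300], index=idx, name='Length')
s_time = pd.Series([13.7, 4.1], index=idx, name='Time')

# List of Series: each Series becomes one row; its index labels become columns.
as_rows = pd.DataFrame([s_len, s_time])

# Dict of Series: each Series becomes one column, aligned on the index.
as_cols = pd.DataFrame({'Length': s_len, 'Time': s_time})

# pd.concat along axis=1 gives the same column layout, named after the Series.
as_concat = pd.concat([s_len, s_time], axis=1)

print(as_rows)
print(as_cols)
print(as_concat)
```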
    
Alternatively, if you want fewer lines at the cost of some readability, use the agg method on group.
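
A sketch of that shorter form, assuming pandas 0.25 or newer for named aggregation: each keyword passed to agg names an output column and maps it to an (input column, function) pair, so a single call on the group object computes both results with the column names from the expected output. For brevity the inline data keeps only the first and last row of each pair from the question's file, so Time matches the expected output while Length sums only those two rows.

```python
import pandas as pd

# First and last row of each Source/Destination pair from the question's file
# (same Time span per pair; Length sums only these rows).
data = pd.DataFrame({
    'Source': ['145.95.225.186', '145.95.225.186', '145.95.41.251', '145.95.41.251'],
    'Destination': ['210.218.218.164', '210.218.218.164', '145.95.81.118', '145.95.81.118'],
    'Length': [100, 100, 50, 50],
    'Time': [0.397114, 14.117213, 222.305789, 226.372421],
})

group = data.groupby(['Source', 'Destination'])

# Named aggregation: output_column=(input_column, function)
summary = group.agg(
    Length=('Length', 'sum'),
    Time=('Time', lambda s: s.max() - s.min()),
)

print(summary)
```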