Python: How to combine the results of groupby.sum()
I got some firewall logs and analyzed them. I want to combine two groupby.sum() results. This is my code:
import pandas as pd

def analysis(data_location, col_name):
    # Read every line of the whitespace-separated log file
    with open(data_location, "r") as data_open:
        lines = data_open.readlines()
    rows = []
    for data in lines:
        data = data.rstrip("\n").split()
        rows.append({"Firewall": data[0], "Gatway": data[1], "DATE": data[2],
                     "Rule_name": data[3], col_name: data[4], "Count": int(data[5])})
    df = pd.DataFrame(rows)
    df = df[["Firewall", "Gatway", "DATE", "Rule_name", col_name, "Count"]]
    # Sum the counts per (Firewall, Gatway, DATE, Rule_name, address) group
    summed = df.groupby(["Firewall", "Gatway", "DATE", "Rule_name", col_name]).sum()
    print(summed.reset_index())
    # Return the summed frame so that callers can assign it (DST, SRC below)
    return summed
And this is the result:
DST = analysis("united_temp_fw_dst_log.txt", "dst")
"""the result
                                                     Count
Firewall   Gatway DATE    Rule_name  dst
10_1_81_34 vsys1  2019104 allow_Drop 10.1.81.255        34
                                     10.255.63.18       16
                                     103.226.213.30      4
                                     129.146.178.96    282
                                     183.177.72.201      4
                                     183.177.72.202      4
                                     220.133.209.243     4
                                     8.8.8.8           597"""
SRC = analysis("united_temp_fw_src_log.txt", "src")
"""the result
                                                 Count
Firewall   Gatway DATE    Rule_name  src
10_1_81_34 vsys1  2019104 allow_Drop 10.1.81.10      8
                                     10.1.81.11     12
                                     10.1.81.115    11
                                     10.1.81.118     3
                                     10.1.81.245   911"""
I want to use ["Firewall", "Gatway", "DATE", "Rule_name"] as the index, with the src and dst columns side by side, like this:
Firewall   Gatway DATE    Rule_name  src         count dst             count
10_1_81_34 vsys1  2019104 allow_Drop 10.1.81.10      8 10.1.81.255        34
                                     10.1.81.11     12 10.255.63.18       16
                                     10.1.81.115    11 103.226.213.30      4
                                     10.1.81.118     3 129.146.178.96    282
                                     10.1.81.245   911 183.177.72.201      4
                                                       183.177.72.202      4
                                                       220.133.209.243     4
                                                       8.8.8.8           597
What should I do? I tried reset_index() and groupby(), but that did not give the result I want.

A simple join will do:
DST.join(SRC)
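One thing to watch for with a key-based join: when the shared keys repeat on both sides, every row on one side is paired with every row on the other. The sketch below uses small hypothetical frames (the names dst, src, count_dst and count_src are assumptions, not the real log data) and merge on ordinary columns to show the general effect. Whether DST.join(SRC) behaves exactly this way depends on how the two indexes overlap.

```python
import pandas as pd

# Hypothetical miniature versions of the two summed tables, with the
# address level moved out of the index into an ordinary column
dst = pd.DataFrame({
    "Firewall": ["10_1_81_34"] * 2,
    "Gatway": ["vsys1"] * 2,
    "dst": ["10.1.81.255", "10.255.63.18"],
    "count_dst": [34, 16],
})
src = pd.DataFrame({
    "Firewall": ["10_1_81_34"] * 3,
    "Gatway": ["vsys1"] * 3,
    "src": ["10.1.81.10", "10.1.81.11", "10.1.81.115"],
    "count_src": [8, 12, 11],
})

# Joining on the shared keys pairs every dst row with every src row
paired = dst.merge(src, on=["Firewall", "Gatway"])
print(len(paired))  # 2 dst rows * 3 src rows = 6 rows
```

Each src address therefore shows up once per dst row, which is not the side-by-side layout asked for in the question.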
Can you rename the columns to avoid duplicate column names (Count, in your case)? If so, I would use the concat function:
import pandas as pd

# generate a simpler version of your dataframes
df = pd.DataFrame({'Firewall': ['10_1_81_34', '10_1_81_34', '10_1_81_34'],
                   'Gatway': ['vsys1', 'vsys1', 'vsys1'],
                   'dst': ['10.1.81.255', '10.255.63.18', '103.226.213.30'],
                   'count_dst': [34, 16, 4]})
df.set_index(['Firewall', 'Gatway'], inplace=True)
df2 = pd.DataFrame({'Firewall': ['10_1_81_34', '10_1_81_34', '10_1_81_34'],
                    'Gatway': ['vsys1', 'vsys1', 'vsys1'],
                    'src': ['10.1.81.10', '10.1.81.11', '10.1.81.115'],
                    'count_src': [8, 12, 11]})
df2.set_index(['Firewall', 'Gatway'], inplace=True)
# Concatenate dataframes along columns
df3 = pd.concat([df, df2], axis=1)
Using pd.concat, I get the following output:
                             dst  count_dst          src  count_src
Firewall   Gatway
10_1_81_34 vsys1     10.1.81.255         34   10.1.81.10          8
           vsys1    10.255.63.18         16   10.1.81.11         12
           vsys1  103.226.213.30          4  10.1.81.115         11
Edit, to handle dataframes of different lengths:
# generate a simpler version of your dataframes
df = pd.DataFrame({'Firewall': ['10_1_81_34', '10_1_81_34'],
                   'Gatway': ['vsys1', 'vsys1'],
                   'dst': ['10.1.81.255', '10.255.63.18'],
                   'count_dst': [34, 16]})
df2 = pd.DataFrame({'Firewall': ['10_1_81_34', '10_1_81_34', '10_1_81_34'],
                    'Gatway': ['vsys1', 'vsys1', 'vsys1'],
                    'src': ['10.1.81.10', '10.1.81.11', '10.1.81.115'],
                    'count_src': [8, 12, 11]})
# Concatenate dataframes along columns (rows are matched by position)
df3 = pd.concat([df, df2], axis=1)
# Remove duplicated key columns. keep='last' keeps the copies that came
# from df2, the longer dataframe here, so they contain no NaN
df3 = df3.loc[:, ~df3.columns.duplicated(keep='last')]
# set index
df3.set_index(['Firewall', 'Gatway'], inplace=True)
This is the output:
                           dst  count_dst          src  count_src
Firewall   Gatway
10_1_81_34 vsys1   10.1.81.255       34.0   10.1.81.10          8
           vsys1  10.255.63.18       16.0   10.1.81.11         12
           vsys1           NaN        NaN  10.1.81.115         11
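Thanks for your answer. I just tried this solution, but the "src" values are duplicated.

Thanks for the idea! But it doesn't work, because if the numbers of rows differ we get "ValueError: cannot handle a non-unique multi-index!"

I edited my answer. If you set the index after merging the 2 dataframes, it should work. However, there may be a more elegant way to achieve the result you are after.

thx~ Yes, it works, but with the real data, where there are several rule names, we run into the same problem. I have already thought of a solution, but it is not elegant: df._stat_axis.values.tolist(), use loc to get the values, then group them with a dict.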
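For the case raised in the last comment (several rule names, and a different number of src and dst rows per rule), one possible approach is to number the rows inside each key group with groupby(...).cumcount() and then do an outer merge on the keys plus that row number. This is only a sketch under assumptions: the sample frames stand in for the two summed tables after reset_index(), and the helper column "row", the renamed count columns and the rule name allow_Web are hypothetical.

```python
import pandas as pd

keys = ["Firewall", "Gatway", "DATE", "Rule_name"]

# Hypothetical sample data standing in for the two summed tables
dst = pd.DataFrame({
    "Firewall": ["10_1_81_34"] * 3,
    "Gatway": ["vsys1"] * 3,
    "DATE": ["2019104"] * 3,
    "Rule_name": ["allow_Drop", "allow_Drop", "allow_Web"],
    "dst": ["10.1.81.255", "10.255.63.18", "8.8.8.8"],
    "count_dst": [34, 16, 597],
})
src = pd.DataFrame({
    "Firewall": ["10_1_81_34"] * 4,
    "Gatway": ["vsys1"] * 4,
    "DATE": ["2019104"] * 4,
    "Rule_name": ["allow_Drop"] * 3 + ["allow_Web"],
    "src": ["10.1.81.10", "10.1.81.11", "10.1.81.115", "10.1.81.245"],
    "count_src": [8, 12, 11, 911],
})

# Number the rows inside each key group: 0, 1, 2, ...
dst["row"] = dst.groupby(keys).cumcount()
src["row"] = src.groupby(keys).cumcount()

# Outer merge on the keys plus the row number, so the n-th src row of a
# group sits next to the n-th dst row and the shorter side is padded with NaN
combined = (
    src.merge(dst, on=keys + ["row"], how="outer")
       .sort_values(keys + ["row"])
       .drop(columns="row")
       .set_index(keys)
)
print(combined)
```

Because the row number makes each merge key unique within its group, rows are matched one to one instead of being multiplied, and the index is only set at the end, after the merge.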