
Python: aggregating a large DataFrame

I have a DataFrame that lists system IDs together with the number of alarms of a given type and class that occurred on a given day:

df
                               SystemID         AlarmClass          AlarmType         Day  AlarmCount
0  95EE8B57-6BE9-4175-B901-B6B3BEE1844D            Service  Unexpected Status  06/08/2018           3
1  95EE8B57-6BE9-4175-B901-B6B3BEE1844D            Service  Unexpected Status  05/08/2018           2
2  95EE8B57-6BE9-4175-B901-B6B3BEE1844D            Service  Unexpected Status  06/08/2018           1
3  5F891F03-3114-4E62-9A7D-CD2A04061364            Service  Unexpected Status  04/08/2018           2
4  5F891F03-3114-4E62-9A7D-CD2A04061364            Service  Unexpected Status  04/08/2018           2
5  5F891F03-3114-4E62-9A7D-CD2A04061364  Event Log Monitor    Application Log  05/08/2018           2
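
I want to aggregate this data by grouping on SystemID and Day and listing the alarm count for each type and each class. For the DataFrame above, the result would look like this: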

                               SystemID         Day  AlarmClass-S  AlarmClass-ELM  AlarmType-US  AlarmType-AL
0  95EE8B57-6BE9-4175-B901-B6B3BEE1844D  06/08/2018             4               0             4             0
1  95EE8B57-6BE9-4175-B901-B6B3BEE1844D  05/08/2018             2               0             2             0
2  5F891F03-3114-4E62-9A7D-CD2A04061364  04/08/2018             4               0             4             0
3  5F891F03-3114-4E62-9A7D-CD2A04061364  05/08/2018             0               2             0             2

How can I do this most efficiently? The DataFrame has millions of records.

For performance, you can pivot the data once on AlarmClass and once on AlarmType, then concatenate the two results:

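import pandas as pd

# Sum AlarmCount per (SystemID, Day), with one column per AlarmClass value
i = df.pivot_table(index=['SystemID', 'Day'],
                   columns='AlarmClass',
                   values='AlarmCount',
                   aggfunc='sum',
                   fill_value=0)
# Same pivot, with one column per AlarmType value
j = df.pivot_table(index=['SystemID', 'Day'],
                   columns='AlarmType',
                   values='AlarmCount',
                   aggfunc='sum',
                   fill_value=0)

# Rename each column to a prefix plus the initials of its label,
# e.g. 'Event Log Monitor' -> 'AlarmClass-ELM'
i.columns = i.columns.map(lambda x: 'AlarmClass-' + ''.join(y[0] for y in x.split()))
j.columns = j.columns.map(lambda x: 'AlarmType-' + ''.join(y[0] for y in x.split()))

# Join the two pivots side by side and turn the index back into columns
df = pd.concat([i, j], axis=1).reset_index()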

print(df)
                               SystemID         Day  AlarmClass-ELM  AlarmClass-S  AlarmType-AL  AlarmType-US
0  5F891F03-3114-4E62-9A7D-CD2A04061364  04/08/2018               0             4             0             4
1  5F891F03-3114-4E62-9A7D-CD2A04061364  05/08/2018               2             0             2             0
2  95EE8B57-6BE9-4175-B901-B6B3BEE1844D  05/08/2018               0             2             0             2
3  95EE8B57-6BE9-4175-B901-B6B3BEE1844D  06/08/2018               0             4             0             4
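
The same reshaping can also be written with groupby and unstack, which may be worth timing against pivot_table on the real data. The following is a minimal, self-contained sketch built on the six sample rows from the question. The names `data`, `abbreviate`, `sum_by`, and `result` are hypothetical helpers introduced here for illustration; they are not part of the original answer, and no benchmark has been run to show that this variant is faster.

```python
import pandas as pd

# Sample rows copied from the question.
data = pd.DataFrame({
    "SystemID": ["95EE8B57-6BE9-4175-B901-B6B3BEE1844D"] * 3
                + ["5F891F03-3114-4E62-9A7D-CD2A04061364"] * 3,
    "AlarmClass": ["Service"] * 5 + ["Event Log Monitor"],
    "AlarmType": ["Unexpected Status"] * 5 + ["Application Log"],
    "Day": ["06/08/2018", "05/08/2018", "06/08/2018",
            "04/08/2018", "04/08/2018", "05/08/2018"],
    "AlarmCount": [3, 2, 1, 2, 2, 2],
})


def abbreviate(label):
    """Return the initials of a label, e.g. 'Event Log Monitor' -> 'ELM'."""
    return "".join(word[0] for word in label.split())


def sum_by(frame, column):
    """Sum AlarmCount per (SystemID, Day) with one output column per value of `column`."""
    wide = (
        frame.groupby(["SystemID", "Day", column])["AlarmCount"]
        .sum()
        .unstack(fill_value=0)  # move the last group key into columns; missing combinations become 0
    )
    wide.columns = [f"{column}-{abbreviate(c)}" for c in wide.columns]
    return wide


# Build one wide table per category column, then join them on the shared index.
result = pd.concat(
    [sum_by(data, "AlarmClass"), sum_by(data, "AlarmType")], axis=1
).reset_index()

print(result)
```

One caveat applies to both versions: abbreviating labels to their initials produces duplicate column names if two labels share the same initials, so on real data it may be safer to keep the full labels or to check for collisions first.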