Python: stacking rows with a common column value in pandas
What is a way to put rows with the same request time together, rather than grouping by error code? At the moment I get all the error-code-0 reports first, then all the error-code-1 reports, like this:
>>> data.groupby([data['ErrorCode'], pd.Grouper(freq='15T')])['latency'].describe().unstack().reset_index()
ErrorCode Time_req count mean std \
0 0 2017-03-08 04:30:00 1 603034.000000 NaN
1 0 2017-03-08 04:45:00 2 174720.000000 38101.741797
2 0 2017-03-08 05:00:00 2 674942.500000 786118.185810
3 0 2017-03-08 07:45:00 10 266653.200000 165867.496817
4 0 2017-03-08 08:00:00 23 208949.304348 124902.942685
5 0 2017-03-08 08:15:00 31 247282.064516 181780.519320
6 0 2017-03-08 08:30:00 35 249332.857143 340084.918015
7 0 2017-03-08 08:45:00 7 250066.000000 195051.871617
8 1 2017-03-08 04:45:00 4 227747.500000 148185.181566
9 1 2017-03-08 05:00:00 2 126633.000000 1337.846030
10 1 2017-03-08 07:45:00 10 421781.900000 464249.118555
11 1 2017-03-08 08:00:00 22 188122.272727 82110.336132
12 1 2017-03-08 08:15:00 32 294896.968750 229498.560222
13 1 2017-03-08 08:30:00 35 501679.628571 1353873.878385
14 1 2017-03-08 08:45:00 6 531606.000000 582290.903396
But I need an alternative like the one below:
ErrorCode Time_req count
0 2017-03-08 04:30:00 1
1 NaN NaN NaN
0 2017-03-08 04:45:00 2
1 2017-03-08 04:45:00 4
... and so on
I think you need to add the missing values:
df = data.groupby([data['ErrorCode'], pd.Grouper(freq='15T')])['latency'].describe()
# round-trip through unstack/stack with dropna=False so the missing
# (Time_req, ErrorCode) combinations are materialised as NaN rows
df = df.unstack(0).stack(dropna=False).unstack(1).reset_index()
print(df)
Time_req ErrorCode count mean std
0 2017-03-08 04:30:00 0 1.0 603034.000000 NaN
1 2017-03-08 04:30:00 1 NaN NaN NaN
2 2017-03-08 04:45:00 0 2.0 174720.000000 3.810174e+04
3 2017-03-08 04:45:00 1 4.0 227747.500000 1.481852e+05
4 2017-03-08 05:00:00 0 2.0 674942.500000 7.861182e+05
5 2017-03-08 05:00:00 1 2.0 126633.000000 1.337846e+03
6 2017-03-08 07:45:00 0 10.0 266653.200000 1.658675e+05
7 2017-03-08 07:45:00 1 10.0 421781.900000 4.642491e+05
8 2017-03-08 08:00:00 0 23.0 208949.304348 1.249029e+05
9 2017-03-08 08:00:00 1 22.0 188122.272727 8.211034e+04
10 2017-03-08 08:15:00 0 31.0 247282.064516 1.817805e+05
11 2017-03-08 08:15:00 1 32.0 294896.968750 2.294986e+05
12 2017-03-08 08:30:00 0 35.0 249332.857143 3.400849e+05
13 2017-03-08 08:30:00 1 35.0 501679.628571 1.353874e+06
14 2017-03-08 08:45:00 0 7.0 250066.000000 1.950519e+05
15 2017-03-08 08:45:00 1 6.0 531606.000000 5.822909e+05
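The effect of the answer above can be reproduced on a tiny, made-up dataset. The unstack/stack round-trip fills in the (time, ErrorCode) pairs that have no observations; an equivalent way to sketch the same idea, which may be clearer, is to reindex against the full product of the index levels (column names `ErrorCode`/`latency` mirror the question, the data itself is invented):

```python
import pandas as pd

# Toy data: error code 1 has no observations in the 04:30 bucket.
times = pd.to_datetime([
    "2017-03-08 04:31", "2017-03-08 04:32",
    "2017-03-08 04:46", "2017-03-08 04:47", "2017-03-08 04:48",
])
data = pd.DataFrame(
    {"ErrorCode": [0, 0, 0, 1, 1],
     "latency": [100, 200, 300, 400, 500]},
    index=times,
)
data.index.name = "Time_req"

# Per-(ErrorCode, 15-minute bucket) summary, as in the question.
df = data.groupby(
    [data["ErrorCode"], pd.Grouper(freq="15min")]
)["latency"].describe()

# Reindex against every (ErrorCode, bucket) combination so missing
# groups appear as NaN rows, then interleave by time instead of by
# error code.
full = pd.MultiIndex.from_product(df.index.levels, names=df.index.names)
out = df.reindex(full).swaplevel().sort_index().reset_index()
print(out)
```

Here `out` alternates error codes within each time bucket, with a NaN row for the (04:30, error code 1) combination that never occurred, which matches the layout asked for in the question.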
Comments:
- I think df.to_csv('my_file.csv', index=False) would work perfectly.
- It's not about a CSV, but about what I see in the python terminal. Thanks.
- I think you need pd.set_option('expand_frame_repr', False). If there's an error, check; my email is in my profile. But please accept my solution - to mark an answer as accepted, click on the check mark beside the answer to toggle it from greyed out to filled in. Thank you very much.
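The display option mentioned in the comments keeps a wide DataFrame on one line in the terminal instead of wrapping the repr into several blocks (the `\` continuation visible in the outputs above). A minimal sketch using the real pandas option names:

```python
import pandas as pd

# Do not fold a wide DataFrame repr into multiple stacked blocks.
pd.set_option("display.expand_frame_repr", False)

# Optionally tell pandas to assume a wider terminal (in characters).
pd.set_option("display.width", 200)
```

With these set, `print(df)` shows all columns on one long line, truncated only if they exceed the configured width.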