Python groupby to process a CSV
I have a daily file of network router usage. I am trying to find the maximum value of the uIN and uOUT columns for each unique router (QIN), along with the time at which it occurred. I have tried a lot of things with pandas and groupby, but I can't seem to get the final result I need. Here is a sample of the data:
Minute QIN uIN uOUT
2/14/2018 16:00 Bundle-Ether1 on (Router1.network.com) 0.10221 0.21195
2/14/2018 16:05 Bundle-Ether1 on (Router1.network.com) 0.089865 0.18722
2/15/2018 16:10 Bundle-Ether1 on (Router1.network.com) 0.07482 0.1705
2/16/2018 16:15 Bundle-Ether1 on (Router1.network.com) 0.09176 0.18846
2/17/2018 16:20 Bundle-Ether1 on (Router1.network.com) 0.11816 0.11785
2/14/2018 16:00 Bundle-Ether1 on (Router2.network.com) 0.08786 0.15235
2/14/2018 16:05 Bundle-Ether1 on (Router2.network.com) 0.07777 0.19253
2/15/2018 16:10 Bundle-Ether1 on (Router2.network.com) 0.07552 0.14232
2/16/2018 16:15 Bundle-Ether1 on (Router2.network.com) 0.1291 0.18758
2/17/2018 16:20 Bundle-Ether1 on (Router2.network.com) 0.13361 0.11747
Here is my code:
import pandas as pd
df = pd.read_csv('c://router_data.csv')
df['Minute'] = pd.to_datetime(df['Minute'])
df.set_index('Minute').groupby('QIN')['uIN'].resample("M").max()
Result:
Bundle-Ether1 on (Router2.network.com) 0.13361
Bundle-Ether1 on (Router1.network.com) 0.11816
The result I need is:
2/17/2018 16:20 Bundle-Ether1 on (Router2.network.com) 0.13361
2/17/2018 16:20 Bundle-Ether1 on (Router1.network.com) 0.11816
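I suggest using merge: compute the per-router maximum, then merge it back onto the original rows to recover the timestamp. You can drop 'uOUT' if necessary.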
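import pandas as pd
# Parse 'Minute' as datetimes and use it as the index
df = pd.read_csv('C:\\router.csv', parse_dates=['Minute'], index_col='Minute')
# Maximum uIN for each router
df1 = df.groupby('QIN')['uIN'].max().reset_index()
# Merge back on (QIN, uIN) to recover the timestamp and uOUT of each maximum row
df1 = df1.merge(df.reset_index(), on=['QIN', 'uIN']).set_index(['Minute', 'QIN'])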
Out[191]:
                                                                uIN     uOUT
Minute              QIN
2018-02-17 16:20:00 Bundle-Ether1 on (Router1.network.com)  0.11816  0.11785
                    Bundle-Ether1 on (Router2.network.com)  0.13361  0.11747
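As an alternative to the merge, here is a minimal sketch using groupby with idxmax, which looks up the row label of each group's maximum so the timestamp comes along directly. This is not the answer's method, just another way to get the same rows. The inline comma-separated text and the date format string are assumptions made so the example is self-contained; the real file's layout may differ.

```python
import io
import pandas as pd

# Sample rows from the question, written as comma-separated text.
# The comma-separated layout is an assumption for this sketch.
csv_text = """Minute,QIN,uIN,uOUT
2/14/2018 16:00,Bundle-Ether1 on (Router1.network.com),0.10221,0.21195
2/14/2018 16:05,Bundle-Ether1 on (Router1.network.com),0.089865,0.18722
2/15/2018 16:10,Bundle-Ether1 on (Router1.network.com),0.07482,0.1705
2/16/2018 16:15,Bundle-Ether1 on (Router1.network.com),0.09176,0.18846
2/17/2018 16:20,Bundle-Ether1 on (Router1.network.com),0.11816,0.11785
2/14/2018 16:00,Bundle-Ether1 on (Router2.network.com),0.08786,0.15235
2/14/2018 16:05,Bundle-Ether1 on (Router2.network.com),0.07777,0.19253
2/15/2018 16:10,Bundle-Ether1 on (Router2.network.com),0.07552,0.14232
2/16/2018 16:15,Bundle-Ether1 on (Router2.network.com),0.1291,0.18758
2/17/2018 16:20,Bundle-Ether1 on (Router2.network.com),0.13361,0.11747
"""

df = pd.read_csv(io.StringIO(csv_text))
# Assumed month/day/year format, matching the sample rows
df['Minute'] = pd.to_datetime(df['Minute'], format='%m/%d/%Y %H:%M')

# idxmax gives, per router, the index label of the row holding the maximum;
# df.loc then pulls those full rows, including the timestamp.
peak_in = df.loc[df.groupby('QIN')['uIN'].idxmax(), ['Minute', 'QIN', 'uIN']]
peak_out = df.loc[df.groupby('QIN')['uOUT'].idxmax(), ['Minute', 'QIN', 'uOUT']]

print(peak_in)
print(peak_out)
```

One difference to keep in mind: if a router hits the same maximum more than once, the merge returns every tied row, while idxmax returns only the first occurrence.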
Yes, that will work. I guess I was making this more complicated than it needed to be. Thank you.