Python 如何计算pandas中的分类timeseries数据
这周我决定潜入一点熊猫。我有一个包含历史IRC日志的pandas数据框架,如下所示:Python 如何计算pandas中的分类timeseries数据,python,pandas,Python,Pandas,这周我决定潜入一点熊猫。我有一个包含历史IRC日志的pandas数据框架,如下所示: timestamp action nick message 2005-11-04 01:44:33 False hack-cclub lex, hey! 2005-11-04 01:44:43 False hack-cclub lol, yea thats broke 2005-11-04 01:44:56 False lex Slas
timestamp action nick message
2005-11-04 01:44:33 False hack-cclub lex, hey!
2005-11-04 01:44:43 False hack-cclub lol, yea thats broke
2005-11-04 01:44:56 False lex Slashdot - Updated 2005-11-04 00:23:00 | Micro...
2005-11-04 01:44:56 False hack-cclub lex slashdot
2005-11-04 01:45:12 False lex port 666 is doom - doom Id Software (or mdqs o..
2005-11-04 01:45:12 False hack-cclub lex, port 666
2005-11-04 01:45:21 False hitokiri lex, port 23485
2005-11-04 01:45:45 False hitokiri lex, port 1024
2005-11-04 01:45:46 True hack-cclub slaps lex around with a wet fish
df['nick'].value_counts()[:25]
hack-cclub lex hitokiri
1 0 0
2 0 0
2 1 0
3 1 0
3 2 0
4 2 0
4 2 1
4 2 2
5 2 2
hack-cclub lex hitokiri
1 2 2
1 2 2
1 2 3
1 2 3
1 2 3
1 2 3
1 2 3
1 2 2
1 2 2
大约有550万行,我正在尝试做一些基本的可视化,比如排名前25位之类的。我知道我可以得到像这样的前25个刻痕:
timestamp action nick message
2005-11-04 01:44:33 False hack-cclub lex, hey!
2005-11-04 01:44:43 False hack-cclub lol, yea thats broke
2005-11-04 01:44:56 False lex Slashdot - Updated 2005-11-04 00:23:00 | Micro...
2005-11-04 01:44:56 False hack-cclub lex slashdot
2005-11-04 01:45:12 False lex port 666 is doom - doom Id Software (or mdqs o..
2005-11-04 01:45:12 False hack-cclub lex, port 666
2005-11-04 01:45:21 False hitokiri lex, port 23485
2005-11-04 01:45:45 False hitokiri lex, port 1024
2005-11-04 01:45:46 True hack-cclub slaps lex around with a wet fish
df['nick'].value_counts()[:25]
hack-cclub lex hitokiri
1 0 0
2 0 0
2 1 0
3 1 0
3 2 0
4 2 0
4 2 1
4 2 2
5 2 2
hack-cclub lex hitokiri
1 2 2
1 2 2
1 2 3
1 2 3
1 2 3
1 2 3
1 2 3
1 2 2
1 2 2
我想要的是这样的滚动计数:
timestamp action nick message
2005-11-04 01:44:33 False hack-cclub lex, hey!
2005-11-04 01:44:43 False hack-cclub lol, yea thats broke
2005-11-04 01:44:56 False lex Slashdot - Updated 2005-11-04 00:23:00 | Micro...
2005-11-04 01:44:56 False hack-cclub lex slashdot
2005-11-04 01:45:12 False lex port 666 is doom - doom Id Software (or mdqs o..
2005-11-04 01:45:12 False hack-cclub lex, port 666
2005-11-04 01:45:21 False hitokiri lex, port 23485
2005-11-04 01:45:45 False hitokiri lex, port 1024
2005-11-04 01:45:46 True hack-cclub slaps lex around with a wet fish
df['nick'].value_counts()[:25]
hack-cclub lex hitokiri
1 0 0
2 0 0
2 1 0
3 1 0
3 2 0
4 2 0
4 2 1
4 2 2
5 2 2
hack-cclub lex hitokiri
1 2 2
1 2 2
1 2 3
1 2 3
1 2 3
1 2 3
1 2 3
1 2 2
1 2 2
因此,我可以绘制一个从时间开始的前25个缺口的信息面积图。我知道我可以通过迭代整个数据帧并保持计数来做到这一点,但因为这样做的全部目的是学习使用熊猫,我希望有一种更惯用的方法来做到这一点。拥有相同的数据也很好,但是使用排名,而不是像这样运行计数:
timestamp action nick message
2005-11-04 01:44:33 False hack-cclub lex, hey!
2005-11-04 01:44:43 False hack-cclub lol, yea thats broke
2005-11-04 01:44:56 False lex Slashdot - Updated 2005-11-04 00:23:00 | Micro...
2005-11-04 01:44:56 False hack-cclub lex slashdot
2005-11-04 01:45:12 False lex port 666 is doom - doom Id Software (or mdqs o..
2005-11-04 01:45:12 False hack-cclub lex, port 666
2005-11-04 01:45:21 False hitokiri lex, port 23485
2005-11-04 01:45:45 False hitokiri lex, port 1024
2005-11-04 01:45:46 True hack-cclub slaps lex around with a wet fish
df['nick'].value_counts()[:25]
hack-cclub lex hitokiri
1 0 0
2 0 0
2 1 0
3 1 0
3 2 0
4 2 0
4 2 1
4 2 2
5 2 2
hack-cclub lex hitokiri
1 2 2
1 2 2
1 2 3
1 2 3
1 2 3
1 2 3
1 2 3
1 2 2
1 2 2
IIUC您需要并且:
我不明白最后一列数据帧。你能解释一下吗?