Python pandas: match the last identical row and compute the difference
Given a DataFrame like the following:
timestamp value
0 2012-01-01 3.0
1 2012-01-05 3.0
2 2012-01-06 6.0
3 2012-01-09 3.0
4 2012-01-31 1.0
5 2012-02-09 3.0
6 2012-02-11 1.0
7 2012-02-13 3.0
8 2012-02-15 2.0
9 2012-02-18 5.0
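For anyone who wants to reproduce the example, the frame can be rebuilt as in the sketch below. This assumes the `timestamp` column holds real datetimes rather than strings, which the question does not state explicitly.

```python
import pandas as pd

# Rebuild the sample frame. Parsing the dates up front is an assumption;
# it makes later subtractions return Timedelta values.
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2012-01-01", "2012-01-05", "2012-01-06", "2012-01-09", "2012-01-31",
        "2012-02-09", "2012-02-11", "2012-02-13", "2012-02-15", "2012-02-18",
    ]),
    "value": [3.0, 3.0, 6.0, 3.0, 1.0, 3.0, 1.0, 3.0, 2.0, 5.0],
})

print(df)
```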
What is an elegant and efficient way to add a time_since_last_identical column, so that the example above results in:
timestamp value time_since_last_identical
0 2012-01-01 3.0 NaT
1 2012-01-05 3.0 5 days
2 2012-01-06 6.0 NaT
3 2012-01-09 3.0 4 days
4 2012-01-31 1.0 NaT
5 2012-02-09 3.0 31 days
6 2012-02-11 1.0 10 days
7 2012-02-13 3.0 4 days
8 2012-02-15 2.0 NaT
9 2012-02-18 5.0 NaT
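The important part of the question is not necessarily the use of time deltas. Any solution that matches a given row with the previous row holding the same value, and computes something from those two rows (here, a difference), is valid.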
Note: I am not interested in apply or loop-based approaches.
Here is a solution using pandas:
This produces the following output:
timestamp value time_since_last_identical
0 2012-01-01 3.0 NaT
1 2012-01-05 3.0 4 days
2 2012-01-06 6.0 NaT
3 2012-01-09 3.0 4 days
4 2012-01-31 1.0 NaT
5 2012-02-09 3.0 31 days
6 2012-02-11 1.0 11 days
7 2012-02-13 3.0 4 days
8 2012-02-15 2.0 NaT
9 2012-02-18 5.0 NaT
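It does not exactly match your expected output, but I suppose that is a matter of convention (for example, whether the current day is counted). If you provide more details, I would be happy to improve it.
A simple, clean and elegant groupby will do the trick: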
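# Select the timestamp column explicitly so only that column is diffed within each value group
df['time_since_last_identical'] = df.groupby('value')['timestamp'].diff()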
Which gives:
timestamp value time_since_last_identical
0 2012-01-01 3.0 NaT
1 2012-01-05 3.0 4 days
2 2012-01-06 6.0 NaT
3 2012-01-09 3.0 4 days
4 2012-01-31 1.0 NaT
5 2012-02-09 3.0 31 days
6 2012-02-11 1.0 11 days
7 2012-02-13 3.0 4 days
8 2012-02-15 2.0 NaT
9 2012-02-18 5.0 NaT
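The question also asks for the general case: pair each row with the previous row that has the same value, then compute something from the pair. One possible sketch uses a grouped `shift()` instead of `diff()`, so the matched row's data is available for any calculation. The `prev_index` column is a hypothetical helper added for illustration, and the frame is rebuilt here on the assumption that `timestamp` is a datetime column.

```python
import pandas as pd

df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2012-01-01", "2012-01-05", "2012-01-06", "2012-01-09", "2012-01-31",
        "2012-02-09", "2012-02-11", "2012-02-13", "2012-02-15", "2012-02-18",
    ]),
    "value": [3.0, 3.0, 6.0, 3.0, 1.0, 3.0, 1.0, 3.0, 2.0, 5.0],
})

# Timestamp of the previous row with the same value (NaT when there is none).
prev_ts = df.groupby("value")["timestamp"].shift()

# Any pairwise computation can go here; a plain subtraction gives the
# same result as a grouped diff().
df["time_since_last_identical"] = df["timestamp"] - prev_ts

# Hypothetical helper: index label of the matched row, handy for looking up
# other columns of that row. NaN marks rows without an earlier match.
df["prev_index"] = df.index.to_series().groupby(df["value"]).shift()

print(df)
```

Both steps are vectorized group operations, so this stays within the "no apply, no loops" constraint stated in the question.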
Are you sure about the sample output? Why is 2012-01-05 5 days after 2012-01-01, while 2012-01-09 is only 4 days after 2012-01-05?
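The arithmetic behind this comment can be checked directly. The sketch below subtracts the relevant dates and also shows what an inclusive count (one extra day, an assumed convention) would give. Neither convention reproduces all three figures in the question's expected output (5, 4 and 10 days), which suggests that the sample was simply miscounted.

```python
import pandas as pd

t = pd.to_datetime([
    "2012-01-01", "2012-01-05", "2012-01-09", "2012-01-31", "2012-02-11",
])

# Plain differences between consecutive rows that share a value.
gap_a = t[1] - t[0]   # 2012-01-01 -> 2012-01-05
gap_b = t[2] - t[1]   # 2012-01-05 -> 2012-01-09
gap_c = t[4] - t[3]   # 2012-01-31 -> 2012-02-11

print(gap_a.days, gap_b.days, gap_c.days)   # 4 4 11

# Inclusive counting (assumed convention: count both endpoints).
one_day = pd.Timedelta(days=1)
inclusive = [(g + one_day).days for g in (gap_a, gap_b, gap_c)]
print(inclusive)                            # [5, 5, 12]
```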