Python: group-wise cumulative sum when groups repeat


I have the following DataFrame:

    Title             start_time     Duration  Match
0   Item#1 2019-12-13 00:00:00.000    819.01   True   
2   Item#1 2019-12-13 00:13:39.010   1205.25   True   
4   Item#1 2019-12-13 00:33:44.260    972.80   True   
6   Item#1 2019-12-13 00:49:57.060    602.23  False   
9   Item#2 2019-12-13 00:59:59.290   1800.00  False   
14  Item#2 2019-12-13 01:29:59.290    533.79   True   
17  Item#2 2019-12-13 01:38:53.080    537.11   True   
20  Item#2 2019-12-13 01:47:50.190    729.10  False   
24  Item#3 2019-12-13 01:59:59.290    726.97   True   
26  Item#3 2019-12-13 02:12:06.260    569.01   True   
28  Item#3 2019-12-13 02:21:35.270    504.02  False   
32  Item#4 2019-12-13 02:29:59.290   1800.00  False   
36  Item#1 2019-12-13 02:59:59.290    776.98   True   
38  Item#1 2019-12-13 03:12:56.270   1045.81   True   
40  Item#1 2019-12-13 03:30:22.080    988.20   True   
43  Item#1 2019-12-13 03:46:50.280    789.01  False    
I want to run a cumulative sum over the Duration column. So far I have been using this line:

df.groupby(['Title'])['Duration'].cumsum()


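However, I do not want Title items that are separated in time to fall into the same group. In the example above, Item#1 appears in two separate runs, and I do not want those two runs summed together. How can I do this?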

I think you need to group by consecutive runs, which means Item#1 is processed as two separate groups:

g = df['Title'].ne(df['Title'].shift()).cumsum()
df['new'] = df.groupby(g)['Duration'].cumsum()

print (df)
     Title               start_time  Duration  Match      new
0   Item#1  2019-12-13 00:00:00.000    819.01   True   819.01
2   Item#1  2019-12-13 00:13:39.010   1205.25   True  2024.26
4   Item#1  2019-12-13 00:33:44.260    972.80   True  2997.06
6   Item#1  2019-12-13 00:49:57.060    602.23  False  3599.29
9   Item#2  2019-12-13 00:59:59.290   1800.00  False  1800.00
14  Item#2  2019-12-13 01:29:59.290    533.79   True  2333.79
17  Item#2  2019-12-13 01:38:53.080    537.11   True  2870.90
20  Item#2  2019-12-13 01:47:50.190    729.10  False  3600.00
24  Item#3  2019-12-13 01:59:59.290    726.97   True   726.97
26  Item#3  2019-12-13 02:12:06.260    569.01   True  1295.98
28  Item#3  2019-12-13 02:21:35.270    504.02  False  1800.00
32  Item#4  2019-12-13 02:29:59.290   1800.00  False  1800.00
36  Item#1  2019-12-13 02:59:59.290    776.98   True   776.98
38  Item#1  2019-12-13 03:12:56.270   1045.81   True  1822.79
40  Item#1  2019-12-13 03:30:22.080    988.20   True  2810.99
43  Item#1  2019-12-13 03:46:50.280    789.01  False  3600.00
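
To make the difference from the original attempt easier to see, here is a minimal, self-contained sketch. It assumes a hand-copied four-row subset of the question's data, and the column names by_title and by_run are made up for this illustration. It puts both cumulative sums side by side:

```python
import pandas as pd

# Assumption: a four-row subset of the question's data is enough to show the effect.
df = pd.DataFrame({
    "Title": ["Item#1", "Item#1", "Item#2", "Item#1"],
    "Duration": [819.01, 1205.25, 1800.00, 776.98],
})

# Grouping by Title alone merges both runs of Item#1 into one running total.
df["by_title"] = df.groupby("Title")["Duration"].cumsum()

# Grouping by a run counter restarts the total every time Title changes.
run_id = df["Title"].ne(df["Title"].shift()).cumsum()
df["by_run"] = df.groupby(run_id)["Duration"].cumsum()

print(df)
```

In the last row, by_title keeps adding to the earlier Item#1 total, while by_run starts again from that row's own Duration.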
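Details:

Compare the column with its shifted values (Series.shift) for inequality using Series.ne, then take the cumulative sum (Series.cumsum) of the result to number the groups: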

print (df[['Title']].assign(shifted = df['Title'].shift(),
                            not_equal=df['Title'].ne(df['Title'].shift()),
                            g = df['Title'].ne(df['Title'].shift()).cumsum()))
     Title shifted  not_equal  g
0   Item#1     NaN       True  1
2   Item#1  Item#1      False  1
4   Item#1  Item#1      False  1
6   Item#1  Item#1      False  1
9   Item#2  Item#1       True  2
14  Item#2  Item#2      False  2
17  Item#2  Item#2      False  2
20  Item#2  Item#2      False  2
24  Item#3  Item#2       True  3
26  Item#3  Item#3      False  3
28  Item#3  Item#3      False  3
32  Item#4  Item#3       True  4
36  Item#1  Item#4       True  5
38  Item#1  Item#1      False  5
40  Item#1  Item#1      False  5
43  Item#1  Item#1      False  5
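
The three steps in that table can also be wrapped in a small function. This is only a sketch: consecutive_group_ids is a hypothetical helper name, not something pandas provides, and the four-element Series is made-up sample data.

```python
import pandas as pd

def consecutive_group_ids(s: pd.Series) -> pd.Series:
    """Number each run of equal adjacent values 1, 2, 3, ... (hypothetical helper)."""
    previous = s.shift()      # every value moved down one row; the first row becomes missing
    changed = s.ne(previous)  # element-wise "not equal"; True where a new run starts
    return changed.cumsum()   # True counts as 1, so the counter goes up at each run start

titles = pd.Series(["a", "a", "b", "a"])  # made-up sample data
ids = consecutive_group_ids(titles)
print(ids.tolist())  # [1, 1, 2, 3]
```

The first row always counts as a change because it is compared against a missing value, which is why the numbering starts at 1. The second run of "a" gets its own id (3) instead of sharing id 1 with the first run.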

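Can you explain in more detail?

@SimonBreton - I am not sure I understand the question, but I have added some explanation and details.

Yes, that sounds good. Can you explain .ne and shift?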