Python: apply a function to specific runs of rows

python pandas dataframe

I have the following DataFrame df:

            bucket_value  is_new_bucket
dates                                  
2019-03-07             0              1
2019-03-08             1              0
2019-03-09             2              0
2019-03-10             3              0
2019-03-11             4              0
2019-03-12             5              1
2019-03-13             6              0
2019-03-14             7              1
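
To make the example reproducible, the frame above can be rebuilt roughly as follows. This is only a sketch: it assumes the index is a daily DatetimeIndex named dates, which the printed output suggests but the question does not state.

```python
import pandas as pd

# Rebuild the example frame; the index is assumed to be a daily
# DatetimeIndex named "dates", as the printed output suggests.
df = pd.DataFrame(
    {
        "bucket_value": [0, 1, 2, 3, 4, 5, 6, 7],
        "is_new_bucket": [1, 0, 0, 0, 0, 1, 0, 1],
    },
    index=pd.date_range("2019-03-07", periods=8, freq="D", name="dates"),
)
print(df)
```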

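I would like to apply a specific function (say, the mean) to each group of bucket_value data where the column is_new_bucket equals zero, so that the resulting DataFrame looks like this: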

            mean_values
dates             
2019-03-08     2.5
2019-03-13     6.0

In other words, apply the function to each run of consecutive rows where is_new_bucket = 0, taking bucket_value as input.

For example, if I want to apply the max function, the resulting DataFrame looks like this:

            max_values
dates             
2019-03-11     4.0
2019-03-13     6.0

Use cumsum together with a filter:

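# reset_index turns the "dates" index into a column; rename it to "date" so it matches the output shown here
df = df.reset_index().rename(columns={'dates': 'date'})
# the running sum of is_new_bucket labels each bucket; the filter keeps only the rows with is_new_bucket == 0
s = df.loc[df.is_new_bucket == 0].groupby(df.is_new_bucket.cumsum()).agg({'date': 'first', 'bucket_value': ['mean', 'max']})
s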
                    date bucket_value    
                   first         mean max
is_new_bucket                            
1             2019-03-08          2.5   4
2             2019-03-13          6.0   6
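
The same idea can be spelled out step by step. The sketch below is an illustration rather than the answer's exact code: it assumes the dates already live in a column called date, and the names bucket_id, first_date, mean_value and max_value are made up for this example. It uses named aggregation so the result has flat column names.

```python
import pandas as pd

# Example data, with the dates assumed to be an ordinary column named "date".
df = pd.DataFrame(
    {
        "date": pd.date_range("2019-03-07", periods=8, freq="D"),
        "bucket_value": [0, 1, 2, 3, 4, 5, 6, 7],
        "is_new_bucket": [1, 0, 0, 0, 0, 1, 0, 1],
    }
)

# Every 1 in is_new_bucket starts a new bucket, so the running sum
# gives each row the number of the bucket it belongs to.
labelled = df.assign(bucket_id=df["is_new_bucket"].cumsum())

# Drop the marker rows and keep only the rows inside a bucket.
inside = labelled[labelled["is_new_bucket"] == 0]

# Named aggregation: one flat output column per (input column, function) pair.
summary = inside.groupby("bucket_id").agg(
    first_date=("date", "first"),
    mean_value=("bucket_value", "mean"),
    max_value=("bucket_value", "max"),
)
print(summary)
```

A bucket that contains only its marker row (the last one here) has no rows left after the filter, so it does not appear in the result.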

Update: to get the whole row where each bucket reaches its maximum, use idxmax:
df.loc[df.loc[df.is_new_bucket==0].groupby(df.is_new_bucket.cumsum())['bucket_value'].idxmax()]
        date  bucket_value  is_new_bucket
4 2019-03-11             4              0
6 2019-03-13             6              0

Update 2: use cumsum.
Once the group key Newkey has been created, you can perform whatever operation you need based on that key:
df['Newkey']=df.is_new_bucket.cumsum()
df
        date  bucket_value  is_new_bucket  Newkey
0 2019-03-07             0              1       1
1 2019-03-08             1              0       1
2 2019-03-09             2              0       1
3 2019-03-10             3              0       1
4 2019-03-11             4              0       1
5 2019-03-12             5              1       2
6 2019-03-13             6              0       2
7 2019-03-14             7              1       3
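
As an illustration of "any operation", here is a small sketch that applies a custom function per Newkey group. The function value_range is hypothetical and only stands in for whatever statistic is actually needed; the frame is rebuilt here so the snippet runs on its own.

```python
import pandas as pd

# Example data, with the dates assumed to be an ordinary column named "date".
df = pd.DataFrame(
    {
        "date": pd.date_range("2019-03-07", periods=8, freq="D"),
        "bucket_value": [0, 1, 2, 3, 4, 5, 6, 7],
        "is_new_bucket": [1, 0, 0, 0, 0, 1, 0, 1],
    }
)
df["Newkey"] = df["is_new_bucket"].cumsum()


def value_range(values: pd.Series) -> float:
    """Hypothetical custom statistic: largest minus smallest value."""
    return float(values.max() - values.min())


# Keep only the rows inside a bucket, then apply the function per group key.
inside = df[df["is_new_bucket"] == 0]
result = inside.groupby("Newkey")["bucket_value"].agg(value_range)
print(result)
```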

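Comments:

You might have better luck if you specify whether this is a pandas DataFrame, pyspark, etc. I added pandas as an additional tag.

Thanks.

Thanks for your answer! However, I do need max and mean to have consistent dates, i.e. the max of the first bucket occurs on date 2019-03-11, not 2019-03-08. I guess I should have two dates? Something like date_mean and date_max?

@JejeBelfort Yes, you need two.

OK, but what should I use in place of the argument first to get the date of the max?

@JejeBelfort Check the idxmax method :-)

OK, thanks, but the function I need to apply is more complex than max. So how can I extract the date without using idxmax?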
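
One possible way to approach that last question, offered as a sketch rather than a definitive answer: have the custom function return the selected row itself, so the date comes along with the value. The selection rule in pick_row below (the row whose value is closest to the bucket mean, with ties going to the earliest row) is purely hypothetical and stands in for the more complex function mentioned in the comment.

```python
import pandas as pd

# Example data, with the dates assumed to be an ordinary column named "date".
df = pd.DataFrame(
    {
        "date": pd.date_range("2019-03-07", periods=8, freq="D"),
        "bucket_value": [0, 1, 2, 3, 4, 5, 6, 7],
        "is_new_bucket": [1, 0, 0, 0, 0, 1, 0, 1],
    }
)
df["Newkey"] = df["is_new_bucket"].cumsum()


def pick_row(group: pd.DataFrame) -> pd.Series:
    """Hypothetical rule: return the row closest to the bucket mean."""
    distance = (group["bucket_value"] - group["bucket_value"].mean()).abs()
    # A stable sort keeps the earliest row first when distances tie.
    best_label = distance.sort_values(kind="stable").index[0]
    return group.loc[best_label]


inside = df[df["is_new_bucket"] == 0]

# Loop over the buckets and collect the date and value of the chosen row.
records = []
for key, group in inside.groupby("Newkey"):
    row = pick_row(group)
    records.append(
        {"Newkey": key, "date": row["date"], "bucket_value": row["bucket_value"]}
    )

result = pd.DataFrame(records).set_index("Newkey")
print(result)
```

Because pick_row returns a full row, any column of that row (not only the date) is available afterwards, and the rule inside the function can be replaced without touching the loop.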