Python groupby: count only the calls that reached the customer before a given point in time


python pandas dataframe


My question is closely related to my work.

Suppose I have a DataFrame that is sorted by a non-unique ID and by the date of the call:
import pandas as pd

rng = pd.date_range('2015-02-24', periods=8, freq='D')
df = pd.DataFrame(
    {
        "unique_id": ["K0", "K1", "K2", "K3", "K4", "K5", "K6", "K7"],
        "not_unique_id": ["A000", "A111", "A222", "A222", "A222", "A222", "A222", "A333"],
        "date_of_call": rng,
        "customer_reached": [1, 0, 0, 1, 1, 1, 1, 1],
    }
)
# Sort by ID and call date, both descending (newest call first within each ID)
df.sort_values(['not_unique_id', 'date_of_call'], inplace=True, ascending=False)
df.reset_index(drop=True, inplace=True)  # reset index

df.head(10)


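Now I want to add a new column that tells me how often the customer was successfully reached in the past. In other words: for each call, count how many earlier calls with the same not_unique_id reached the customer, and store the result in the new column. For the example above, the latest A222 call (K6) should get 3, because three earlier A222 calls (K3, K4 and K5) reached the customer.

How can I do this?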
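If all datetimes are unique and sorted, you can first reverse the row order by indexing with iloc[::-1], then group by not_unique_id and run a custom function that combines shift with a cumulative sum (cumsum), and finally replace the missing values and convert the column to integers: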
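# Reverse the rows so each group runs from oldest to newest call.
# transform keeps the original index, so the result aligns with df on assignment
# (apply can return a group-keyed MultiIndex in recent pandas versions).
df['new'] = (df.iloc[::-1]
               .groupby('not_unique_id')['customer_reached']
               .transform(lambda x: x.shift().cumsum())
               .fillna(0)
               .astype(int))
print(df)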
  unique_id not_unique_id date_of_call  customer_reached  new
0        K7          A333   2015-03-03                 1    0
1        K6          A222   2015-03-02                 1    3
2        K5          A222   2015-03-01                 1    2
3        K4          A222   2015-02-28                 1    1
4        K3          A222   2015-02-27                 1    0
5        K2          A222   2015-02-26                 0    0
6        K1          A111   2015-02-25                 0    0
7        K0          A000   2015-02-24                 1    0
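
To see why shift followed by cumsum counts only the earlier calls, it can help to trace a single group by hand. The following is a minimal sketch, not part of the original answer; the Series name reached and its index labels are hypothetical stand-ins for the A222 group in chronological order.

```python
import pandas as pd

# Hypothetical stand-in for the A222 group, oldest call first (K2 .. K6)
reached = pd.Series([0, 1, 1, 1, 1], index=["K2", "K3", "K4", "K5", "K6"])

# shift() moves every value down one row, so each row sees the previous call;
# the first row has no previous call and becomes NaN
shifted = reached.shift()

# cumsum() skips the leading NaN and accumulates the earlier successes
running = shifted.cumsum()

# The first call has no history, so its NaN is replaced by 0
prior_successes = running.fillna(0).astype(int)

print(prior_successes.tolist())
```

Without the shift, the cumulative sum would also include the current call, which is not what the question asks for.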

Or, if possible, change the sort order:
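# Sort ascending so each group already runs from oldest to newest call
df.sort_values(['not_unique_id', 'date_of_call'], inplace=True)

# transform returns a result aligned with df's index
df['new'] = (df.groupby('not_unique_id')['customer_reached']
               .transform(lambda x: x.shift().cumsum())
               .fillna(0)
               .astype(int))
print(df)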
  unique_id not_unique_id date_of_call  customer_reached  new
0        K0          A000   2015-02-24                 1    0
1        K1          A111   2015-02-25                 0    0
2        K2          A222   2015-02-26                 0    0
3        K3          A222   2015-02-27                 1    0
4        K4          A222   2015-02-28                 1    1
5        K5          A222   2015-03-01                 1    2
6        K6          A222   2015-03-02                 1    3
7        K7          A333   2015-03-03                 1    0
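
As a possible alternative that avoids the lambda, the same count can be obtained by taking the per-group cumulative sum and subtracting the current row's value. This is a sketch under two assumptions that the original answer does not state: customer_reached contains no missing values, and the frame is sorted ascending by date within each ID. The column name new_alt is made up for illustration.

```python
import pandas as pd

rng = pd.date_range("2015-02-24", periods=8, freq="D")
df = pd.DataFrame(
    {
        "unique_id": ["K0", "K1", "K2", "K3", "K4", "K5", "K6", "K7"],
        "not_unique_id": ["A000", "A111", "A222", "A222", "A222", "A222", "A222", "A333"],
        "date_of_call": rng,
        "customer_reached": [1, 0, 0, 1, 1, 1, 1, 1],
    }
)

# Oldest call first within each ID
df = df.sort_values(["not_unique_id", "date_of_call"]).reset_index(drop=True)

# Running total of successes per ID, including the current call ...
running_total = df.groupby("not_unique_id")["customer_reached"].cumsum()

# ... minus the current call leaves only the successes that happened before it
df["new_alt"] = running_total - df["customer_reached"]

print(df[["unique_id", "not_unique_id", "customer_reached", "new_alt"]])
```

Because no NaN is produced along the way, this variant needs neither fillna nor astype.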

Thank you very much for your help!