仅获取月末日期前一个月的值,否则填写pandas/python

仅获取月末日期前一个月的值,否则填写pandas/python,python,pandas,numpy,data-science,Python,Pandas,Numpy,Data Science,假设我有一个如下所示的数据帧 effective_date,ent_id,value,domain,month_end_date,max_avl_date 2021-03-30,101,61,G,2021-03-30,Y 2021-03-30,103,64,G,2021-03-30,Y 2021-03-30,101,21,R,2021-03-30,Y 2021-03-30,103,46,R,2021-03-30,Y 2021-03-29,101,32,G,

假设我有一个如下所示的数据帧

effective_date,ent_id,value,domain,month_end_date,max_avl_date
    2021-03-30,101,61,G,2021-03-30,Y
    2021-03-30,103,64,G,2021-03-30,Y
    2021-03-30,101,21,R,2021-03-30,Y
    2021-03-30,103,46,R,2021-03-30,Y
    2021-03-29,101,32,G,2021-03-30,N
    2021-03-29,103,39,G,2021-03-30,N
    2021-03-29,101,21,R,2021-03-30,N
    2021-03-29,103,32,R,2021-03-30,N
    2021-03-15,101,54,G,2021-03-30,N
    2021-03-15,103,72,G,2021-03-30,N
    2021-03-15,101,24,R,2021-03-30,N
    2021-03-15,103,22,R,2021-03-30,N
    2021-03-14,101,61,G,2021-03-30,N
    2021-03-14,103,64,G,2021-03-30,N
    2021-03-14,101,31,R,2021-03-30,N
    2021-03-14,103,34,R,2021-03-30,N
    2021-03-13,101,32,G,2021-03-30,N
    2021-03-13,103,39,G,2021-03-30,N
    2021-03-13,101,33,R,2021-03-30,N
    2021-03-13,103,33,R,2021-03-30,N
    2021-03-05,101,82,G,2021-03-30,N
    2021-03-05,103,15,G,2021-03-30,N
    2021-03-05,101,27,R,2021-03-30,N
    2021-03-05,103,19,R,2021-03-30,N
    2021-03-04,101,41,G,2021-03-30,N
    2021-03-04,103,28,G,2021-03-30,N
    2021-03-04,101,94,R,2021-03-30,N
    2021-03-04,103,88,R,2021-03-30,N
    2021-03-03,101,98,G,2021-03-30,N
    2021-03-03,103,72,G,2021-03-30,N
    2021-03-03,101,87,R,2021-03-30,N
    2021-03-03,103,43,R,2021-03-30,N
    2021-03-02,101,53,G,2021-03-30,N
    2021-03-02,103,12,G,2021-03-30,N
    2021-03-02,101,35,R,2021-03-30,N
    2021-03-02,103,42,R,2021-03-30,N
    2021-03-01,101,71,G,2021-03-30,N
    2021-03-01,103,72,G,2021-03-30,N
    2021-03-01,101,31,R,2021-03-30,N
    2021-03-01,103,32,R,2021-03-30,N
    2021-02-28,101,51,G,2021-02-28,Y
    2021-02-28,103,27,G,2021-02-28,Y
    2021-02-28,101,51,R,2021-02-28,Y
    2021-02-28,103,57,R,2021-02-28,Y
    2021-02-27,101,54,G,2021-02-28,N
    2021-02-27,103,72,G,2021-02-28,N
    2021-02-27,101,74,R,2021-02-28,N
    2021-02-27,103,77,R,2021-02-28,N
我想为下一个月的结束日期获取上一个月的结束值,对于当天的剩余时间,它被填充为NaN

输出结果如下:

    effective_date,ent_id,value,domain,month_end_date,max_avl_date,1m_prev_value
2021-03-30,101,61,G,2021-03-30,Y,51
2021-03-30,103,64,G,2021-03-30,Y,27
2021-03-30,101,21,R,2021-03-30,Y,51
2021-03-30,103,46,R,2021-03-30,Y,57
2021-03-29,101,32,G,2021-03-30,N,
2021-03-29,103,39,G,2021-03-30,N,
2021-03-29,101,21,R,2021-03-30,N,
2021-03-29,103,32,R,2021-03-30,N,
2021-03-15,101,54,G,2021-03-30,N,
2021-03-15,103,72,G,2021-03-30,N,
2021-03-15,101,24,R,2021-03-30,N,
2021-03-15,103,22,R,2021-03-30,N,
2021-03-14,101,61,G,2021-03-30,N,
2021-03-14,103,64,G,2021-03-30,N,
2021-03-14,101,31,R,2021-03-30,N,
2021-03-14,103,34,R,2021-03-30,N,
2021-03-13,101,32,G,2021-03-30,N,
2021-03-13,103,39,G,2021-03-30,N,
2021-03-13,101,33,R,2021-03-30,N,
2021-03-13,103,33,R,2021-03-30,N,
2021-03-05,101,82,G,2021-03-30,N,
2021-03-05,103,15,G,2021-03-30,N,
2021-03-05,101,27,R,2021-03-30,N,
2021-03-05,103,19,R,2021-03-30,N,
2021-03-04,101,41,G,2021-03-30,N,
2021-03-04,103,28,G,2021-03-30,N,
2021-03-04,101,94,R,2021-03-30,N,
2021-03-04,103,88,R,2021-03-30,N,
2021-03-03,101,98,G,2021-03-30,N,
2021-03-03,103,72,G,2021-03-30,N,
2021-03-03,101,87,R,2021-03-30,N,
2021-03-03,103,43,R,2021-03-30,N,
2021-03-02,101,53,G,2021-03-30,N,
2021-03-02,103,12,G,2021-03-30,N,
2021-03-02,101,35,R,2021-03-30,N,
2021-03-02,103,42,R,2021-03-30,N,
2021-03-01,101,71,G,2021-03-30,N,
2021-03-01,103,72,G,2021-03-30,N,
2021-03-01,101,31,R,2021-03-30,N,
2021-03-01,103,32,R,2021-03-30,N,
2021-02-28,101,51,G,2021-02-28,Y,
2021-02-28,103,27,G,2021-02-28,Y,
2021-02-28,101,51,R,2021-02-28,Y,
2021-02-28,103,57,R,2021-02-28,Y,
2021-02-27,101,54,G,2021-02-28,N,
2021-02-27,103,72,G,2021-02-28,N,
2021-02-27,101,74,R,2021-02-28,N,
2021-02-27,103,77,R,2021-02-28,N,
我知道如何在sql中实现这一点,因此下面是sql中相同逻辑的副本

DECODE(max_avl_date,Y,LEAD(VALUE,1) OVER (PARTITION BY ent_id,domain order by max_avl_date desc,effective_date desc),null)
我在padas中尝试了同样的逻辑,但没有成功

我的python/pandas工作:

df['1m_prev_value']= np.where((df['max_avl_date']=='Y',(df.sort_values(by=['domain','ent_id','max_avl_date','effective_date'],ascending=False)).groupby(['domain','ent_id']).shift(-1)),np.nan)

嗨,通过这个我已经达到了我的要求

df_ss.loc[df_ss['max_avl_date']=='Y','1m_prev_value']=((df_ss[df_ss['max_avl_date']=='Y']).sort_values(by=['domain','ent_id','max_avl_date','effective_date'],ascending=True).groupby(["domain","max_avl_date",'ent_id'])['value'].shift(1))

我想您正在寻找
shift
@sammywemmy我已经尝试了上面的shift(),但它没有满足我的要求