Python: subtracting row from row based on an ID column
I have a DataFrame that looks like this:
UserId Date_watched Days_not_watch
1 2010-09-11 5
1 2010-10-01 8
1 2010-10-28 1
2 2010-05-06 12
2 2010-05-18 5
3 2010-08-09 10
3 2010-09-25 5
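I want to find the number of days each user left as a gap between watches, so I want a column holding that value for every row of every user. My DataFrame should look like this: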
UserId Date_watched Days_not_watch Gap(2nd watch_date - 1st watch_date - days_not_watch)
1 2010-09-11 5 0 (First gap will be 0 for all users)
1 2010-10-01 8 15 (11th Sept+5=16th Sept; 1st Oct - 16th Sept=15days)
1 2010-10-28 1 19 (1st Oct+8=9th Oct; 28th Oct - 9th Oct=19days)
2 2010-05-06 12 0
2 2010-05-18 5 0 (because 6th May+12 days=18th May)
3 2010-08-09 10 0
3 2010-09-25 4 37 (9th Aug+10=19th Aug; 25th Sept - 19th Aug=37days)
3 2010-10-01 2 2
I have put the formula for calculating the gap next to the column name in the DataFrame.
Here is one way to do it using groupby + shift:
import pandas as pd

# df is the DataFrame from the question
# parse the dates, then sort by user and date first
df['Date_watched'] = pd.to_datetime(df['Date_watched'])
df = df.sort_values(['UserId', 'Date_watched'])

# calculate groupwise start dates, shifted:
# previous watch date + previous Days_not_watch within each user
grp = df.groupby('UserId')
starts = grp['Date_watched'].shift() + \
    pd.to_timedelta(grp['Days_not_watch'].shift(), unit='d')

# calculate timedelta gaps; each user's first row has no previous row, so fill with 0
df['Gap'] = (df['Date_watched'] - starts).fillna(pd.Timedelta(0))

# convert to days and then integers
df['Gap'] = (df['Gap'] / pd.Timedelta('1 day')).astype(int)

print(df)
UserId Date_watched Days_not_watch Gap
0 1 2010-09-11 5 0
1 1 2010-10-01 8 15
2 1 2010-10-28 1 19
3 2 2010-05-06 12 0
4 2 2010-05-18 5 0
5 3 2010-08-09 10 0
6 3 2010-09-25 5 37
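For a self-contained check, the sketch below rebuilds the sample data from the question and applies the same groupby + shift steps. The helper name add_gap is an assumption introduced here for illustration and is not part of the original answer. It also uses .dt.days instead of dividing by a one-day Timedelta, which should give the same result for whole-day gaps.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "UserId": [1, 1, 1, 2, 2, 3, 3],
    "Date_watched": ["2010-09-11", "2010-10-01", "2010-10-28",
                     "2010-05-06", "2010-05-18",
                     "2010-08-09", "2010-09-25"],
    "Days_not_watch": [5, 8, 1, 12, 5, 10, 5],
})


def add_gap(frame):
    """Hypothetical helper: return a sorted copy with an integer Gap column."""
    out = frame.copy()
    out["Date_watched"] = pd.to_datetime(out["Date_watched"])
    out = out.sort_values(["UserId", "Date_watched"])
    grp = out.groupby("UserId")
    # Previous watch date plus the previous row's Days_not_watch, per user.
    starts = grp["Date_watched"].shift() + pd.to_timedelta(
        grp["Days_not_watch"].shift(), unit="D"
    )
    # The first row of each user has no previous row (NaT), so its gap is 0.
    gap = (out["Date_watched"] - starts).fillna(pd.Timedelta(0))
    out["Gap"] = gap.dt.days.astype(int)
    return out


result = add_gap(df)
print(result["Gap"].tolist())  # [0, 15, 19, 0, 0, 0, 37]
```

Because the helper works on a copy, the original df is left unchanged, which makes it easier to compare the input with the result.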
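There is one problem: my dates are not sorted. I posted them here in ascending order, but in reality they are not sorted, and when I sort them the index of the final DataFrame does not match. How can I fix this?
@DebadriDutta, then sort by user and date first; see the update. My solution does not use the DataFrame index anywhere.
I have sorted it. Thanks for your answer, it works very well :)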
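On the sorting question raised in the comments: sort_values keeps each row's original index label, so a Gap column computed on a sorted copy can be assigned back to the unsorted frame, and pandas aligns it by label. A minimal sketch of that, assuming a small made-up frame whose rows are deliberately out of date order (the data and variable names here are for illustration only):

```python
import pandas as pd

# Made-up rows, deliberately not in date order within each user.
df = pd.DataFrame({
    "UserId": [1, 2, 1, 2, 1],
    "Date_watched": pd.to_datetime(
        ["2010-10-28", "2010-05-18", "2010-09-11", "2010-05-06", "2010-10-01"]
    ),
    "Days_not_watch": [1, 5, 5, 12, 8],
})

# sort_values keeps the original index labels on the sorted copy.
ordered = df.sort_values(["UserId", "Date_watched"])
grp = ordered.groupby("UserId")
starts = grp["Date_watched"].shift() + pd.to_timedelta(
    grp["Days_not_watch"].shift(), unit="D"
)
gap = (ordered["Date_watched"] - starts).fillna(pd.Timedelta(0)).dt.days

# Assignment aligns on index labels, so df keeps its original row order.
df["Gap"] = gap
print(df)

# Alternatively, work on the sorted frame and renumber its rows from 0.
ordered = ordered.assign(Gap=gap).reset_index(drop=True)
```

If the sorted order is the one you want to keep, the reset_index(drop=True) variant avoids carrying the shuffled index labels around.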