Pandas: find the sum of counts from the latest snapshot


I have the df below and want to compute a sum per group, using only the last snapshot for each id:

product  desc   id month_year count

car      ford   1 2019-01     20
car      ford   1 2019-02     20
car      ford   1 2019-04     40
car      ford   2 2019-04     30
car      ford   2 2019-04     30
car      ford   2 2019-04     60
and I am looking for this output:

df.groupby(["product", "desc"]. ?

product  desc  count_overall
car      ford  100
For id 1, take the most recent count when ordered by month_year, which is 40; likewise, for id 2 it is 60, making the total 100.

IIUC

We can use groupby, agg, and sort_values to get the last occurrence of count.

First, we convert your dates to proper datetimes:

df['month_year'] = pd.to_datetime(df['month_year'],format='%Y-%m')

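# Sort by date (stable sort, so rows sharing a month keep their original order),
# then take the last count per (product, desc, id).
new_df = df.sort_values("month_year", kind="mergesort").groupby(["product", "desc", "id"]).agg(
    date_max=("month_year", "max"), count=("count", "last")
)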

From here you can do a simple sum:

# Select only "count": the datetime column date_max cannot be summed.
print(new_df.groupby(level=[0, 1])[["count"]].sum())


              count
product desc       
car     ford    100
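
For reference, the steps above can be put together as a self-contained sketch on the sample data from the question. The stable sort (kind="mergesort") is an assumption on my part, so that rows sharing the same month_year keep their original order; the names latest and count_overall are only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "product": ["car"] * 6,
    "desc": ["ford"] * 6,
    "id": [1, 1, 1, 2, 2, 2],
    "month_year": ["2019-01", "2019-02", "2019-04",
                   "2019-04", "2019-04", "2019-04"],
    "count": [20, 20, 40, 30, 30, 60],
})
df["month_year"] = pd.to_datetime(df["month_year"], format="%Y-%m")

# Last count per (product, desc, id) after a stable sort by date.
latest = (
    df.sort_values("month_year", kind="mergesort")
      .groupby(["product", "desc", "id"])["count"]
      .last()
)

# Sum the per-id snapshots within each (product, desc) group.
count_overall = (
    latest.groupby(level=["product", "desc"]).sum()
          .rename("count_overall")
          .reset_index()
)
print(count_overall)
```

Note that when several rows share the latest month for an id, as with id 2 here, "last" means the last of those rows in the original order.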

IIUC, you also need id in the grouping to get the last count:

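# sum(level=...) was removed in pandas 2.0; groupby(level=...).sum() is the equivalent.
s = df.groupby(["product", "desc", "id"])["count"].last().groupby(level=[0, 1]).sum().to_frame("count_overall").reset_index()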
Out[171]: 
  product  desc  count_overall
0     car  ford            100

If the data is already sorted by date, you can also use drop_duplicates:

(df.drop_duplicates(['product','desc','id'], keep='last')
   .groupby(['product','desc'])['count'].sum()
)
Output:

product  desc
car      ford    100
Name: count, dtype: int64
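
If the rows are not guaranteed to be in date order, one option is to sort before dropping duplicates. Below is a minimal sketch, assuming month_year is either a datetime or a zero-padded "YYYY-MM" string (which sorts chronologically as text). The shuffled rows are made up for illustration, reusing values from the question.

```python
import pandas as pd

# Illustrative rows, deliberately out of date order.
df = pd.DataFrame({
    "product": ["car", "car", "car", "car"],
    "desc": ["ford", "ford", "ford", "ford"],
    "id": [1, 2, 1, 2],
    "month_year": ["2019-04", "2019-04", "2019-01", "2019-02"],
    "count": [40, 60, 20, 30],
})

total = (
    # Stable sort so ties on month_year keep their original order.
    df.sort_values("month_year", kind="mergesort")
      # Keep only the latest row per (product, desc, id).
      .drop_duplicates(["product", "desc", "id"], keep="last")
      .groupby(["product", "desc"])["count"]
      .sum()
)
print(total)
```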