Pandas: sum of counts from the latest snapshot per group
I have the DataFrame below and want to compute the total per group, using only the last snapshot for each id:
product desc id month_year count
car ford 1 2019-01 20
car ford 1 2019-02 20
car ford 1 2019-04 40
car ford 2 2019-04 30
car ford 2 2019-04 30
car ford 2 2019-04 60
and I am looking for output like this:
df.groupby(["product", "desc"]. ?
product desc count_overall
car ford 100
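For id 1, take the last count when ordered by month_year, which is 40; likewise for id 2 it is 60, making the total 100.
IIUC, we can use groupby and agg together with sort_values to get the last occurrence of count.
First, convert your dates to proper datetimes: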
df['month_year'] = pd.to_datetime(df['month_year'],format='%Y-%m')
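# Sort by date, not by count; mergesort is stable, so rows sharing a month keep their original order
new_df = df.sort_values("month_year", kind="mergesort").groupby(["product", "desc", "id"]).agg(
    date_max=("month_year", "max"), count=("count", "last")
)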
From here you can do a simple sum over the count column:
print(new_df.groupby(level=[0, 1])[["count"]].sum())
count
product desc
car ford 100
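For reference, the approach in this answer can be sketched end to end as a self-contained script. This is only a sketch under assumptions: the sample rows are copied from the question, the names `latest` and `count_overall` are illustrative, and rows sharing the same month are assumed to already appear in snapshot order, which is why a stable sort is used.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "product": ["car"] * 6,
        "desc": ["ford"] * 6,
        "id": [1, 1, 1, 2, 2, 2],
        "month_year": ["2019-01", "2019-02", "2019-04",
                       "2019-04", "2019-04", "2019-04"],
        "count": [20, 20, 40, 30, 30, 60],
    }
)
df["month_year"] = pd.to_datetime(df["month_year"], format="%Y-%m")

# Stable sort by date so that rows with the same month keep their input order
# (assumption: within a month, later rows are later snapshots).
latest = (
    df.sort_values("month_year", kind="mergesort")
    .groupby(["product", "desc", "id"])["count"]
    .last()
)

# Sum the per-id latest counts within each (product, desc) group.
count_overall = (
    latest.groupby(level=["product", "desc"])
    .sum()
    .reset_index(name="count_overall")
)
print(count_overall)
```

With the question's data, `latest` holds 40 for id 1 and 60 for id 2, so the single group sums to 100.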
IIUC, you also need id in the groupby to get the last value of count:
# Series.sum(level=...) was removed in pandas 2.0; group on the index levels instead
s = df.groupby(["product", "desc", "id"])['count'].last().groupby(level=[0, 1]).sum().to_frame('count_overall').reset_index()
Out[171]:
product desc count_overall
0 car ford 100
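To illustrate why id has to be part of the grouping, here is a small comparison on the question's data. It is a sketch only: the names `without_id` and `with_id` are illustrative, and it relies on the rows already being in snapshot order, as they are in the question.

```python
import pandas as pd

# Sample data copied from the question (already in snapshot order).
df = pd.DataFrame(
    {
        "product": ["car"] * 6,
        "desc": ["ford"] * 6,
        "id": [1, 1, 1, 2, 2, 2],
        "month_year": ["2019-01", "2019-02", "2019-04",
                       "2019-04", "2019-04", "2019-04"],
        "count": [20, 20, 40, 30, 30, 60],
    }
)

# Without id: only the very last row of each (product, desc) group is kept.
without_id = df.groupby(["product", "desc"])["count"].last()

# With id: the last row of each id is kept, then summed per (product, desc).
with_id = (
    df.groupby(["product", "desc", "id"])["count"]
    .last()
    .groupby(level=["product", "desc"])
    .sum()
)

print(without_id.loc[("car", "ford")])  # 60
print(with_id.loc[("car", "ford")])     # 100
```

Grouping without id silently drops id 1's latest count of 40, which is why the total comes out as 60 instead of the expected 100.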
If the data is already sorted by date, you can also use drop_duplicates:
(df.drop_duplicates(['product','desc','id'], keep='last')
.groupby(['product','desc'])['count'].sum()
)
Output:
product desc
car ford 100
Name: count, dtype: int64
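If the rows are not already in date order, one option is to sort before dropping duplicates. The sketch below uses hypothetical, deliberately shuffled data (not the question's rows) in which every id has a distinct month per snapshot; the name `totals` is illustrative.

```python
import pandas as pd

# Hypothetical, deliberately unsorted data (not from the question).
df = pd.DataFrame(
    {
        "product": ["car", "car", "car", "car"],
        "desc": ["ford", "ford", "ford", "ford"],
        "id": [1, 2, 1, 2],
        "month_year": ["2019-04", "2019-03", "2019-01", "2019-02"],
        "count": [40, 60, 20, 30],
    }
)
df["month_year"] = pd.to_datetime(df["month_year"], format="%Y-%m")

totals = (
    df.sort_values("month_year", kind="mergesort")             # oldest first, stable
    .drop_duplicates(["product", "desc", "id"], keep="last")   # latest row per id
    .groupby(["product", "desc"])["count"]
    .sum()
)
print(totals)
```

Here the latest snapshots are 40 (id 1, 2019-04) and 60 (id 2, 2019-03), giving 100. Skipping the sort on this shuffled frame would keep the rows with counts 20 and 30 instead.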