Python Pandas DataFrame: how to get a rolling sum grouped by value?

Using some COVID-19 data, how should I calculate a 14-day rolling sum of case counts?

Here is my existing code:

import pandas as pd
import matplotlib.pyplot as plt

url = 'https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv'
all_counties = pd.read_csv(url, dtype={"fips": str})
all_counties.date = pd.to_datetime(all_counties.date)
oregon = all_counties.loc[all_counties['state'] == 'Oregon']

oregon.set_index('date', inplace=True)
oregon['delta']=oregon.groupby(['state','county'])['cases'].diff().fillna(0)
oregon.head()
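This code computes the daily change in case counts (thanks to the answer to an earlier question).

The next step is to compute the rolling 14-day sum, for which I tried the following: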

oregon['rolling_14']=oregon.groupby(['state','county'])['delta'].rolling(min_periods=1, window=14).sum()
Unfortunately, it fails. If I have data for only one county, this works:

county['rolling_14']=county.rolling(min_periods=1, window=14).sum()
But unfortunately, this does not work when the DataFrame contains data for multiple counties.

The result of groupby().rolling() has two extra index levels, namely state and county. Remove them and the assignment will work:

oregon['rolling_14'] = (oregon.groupby(['state','county'])['delta']
                            .rolling(min_periods=1, window=14).sum()
                            # drop=True discards the group levels, so a Series (not a DataFrame) comes back
                            .reset_index(level=['state','county'], drop=True)
                       )
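To see the extra index levels for yourself, here is a minimal sketch on made-up data (the county names, case numbers, and the 2-row window are assumptions chosen to keep the output small, not values from the NYT file). It only inspects the index before and after reset_index; it does not assign anything back.

```python
import pandas as pd

# Toy stand-in for the Oregon frame: two hypothetical counties, three days each.
toy = pd.DataFrame({
    "date": pd.to_datetime(["2020-03-01", "2020-03-01", "2020-03-02",
                            "2020-03-02", "2020-03-03", "2020-03-03"]),
    "state": ["Oregon"] * 6,
    "county": ["Benton", "Lane", "Benton", "Lane", "Benton", "Lane"],
    "cases": [1, 10, 3, 10, 6, 15],
}).set_index("date")

# Daily change per county, as in the question.
toy["delta"] = toy.groupby(["state", "county"])["cases"].diff().fillna(0)

# Rolling sum per county; the group keys are prepended to the index.
rolled = (toy.groupby(["state", "county"])["delta"]
             .rolling(window=2, min_periods=1).sum())
print(rolled.index.names)

# Dropping the two group levels leaves only the original 'date' level.
flat = rolled.reset_index(level=["state", "county"], drop=True)
print(flat.index.names)
```

The first print should list state, county, and date; the second only date.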
Also, since you call groupby more than once, reusing a single lazy groupby object helps the runtime and the codebase a little:

groups = oregon.groupby(['state','county'])
oregon['delta'] = groups['cases'].diff().fillna(0)

oregon['rolling_14'] = (groups['delta']
                            .rolling(min_periods=1, window=14).sum()
                            # drop=True discards the group levels, so a Series (not a DataFrame) comes back
                            .reset_index(level=['state','county'], drop=True)
                       )
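One caveat: after set_index('date') every date appears once per county, so the frame's index has repeated labels, and assigning a Series back by index is then likely to raise a duplicate-label reindexing error. A possible alternative, sketched here on made-up data, is transform, which returns values in the original row order and so needs no index alignment. The county names, case numbers, and 2-row window are illustrative assumptions; for the real data the window would be 14.

```python
import pandas as pd

# Toy stand-in for the Oregon frame: two hypothetical counties, three days each.
df = pd.DataFrame({
    "date": pd.to_datetime(["2020-03-01", "2020-03-01", "2020-03-02",
                            "2020-03-02", "2020-03-03", "2020-03-03"]),
    "state": ["Oregon"] * 6,
    "county": ["Benton", "Lane", "Benton", "Lane", "Benton", "Lane"],
    "cases": [1, 10, 3, 10, 6, 15],
}).set_index("date")

# Daily change per county; the first day of each county has no predecessor, so fill with 0.
df["delta"] = df.groupby(["state", "county"])["cases"].diff().fillna(0)

# transform applies the rolling sum within each county and hands the
# values back in the frame's original row order.
df["rolling_2"] = (df.groupby(["state", "county"])["delta"]
                     .transform(lambda s: s.rolling(window=2, min_periods=1).sum()))

print(df)
```

The lambda runs once per group, so on very large frames this may be slower than groupby().rolling(); it trades some speed for not depending on how the index lines up.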