Python Pandas DataFrame: how to get a rolling sum grouped by value?
Using some COVID-19 data, how should I calculate a 14-day rolling sum of case counts? Here is my existing code:
import pandas as pd
import matplotlib.pyplot as plt
url = 'https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv'
all_counties = pd.read_csv(url, dtype={"fips": str})
all_counties.date = pd.to_datetime(all_counties.date)
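oregon = all_counties.loc[all_counties['state'] == 'Oregon'].copy()  # copy: avoid SettingWithCopyWarning on later edits
oregon.set_index('date', inplace=True)
# daily new cases: per-county difference of the cumulative 'cases' column
oregon['delta'] = oregon.groupby(['state','county'])['cases'].diff().fillna(0)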
oregon.head()
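To see what the grouped diff() does, here is a minimal sketch on a made-up frame (counties "A" and "B" with hypothetical case counts, not the NYT data) that has the same columns the code above uses. The point is that diff() runs within each (state, county) group, so one county's counts are never subtracted from another's.

```python
import pandas as pd

# Hypothetical stand-in for the NYT rows: cumulative case counts for two
# counties over three days, sorted by date like the original file.
toy = pd.DataFrame({
    "date": pd.to_datetime(["2020-04-01", "2020-04-01", "2020-04-02",
                            "2020-04-02", "2020-04-03", "2020-04-03"]),
    "county": ["A", "B", "A", "B", "A", "B"],
    "state": ["Oregon"] * 6,
    "cases": [10, 5, 12, 9, 15, 10],
}).set_index("date")

# diff() is computed separately inside each (state, county) group; the first
# row of each group has no predecessor, so it is NaN and fillna(0) makes it 0.
toy["delta"] = toy.groupby(["state", "county"])["cases"].diff().fillna(0)
print(toy)
```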
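This code computes the daily new-case counts (thanks to an answer to a previous question).
The next step is to compute a rolling 14-day sum. I tried the following: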
oregon['rolling_14']=oregon.groupby(['state','county'])['delta'].rolling(min_periods=1, window=14).sum()
Unfortunately, it fails. If I have the data for a single county, this works:
county['rolling_14'] = county['delta'].rolling(min_periods=1, window=14).sum()
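A minimal sketch of the single-county case, on a hypothetical series with one new case per day, shows what window=14 and min_periods=1 do. It also compares a 14-row window with the time-based "14D" window; the two differ only if a county is missing dates, which is an assumption to check in the real data rather than something the question states.

```python
import pandas as pd

# Hypothetical single-county data: one new case per day for 16 days.
dates = pd.date_range("2020-04-01", periods=16, freq="D")
delta = pd.Series(1.0, index=dates, name="delta")

# window=14 sums the last 14 rows; min_periods=1 lets the first 13 rows
# return a partial sum instead of NaN.
rolling_14 = delta.rolling(window=14, min_periods=1).sum()
print(rolling_14.tolist())  # 1.0, 2.0, ..., 14.0, then 14.0 twice

# With a missing day, a 14-row window spans more than 14 calendar days,
# while the time-based "14D" window (needs a sorted DatetimeIndex) does not.
gappy = delta.drop(pd.Timestamp("2020-04-05"))
by_rows = gappy.rolling(window=14, min_periods=1).sum()
by_days = gappy.rolling("14D").sum()  # min_periods defaults to 1 for time-based windows
print(by_rows.iloc[-1], by_days.iloc[-1])  # 14.0 13.0
```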
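But unfortunately, it does not work when the DataFrame contains data for several counties.

groupby().rolling() returns a Series with two extra index levels, state and county, in front of date, so it no longer lines up with the index of oregon and the assignment fails. Removing those two levels is not quite enough with this data: the date index that remains repeats once per county and is ordered by county rather than by date, so pandas still cannot align it to oregon by label. Running the rolling sum through transform avoids this, because transform returns one value per input row, in the original order and with oregon's own index:
oregon['rolling_14'] = (oregon.groupby(['state','county'])['delta']
                        .transform(lambda s: s.rolling(min_periods=1, window=14).sum())
                       )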
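The index problem can be reproduced on a tiny made-up frame (hypothetical counties "A" and "B", not the real data). The sketch shows the levels that groupby().rolling() prepends, why the date index left after dropping them is not unique, and a transform-based assignment that keeps the original row order.

```python
import pandas as pd

# Hypothetical two-county frame indexed by date, mirroring `oregon`
# (each date appears once per county).
df = pd.DataFrame({
    "date": pd.to_datetime(["2020-04-01", "2020-04-01", "2020-04-02",
                            "2020-04-02", "2020-04-03", "2020-04-03"]),
    "state": ["Oregon"] * 6,
    "county": ["A", "B", "A", "B", "A", "B"],
    "delta": [0.0, 0.0, 2.0, 4.0, 3.0, 1.0],
}).set_index("date")

grouped = df.groupby(["state", "county"])["delta"]

# groupby().rolling() prepends the group keys to the original index.
rolled = grouped.rolling(window=14, min_periods=1).sum()
print(list(rolled.index.names))

# Dropping the key levels leaves a date index that is regrouped by county
# and contains duplicates, so it cannot be aligned back to df by label.
dropped = rolled.reset_index(level=["state", "county"], drop=True)
print(dropped.index.is_unique)

# transform() returns one value per input row, in df's original order and
# with df's own index, so plain column assignment works.
df["rolling_14"] = grouped.transform(
    lambda s: s.rolling(window=14, min_periods=1).sum())
print(df)
```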
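Also, since you group by the same keys more than once, you can build the groupby object once and reuse it. A groupby object is lazy (creating it does no computation up front), so reusing it keeps the code shorter and may save a little runtime: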
groups = oregon.groupby(['state','county'])
oregon['delta'] = groups['cases'].diff().fillna(0)
# groups still refers to oregon, so the newly added 'delta' column can be selected from it
oregon['rolling_14'] = groups['delta'].transform(
    lambda s: s.rolling(min_periods=1, window=14).sum()
)
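The whole pipeline with one reused groupby object can be sketched on a hypothetical two-county frame (20 days of made-up cumulative counts, sorted by date like the NYT file). As a sanity check, a 14-row sum of daily differences should telescope to the cumulative count minus the count 14 rows earlier in the same county, which the last lines verify.

```python
import pandas as pd

# Hypothetical long-format frame shaped like `oregon`: 20 days x 2 counties,
# made-up cumulative `cases`, rows sorted by date first.
dates = pd.date_range("2020-04-01", periods=20, freq="D")
frames = []
for county, step in [("A", 1), ("B", 3)]:
    frames.append(pd.DataFrame({
        "date": dates,
        "state": "Oregon",
        "county": county,
        "cases": [step * i * i for i in range(20)],
    }))
df = (pd.concat(frames)
        .sort_values(["date", "county"])
        .set_index("date"))

# Build the groupby object once and reuse it for both new columns.
groups = df.groupby(["state", "county"])
df["delta"] = groups["cases"].diff().fillna(0)
df["rolling_14"] = groups["delta"].transform(
    lambda s: s.rolling(window=14, min_periods=1).sum())

# Telescoping check: for rows with at least 14 earlier rows in their county,
# the sum of the last 14 deltas equals cases minus cases 14 rows earlier.
expected = df["cases"] - groups["cases"].shift(14)
mismatch = (df["rolling_14"] - expected).dropna()
print(len(mismatch), (mismatch == 0).all())
```

The check only covers rows with 14 or more earlier rows in their group; before that, min_periods=1 makes rolling_14 the sum of all deltas seen so far.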