Python: How do I perform operations on subgroups of a DataFrame in pandas?
I am trying to calculate the percent change for specific subsets of a DataFrame based on the week number. The DataFrame looks like this:
ref_dt week_name county_name state_name county_fips_code cmi
0 2020-01-01 2020-W01 Broward Florida 12011 3.651278
1 2020-01-02 2020-W01. Broward Florida 12011 3.851842
2 2020-01-03 2020-W01. Broward Florida 12011 3.868523
3 2020-01-04 2020-W01. Broward Florida 12011 3.748446
4 2020-01-05 2020-W01. Broward Florida 12011 3.650769
5 2020-01-06 2020-W02. Broward Florida 12011 3.878860
6 2020-01-07 2020-W02. Broward Florida 12011 3.899171
7 2020-01-08 2020-W02. Broward Florida 12011 3.907816
8 2020-01-09 2020-W02. Broward Florida 12011 3.913623
9 2020-01-10 2020-W02. Broward Florida 12011 3.919010
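It contains information for every county in Florida (only a subset for Broward is shown here as an example), along with a mobility index computed in the cmi column. The percent change is calculated by comparing the mobility on a given day (ref_dt) with the mean cmi of the week that day belongs to. Here is an example for week 1 and Broward, after a sub-selection I did in pandas: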
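# Sub-select one county, then one week within it
df = counties[counties['county_name'] == 'Broward']
# .copy() so the new column is added to an independent frame, not a slice of df
week1 = df[df['week_name'] == '2020-W01'].copy()
# Mean cmi over the selected week
cmi_mean = week1['cmi'].mean()
# Each day's cmi as a percentage of that weekly mean
week1['percent_change'] = week1['cmi'] / cmi_mean * 100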
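The final output in the csv looks like this (I removed state_name and county_fips_code):
I would like to apply the same logic to every county for every week (1 to 14). What is the best way to do that? Do I need to reshape the DataFrame with pivot or stack and generate a column for each week based on its week_name, or can I calculate the percent change with the DataFrame's current structure?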
Note: each mean has to be calculated per week.

Use df.groupby with transform, and let pandas handle aligning the calculation through the index:
df['percent_change'] = df['cmi'] / df.groupby(['county_name', 'week_name'])['cmi'].transform('mean') * 100
Output:
ref_dt week_name county_name state_name county_fips_code cmi percent_change
0 2020-01-01 2020-W01. Broward Florida 12011 3.651278 97.259220
1 2020-01-02 2020-W01. Broward Florida 12011 3.851842 102.601650
2 2020-01-03 2020-W01. Broward Florida 12011 3.868523 103.045982
3 2020-01-04 2020-W01. Broward Florida 12011 3.748446 99.847487
4 2020-01-05 2020-W01. Broward Florida 12011 3.650769 97.245661
5 2020-01-06 2020-W02. Broward Florida 12011 3.878860 99.363782
6 2020-01-07 2020-W02. Broward Florida 12011 3.899171 99.884084
7 2020-01-08 2020-W02. Broward Florida 12011 3.907816 100.105541
8 2020-01-09 2020-W02. Broward Florida 12011 3.913623 100.254297
9 2020-01-10 2020-W02. Broward Florida 12011 3.919010 100.392295
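For anyone who wants to try this without the full dataset, here is a minimal sketch built from the ten rows shown in the question. It assumes the week labels are clean (the trailing dots are dropped) and only keeps the columns the calculation needs; group_mean and manual are names introduced here for illustration. It also cross-checks the result against the manual sub-selection from the question.

```python
import pandas as pd

# Sample rows copied from the question. Assumption for this sketch:
# week labels are normalized (no trailing dots) and unused columns are omitted.
counties = pd.DataFrame({
    "ref_dt": pd.date_range("2020-01-01", periods=10, freq="D"),
    "week_name": ["2020-W01"] * 5 + ["2020-W02"] * 5,
    "county_name": ["Broward"] * 10,
    "cmi": [3.651278, 3.851842, 3.868523, 3.748446, 3.650769,
            3.878860, 3.899171, 3.907816, 3.913623, 3.919010],
})

# transform('mean') returns a Series with the same index as counties,
# where every row holds the mean of its own (county, week) group.
group_mean = counties.groupby(["county_name", "week_name"])["cmi"].transform("mean")

# Because the indexes match, the division lines up row by row.
counties["percent_change"] = counties["cmi"] / group_mean * 100

# Cross-check against the manual approach for Broward, week 1.
week1 = counties[(counties["county_name"] == "Broward")
                 & (counties["week_name"] == "2020-W01")]
manual = week1["cmi"] / week1["cmi"].mean() * 100

print(counties)
```

Since the group means come back already aligned to the original rows, no pivot or stack is needed; the frame keeps its long layout and simply gains one column.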
Comments:
Group by 'week_name' and 'county_name'.
Thank you, Scott! I spent the whole morning trying to crack this. I need to get better at chaining statements.
@Alonso: You're welcome. I have been practicing almost every day since 2016. :)