Python: generating averages over a dictionary of dataframes
I have the following dataframes:
phreatic_level_l2n1_28w_df.head()
Fecha Hora PORVL2N1 # the PORVLxNx column has a different name in each dataframe
0 2012-01-12 01:37:47 0.65
1 2012-01-12 02:37:45 0.65
2 2012-01-12 03:37:50 0.64
3 2012-01-12 04:37:44 0.63
4 2012-01-12 05:37:45 0.61
phreatic_level_l2n2_28w_df.head()
Fecha Hora PORVL2N2 # the PORVLxNx column has a different name in each dataframe
0 2018-01-12 01:58:22 0.71
1 2018-01-12 02:58:22 0.71
2 2018-01-12 03:58:23 0.71
3 2018-01-12 04:58:23 0.71
4 2018-01-12 05:58:24 0.71
phreatic_level_l4n1_28w_df.head()
Fecha Hora PORVL4N1 # the PORVLxNx column has a different name in each dataframe
0 2018-01-12 01:28:49 0.96
1 2018-01-12 02:28:49 0.96
2 2018-01-12 03:28:50 0.96
3 2018-01-12 04:28:52 0.95
4 2018-01-12 05:28:48 0.94
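And so on, up to 25 dataframes of the form phreatic_level_l24n2_28w_df.
Each dataframe holds data in its PORVLxNx column from 2018-01-12 to 2018-08-03. Every day in the date range of the Fecha column has values, and each day has many PORVLxNx values.
My goal is to take each dataframe and generate the average PORVLxNx per day, like this: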
Fecha PORVL2N1
0 2018-01-12 0.519130
1 2018-01-13 0.138750
2 2018-01-14 0.175417
3 2018-01-15 0.111667
4 2018-01-16 0.291250
I have the following approach:
I put the dataframes in a dict and refer to each one by a string key:
dfs = {
'phreatic_level_l2n1_28w_df': phreatic_level_l2n1_28w_df,
# FOR THE MOMENT I ONLY TEST with the first dataframe
# 'phreatic_level_l2n2_28w_df': phreatic_level_l2n2_28w_df,
# 'phreatic_level_l4n1_28w_df': phreatic_level_l4n1_28w_df,
# 'phreatic_level_l5n1_28w_df': phreatic_level_l5n1_28w_df,
# 'phreatic_level_l6n1_28w_df': phreatic_level_l6n1_28w_df,
# 'phreatic_level_l7n1_28w_df': phreatic_level_l7n1_28w_df,
# 'phreatic_level_l8n1_28w_df': phreatic_level_l8n1_28w_df,
# 'phreatic_level_l9n1_28w_df': phreatic_level_l9n1_28w_df,
# 'phreatic_level_l10n1_28w_df': phreatic_level_l10n1_28w_df,
# 'phreatic_level_l13n1_28w_df': phreatic_level_l13n1_28w_df,
# 'phreatic_level_l14n1_28w_df': phreatic_level_l14n1_28w_df,
# 'phreatic_level_l15n1_28w_df': phreatic_level_l15n1_28w_df,
# 'phreatic_level_l16n1_28w_df': phreatic_level_l16n1_28w_df,
# 'phreatic_level_l16n2_28w_df': phreatic_level_l16n2_28w_df,
# 'phreatic_level_l18n1_28w_df': phreatic_level_l18n1_28w_df,
# 'phreatic_level_l18n2_28w_df': phreatic_level_l18n2_28w_df,
# 'phreatic_level_l18n3_28w_df': phreatic_level_l18n3_28w_df,
# 'phreatic_level_l18n4_28w_df': phreatic_level_l18n4_28w_df,
# 'phreatic_level_l21n1_28w_df': phreatic_level_l21n1_28w_df,
# 'phreatic_level_l21n2_28w_df': phreatic_level_l21n2_28w_df,
# 'phreatic_level_l21n3_28w_df': phreatic_level_l21n3_28w_df,
# 'phreatic_level_l21n4_28w_df': phreatic_level_l21n4_28w_df,
# 'phreatic_level_l21n5_28w_df': phreatic_level_l21n5_28w_df,
# 'phreatic_level_l24n1_28w_df': phreatic_level_l24n1_28w_df,
# 'phreatic_level_l24n2_28w_df': phreatic_level_l24n2_28w_df
}
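For the moment I iterate over the dataframes with only the first phreatic level dataframe in the dict.
My output for the l2_n1 average per day is:
Up to here, my idea works.
The problem appears when I want to apply this solution (very possibly not the most optimal one) to the other dataframes contained in the dfs dictionary: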
dfs = {
'phreatic_level_l2n1_28w_df': phreatic_level_l2n1_28w_df,
'phreatic_level_l2n2_28w_df': phreatic_level_l2n2_28w_df,
# I've added the L2N2 phreatic_level_l2n2_28w_df dataframe item
}
I iterate again,
and in my output the PORVL2N2 column is not found:
----------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-161-fbe6eaf8a824> in <module>()
11 print(phreatic_level_l2_n1_average_per_day.tail())
12 # To N2
---> 13 phreatic_level_l2_n2_average_per_day = (df.groupby(pd.Grouper(key='Fecha', freq='D'))['PORVL{}N{}'.format(i,i)].mean().reset_index())
14 phreatic_level_l2_n2_average_per_day.to_csv('L{}N{}_average_per-day.csv'.format(i,i), sep=',', header=True, index=False)
15
~/anaconda3/envs/sioma/lib/python3.6/site-packages/pandas/core/base.py in __getitem__(self, key)
265 else:
266 if key not in self.obj:
--> 267 raise KeyError("Column not found: {key}".format(key=key))
268 return self._gotitem(key, ndim=1)
269
KeyError: 'Column not found: PORVL2N2'
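Is it possible that I am overwriting the dataframes in my iteration, or is something else happening?

Your dataframes seem to have a good and consistent structure, so what you can do is get the name of the PORVLxNy column you want to average by taking the last element, [-1], of df.columns. Then, to save the result to a csv file with the right name, just keep the last 4 characters of the column name: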
import pandas as pd

for name, df in dfs.items():
    df['Fecha'] = pd.to_datetime(df['Fecha'])
    col = df.columns[-1]  # here col = PORVLxNy, with the right x and y for this df
    # no inner loop is needed anymore
    lx_ny_average_per_day = (df.groupby(pd.Grouper(key='Fecha', freq='D'))[col]
                             .mean().reset_index())
    lx_ny_average_per_day.to_csv('{}_average_per-day.csv'.format(col[-4:]),
                                 sep=',', header=True, index=False)
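To make the idea easier to try out, here is a minimal, self-contained sketch of the same loop. The sample values, the `daily_means` helper, and the `phreatic_level_l10n1_28w_df` entry are made up for illustration and are not the asker's real data. The sketch returns the results in a dict instead of writing csv files, and it assumes the value column is always the last one.

```python
import pandas as pd

def daily_means(dfs, date_col="Fecha"):
    """Return {value column name: DataFrame of per-day means} for each frame in dfs."""
    results = {}
    for name, df in dfs.items():
        df = df.copy()  # avoid mutating the caller's dataframes
        df[date_col] = pd.to_datetime(df[date_col])
        col = df.columns[-1]  # assumes the value column is the last one
        results[col] = (
            df.groupby(pd.Grouper(key=date_col, freq="D"))[col]
            .mean()
            .reset_index()
        )
    return results

# Made-up sample data, only to exercise the function
sample_dfs = {
    "phreatic_level_l2n1_28w_df": pd.DataFrame({
        "Fecha": ["2018-01-12", "2018-01-12", "2018-01-13"],
        "Hora": ["01:37:47", "02:37:45", "01:37:50"],
        "PORVL2N1": [0.60, 0.70, 0.50],
    }),
    "phreatic_level_l10n1_28w_df": pd.DataFrame({
        "Fecha": ["2018-01-12", "2018-01-13", "2018-01-13"],
        "Hora": ["01:58:22", "02:58:22", "03:58:23"],
        "PORVL10N1": [0.70, 0.80, 1.00],
    }),
}

averages = daily_means(sample_dfs)
for col, avg in averages.items():
    # col[len("PORV"):] keeps everything after the prefix, e.g. "L10N1",
    # whereas col[-4:] would give "10N1" for this column
    print(col[len("PORV"):], avg[col].round(2).tolist())
```

One design note: keeping only the last 4 characters works for names like PORVL2N1, but for two-digit levels such as PORVL10N1 it drops the leading "L", so stripping the "PORV" prefix may be a safer way to build the file name.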
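I agree with @Ben.T about simply indexing with the last entry of the dataframe columns, df.columns[-1], assuming the structure of your dataframes is suitable for that. If not, another approach would be to index with the corresponding substring of the dict key: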
'PORV{}'.format(name.split('_')[2].upper())
or simply
'PORV' + name.split('_')[2].upper()
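As a quick check of what that expression produces, here is a small sketch. The `column_from_key` helper name is hypothetical; the keys are taken from the dict in the question.

```python
def column_from_key(name, prefix="PORV"):
    """Build the value column name from a dict key like 'phreatic_level_l2n1_28w_df'."""
    # Splitting on '_' gives ['phreatic', 'level', 'l2n1', '28w', 'df'];
    # element 2 is the level/sensor code.
    return prefix + name.split("_")[2].upper()

print(column_from_key("phreatic_level_l2n1_28w_df"))
print(column_from_key("phreatic_level_l21n5_28w_df"))
```

This only works as long as every key follows the same phreatic_level_lXnY_28w_df naming pattern.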
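However, in my opinion you could also simplify the groupby part if you extract the right column as a Series with Fecha (i.e. the date) as its index. That lets you use the resample function, which does exactly the time-based grouping you want to achieve: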
sr = df.set_index('Fecha')['PORVL2N1']  # the column-name lookup shown above applies here as well
sr.index = pd.to_datetime(sr.index)
avg_per_day = sr.resample('D').mean()
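For what it's worth, here is a small self-contained sketch, on made-up values, that runs the resample version next to the groupby/Grouper version and checks that both give the same daily means. The sample dataframe is an assumption for illustration only.

```python
import pandas as pd

# Made-up sample data with two readings per day
df = pd.DataFrame({
    "Fecha": ["2018-01-12", "2018-01-12", "2018-01-13", "2018-01-13"],
    "Hora": ["01:37:47", "02:37:45", "01:37:50", "02:37:44"],
    "PORVL2N1": [0.60, 0.70, 0.40, 0.50],
})
col = df.columns[-1]  # assumes the value column is the last one

# resample: needs a Series (or DataFrame) indexed by datetimes
sr = df.set_index("Fecha")[col]
sr.index = pd.to_datetime(sr.index)
avg_resample = sr.resample("D").mean()

# groupby with pd.Grouper on a datetime column, for comparison
avg_groupby = (
    df.assign(Fecha=pd.to_datetime(df["Fecha"]))
    .groupby(pd.Grouper(key="Fecha", freq="D"))[col]
    .mean()
)

# True when both approaches agree on every day
same = bool(((avg_resample - avg_groupby).abs() < 1e-12).all())
print(same)
```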
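The problem is that in the first iteration of your loop for name, df in ..., when df is phreatic_level_l2n1_28w_df, you look up the column PORVL2N1 and that works. But right after that you also look up PORVL2N2 in that same df, because nothing prevents the line l2_n2_average_per_day = ... from running while df = phreatic_level_l2n1_28w_df.
With col = df.columns[-1], am I walking through the columns of the df in the right sense? Is this similar to the slicing concept?
@bgarcial yes, it is the same idea, knowing that the - sign means you count from the end.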
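To illustrate the negative-index point from the comments, here is a tiny sketch. The empty dataframe is made up; only its column names mirror the question.

```python
import pandas as pd

# Empty frame with the same column layout as in the question
df = pd.DataFrame(columns=["Fecha", "Hora", "PORVL2N2"])

last = df.columns[-1]         # counts from the end: the last column label
second_last = df.columns[-2]  # the one before it
first = df.columns[0]         # non-negative indexes count from the start

print(last, second_last, first)
```

Note that df.columns[-1] picks a single label and does not iterate over all the columns; the loop over dfs is what moves from one dataframe to the next.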