Python: generating averages over a dictionary of dataframes


python pandas dataframe


I have the following dataframes:

phreatic_level_l2n1_28w_df.head()
       Fecha    Hora    PORVL2N1  # the PORVLxNx column changes its name in each data frame
0   2012-01-12  01:37:47    0.65
1   2012-01-12  02:37:45    0.65
2   2012-01-12  03:37:50    0.64
3   2012-01-12  04:37:44    0.63
4   2012-01-12  05:37:45    0.61

phreatic_level_l2n2_28w_df.head()
       Fecha    Hora    PORVL2N2 # the PORVLxNx column changes its name in each data frame
0   2018-01-12  01:58:22    0.71
1   2018-01-12  02:58:22    0.71
2   2018-01-12  03:58:23    0.71
3   2018-01-12  04:58:23    0.71
4   2018-01-12  05:58:24    0.71

phreatic_level_l4n1_28w_df.head()
       Fecha    Hora    PORVL4N1 # the PORVLxNx column changes its name in each data frame
0   2018-01-12  01:28:49    0.96
1   2018-01-12  02:28:49    0.96
2   2018-01-12  03:28:50    0.96
3   2018-01-12  04:28:52    0.95
4   2018-01-12  05:28:48    0.94

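And so on, for a total of 25 dataframes named like phreatic_level_l24n2_28w_df.

Each dataframe holds readings in its PORVLxNx column from 2018-01-12 to 2018-08-03. Every day in that range appears in the Fecha column, and each day has many PORVLxNx values.

My goal is to take each dataframe and generate the average PORVLxNx per day, like this:

        Fecha  PORVL2N1
0  2018-01-12  0.519130
1  2018-01-13  0.138750
2  2018-01-14  0.175417
3  2018-01-15  0.111667
4  2018-01-16  0.291250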

My approach is the following:

I put the dataframes in a dict and refer to each one by a string key:

dfs = {
    'phreatic_level_l2n1_28w_df': phreatic_level_l2n1_28w_df,
    # FOR THE MOMENT I ONLY TEST with the first dataframe 

    # 'phreatic_level_l2n2_28w_df': phreatic_level_l2n2_28w_df,
    # 'phreatic_level_l4n1_28w_df': phreatic_level_l4n1_28w_df,
    # 'phreatic_level_l5n1_28w_df': phreatic_level_l5n1_28w_df,
    # 'phreatic_level_l6n1_28w_df': phreatic_level_l6n1_28w_df,
    # 'phreatic_level_l7n1_28w_df': phreatic_level_l7n1_28w_df,
    # 'phreatic_level_l8n1_28w_df': phreatic_level_l8n1_28w_df,
    # 'phreatic_level_l9n1_28w_df': phreatic_level_l9n1_28w_df,
    # 'phreatic_level_l10n1_28w_df': phreatic_level_l10n1_28w_df,
    # 'phreatic_level_l13n1_28w_df': phreatic_level_l13n1_28w_df,
    # 'phreatic_level_l14n1_28w_df': phreatic_level_l14n1_28w_df,
    # 'phreatic_level_l15n1_28w_df': phreatic_level_l15n1_28w_df,
    # 'phreatic_level_l16n1_28w_df': phreatic_level_l16n1_28w_df,
    # 'phreatic_level_l16n2_28w_df': phreatic_level_l16n2_28w_df,
    # 'phreatic_level_l18n1_28w_df': phreatic_level_l18n1_28w_df,
    # 'phreatic_level_l18n2_28w_df': phreatic_level_l18n2_28w_df,
    # 'phreatic_level_l18n3_28w_df': phreatic_level_l18n3_28w_df,
    # 'phreatic_level_l18n4_28w_df': phreatic_level_l18n4_28w_df,
    # 'phreatic_level_l21n1_28w_df': phreatic_level_l21n1_28w_df,
    # 'phreatic_level_l21n2_28w_df': phreatic_level_l21n2_28w_df,
    # 'phreatic_level_l21n3_28w_df': phreatic_level_l21n3_28w_df,
    # 'phreatic_level_l21n4_28w_df': phreatic_level_l21n4_28w_df,
    # 'phreatic_level_l21n5_28w_df': phreatic_level_l21n5_28w_df,
    # 'phreatic_level_l24n1_28w_df': phreatic_level_l24n1_28w_df,
    # 'phreatic_level_l24n2_28w_df': phreatic_level_l24n2_28w_df  

}

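For the moment I iterate over the dataframes with only phreatic_level_l2n1_28w_df in the dict.

This produces my phreatic_level_l2_n1_average_per_day output.

Up to here, my idea works.

It is probably not the most optimal solution, though, when I want to apply it to the other dataframes included in the dfs dictionary: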

dfs = {
        'phreatic_level_l2n1_28w_df': phreatic_level_l2n1_28w_df,
        'phreatic_level_l2n2_28w_df': phreatic_level_l2n2_28w_df,
        # I've added the L2N2  phreatic_level_l2n2_28w_df dataframe item       
    }

I run my iteration again.

In my output, PORVL2N2 is not found:

----------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-161-fbe6eaf8a824> in <module>()
     11             print(phreatic_level_l2_n1_average_per_day.tail())
     12             # To N2
---> 13             phreatic_level_l2_n2_average_per_day = (df.groupby(pd.Grouper(key='Fecha', freq='D'))['PORVL{}N{}'.format(i,i)].mean().reset_index())
     14             phreatic_level_l2_n2_average_per_day.to_csv('L{}N{}_average_per-day.csv'.format(i,i), sep=',', header=True, index=False)
     15 

~/anaconda3/envs/sioma/lib/python3.6/site-packages/pandas/core/base.py in __getitem__(self, key)
    265         else:
    266             if key not in self.obj:
--> 267                 raise KeyError("Column not found: {key}".format(key=key))
    268             return self._gotitem(key, ndim=1)
    269 

KeyError: 'Column not found: PORVL2N2'

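Is it possible that I am overwriting the dataframe in my iteration, or is something else happening?

Your dataframes seem to have a good, consistent structure, so you can get the name of the PORVLxNy column you want to average from df.columns by taking its last element, [-1]. Then, to save the result to a csv file with the right name, drop the PORV prefix from the column name with col[4:] (keeping only the last 4 characters, col[-4:], works for PORVL2N1 but would cut a name such as PORVL10N1 down to 10N1):

import pandas as pd

for name, df in dfs.items():
    df['Fecha'] = pd.to_datetime(df['Fecha'])
    col = df.columns[-1]  # here col = PORVLxNy, with the right x and y for this df
    # no need for an inner for loop anymore
    lx_ny_average_per_day = (df.groupby(pd.Grouper(key='Fecha', freq='D'))[col]
                               .mean().reset_index())
    # col[4:] drops the 'PORV' prefix, e.g. 'L2N1' or 'L10N1'
    lx_ny_average_per_day.to_csv('{}_average_per-day.csv'.format(col[4:]),
                                 sep=',', header=True, index=False)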
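To try the loop without the original files, here is a small self-contained sketch. The sample values, the helper name daily_means, and the choice to return the results in a dict instead of writing csv files are all assumptions made for illustration. They are not part of the original data or code.

```python
import pandas as pd


def daily_means(dfs):
    """Return {value column name: daily-mean DataFrame} for each input frame.

    Assumes every frame has a 'Fecha' date column and that its last
    column holds the PORVLxNy readings.
    """
    results = {}
    for name, df in dfs.items():
        df = df.copy()  # leave the caller's frame untouched
        df['Fecha'] = pd.to_datetime(df['Fecha'])
        col = df.columns[-1]  # last column = the PORVLxNy readings
        results[col] = (df.groupby(pd.Grouper(key='Fecha', freq='D'))[col]
                          .mean().reset_index())
    return results


# Made-up sample data: two frames covering two days each.
sample_dfs = {
    'phreatic_level_l2n1_28w_df': pd.DataFrame({
        'Fecha': ['2018-01-12', '2018-01-12', '2018-01-13'],
        'Hora': ['01:37:47', '02:37:45', '01:37:50'],
        'PORVL2N1': [0.65, 0.63, 0.61],
    }),
    'phreatic_level_l10n1_28w_df': pd.DataFrame({
        'Fecha': ['2018-01-12', '2018-01-13', '2018-01-13'],
        'Hora': ['01:58:22', '02:58:22', '03:58:23'],
        'PORVL10N1': [0.70, 0.80, 0.90],
    }),
}

averages = daily_means(sample_dfs)
for col, frame in averages.items():
    print(col)
    print(frame)
```

Each frame is processed with its own column name, so no frame is ever asked for a column that belongs to another one.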

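I agree with @Ben.T about indexing with just the last entry of the dataframe's columns, df.columns[-1], assuming the structure of your dataframes allows it. If it does not, another way is to index with the corresponding substring of the dict key: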

'PORV{}'.format(name.split('_')[2].upper())

or simply

'PORV' + name.split('_')[2].upper()
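As a quick check of the key-based lookup, the sketch below applies it to a few of the dict keys from the question. The helper name column_from_key is hypothetical, and the approach assumes every key follows the phreatic_level_lXnY_28w_df pattern.

```python
def column_from_key(name):
    """Build the value-column name from a key like 'phreatic_level_l2n1_28w_df'."""
    # Splitting on '_' gives ['phreatic', 'level', 'l2n1', '28w', 'df'];
    # the third piece identifies the level and sensor.
    return 'PORV' + name.split('_')[2].upper()


for key in ('phreatic_level_l2n1_28w_df',
            'phreatic_level_l10n1_28w_df',
            'phreatic_level_l24n2_28w_df'):
    print(key, '->', column_from_key(key))
```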

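However, in my opinion, if you extract the right column as a Series with Fecha (i.e. the date) as its index, you can also simplify the groupby part. That lets you use the resample function, which does exactly the time-based grouping you are after: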

sr = df.set_index('Fecha')['PORVL2N1']   # the same column-name lookup as above applies here
sr.index = pd.to_datetime(sr.index)
avg_per_day = sr.resample('D').mean()
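A minimal sketch on made-up readings, comparing resample with the groupby and pd.Grouper form used earlier. The sample values are assumptions for illustration only.

```python
import pandas as pd

# Made-up readings; the real frames hold many values per day.
df = pd.DataFrame({
    'Fecha': ['2018-01-12', '2018-01-12', '2018-01-13', '2018-01-13'],
    'Hora': ['01:37:47', '02:37:45', '01:37:50', '02:37:44'],
    'PORVL2N1': [0.65, 0.63, 0.60, 0.62],
})

# Series of readings indexed by date, then resampled to daily means.
sr = df.set_index('Fecha')['PORVL2N1']
sr.index = pd.to_datetime(sr.index)
avg_resample = sr.resample('D').mean()

# The same daily grouping expressed with groupby + Grouper on the column.
df_dates = df.assign(Fecha=pd.to_datetime(df['Fecha']))
avg_groupby = (df_dates.groupby(pd.Grouper(key='Fecha', freq='D'))['PORVL2N1']
                       .mean())

print(avg_resample)
print(avg_groupby)
```

Both forms produce one mean per calendar day. resample needs a datetime-like index, which is why the Series is indexed by Fecha first.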

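The problem is that in the first iteration of your loop for name, df in ..., when df is phreatic_level_l2n1_28w_df, you look up the column PORVL2N1, which works. But then you also look up PORVL2N2 in that same df, because nothing prevents the line phreatic_level_l2_n2_average_per_day = ... from running while df is still phreatic_level_l2n1_28w_df.

With col = df.columns[-1], am I walking through the columns of the df in the right direction? Is this similar to the slicing concept?

@bgarcial yes, it is the same idea, knowing that the - sign means you start counting from the end.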
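The failure can be reproduced on a tiny made-up frame. The sketch below, with sample values that are assumptions for illustration, asks a frame that only has PORVL2N1 for PORVL2N2 and then shows that df.columns[-1] picks the column that does exist.

```python
import pandas as pd

# Made-up frame shaped like phreatic_level_l2n1_28w_df.
df = pd.DataFrame({
    'Fecha': pd.to_datetime(['2018-01-12', '2018-01-12']),
    'Hora': ['01:37:47', '02:37:45'],
    'PORVL2N1': [0.65, 0.63],
})
grouped = df.groupby(pd.Grouper(key='Fecha', freq='D'))

# Asking this frame for the N2 column fails, as in the traceback.
try:
    grouped['PORVL2N2'].mean()
    lookup_failed = False
except KeyError as exc:
    lookup_failed = True
    print('KeyError:', exc)

# Negative indices count from the end, so [-1] is the last column.
last_col = df.columns[-1]
print(last_col)
print(grouped[last_col].mean())
```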
