Python: generating daily averages over a dictionary of DataFrames

I have the following DataFrames:

phreatic_level_l2n1_28w_df.head()
       Fecha    Hora    PORVL2N1  # the PORVLxNx column name changes in each DataFrame
0   2012-01-12  01:37:47    0.65
1   2012-01-12  02:37:45    0.65
2   2012-01-12  03:37:50    0.64
3   2012-01-12  04:37:44    0.63
4   2012-01-12  05:37:45    0.61

phreatic_level_l2n2_28w_df.head()
       Fecha    Hora    PORVL2N2 # the PORVLxNx column name changes in each DataFrame
0   2018-01-12  01:58:22    0.71
1   2018-01-12  02:58:22    0.71
2   2018-01-12  03:58:23    0.71
3   2018-01-12  04:58:23    0.71
4   2018-01-12  05:58:24    0.71

phreatic_level_l4n1_28w_df.head()
       Fecha    Hora    PORVL4N1 # the PORVLxNx column name changes in each DataFrame
0   2018-01-12  01:28:49    0.96
1   2018-01-12  02:28:49    0.96
2   2018-01-12  03:28:50    0.96
3   2018-01-12  04:28:52    0.95
4   2018-01-12  05:28:48    0.94
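And so on, up to 25 DataFrames, the last one being phreatic_level_l24n2_28w_df.

Each DataFrame holds readings in its PORVLxNx column from 2018-01-12 to 2018-08-03. The Fecha column has values for every day in that range, and each day has many PORVLxNx readings.

My goal is to take each DataFrame and compute the daily mean of PORVLxNx, like this:

        Fecha  PORVL2N1
0  2018-01-12  0.519130
1  2018-01-13  0.138750
2  2018-01-14  0.175417
3  2018-01-15  0.111667
4  2018-01-16  0.291250

My approach is the following.

I put the DataFrames in a dict and refer to each one by a string key: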

dfs = {
    'phreatic_level_l2n1_28w_df': phreatic_level_l2n1_28w_df,
    # FOR THE MOMENT I ONLY TEST with the first dataframe 

    # 'phreatic_level_l2n2_28w_df': phreatic_level_l2n2_28w_df,
    # 'phreatic_level_l4n1_28w_df': phreatic_level_l4n1_28w_df,
    # 'phreatic_level_l5n1_28w_df': phreatic_level_l5n1_28w_df,
    # 'phreatic_level_l6n1_28w_df': phreatic_level_l6n1_28w_df,
    # 'phreatic_level_l7n1_28w_df': phreatic_level_l7n1_28w_df,
    # 'phreatic_level_l8n1_28w_df': phreatic_level_l8n1_28w_df,
    # 'phreatic_level_l9n1_28w_df': phreatic_level_l9n1_28w_df,
    # 'phreatic_level_l10n1_28w_df': phreatic_level_l10n1_28w_df,
    # 'phreatic_level_l13n1_28w_df': phreatic_level_l13n1_28w_df,
    # 'phreatic_level_l14n1_28w_df': phreatic_level_l14n1_28w_df,
    # 'phreatic_level_l15n1_28w_df': phreatic_level_l15n1_28w_df,
    # 'phreatic_level_l16n1_28w_df': phreatic_level_l16n1_28w_df,
    # 'phreatic_level_l16n2_28w_df': phreatic_level_l16n2_28w_df,
    # 'phreatic_level_l18n1_28w_df': phreatic_level_l18n1_28w_df,
    # 'phreatic_level_l18n2_28w_df': phreatic_level_l18n2_28w_df,
    # 'phreatic_level_l18n3_28w_df': phreatic_level_l18n3_28w_df,
    # 'phreatic_level_l18n4_28w_df': phreatic_level_l18n4_28w_df,
    # 'phreatic_level_l21n1_28w_df': phreatic_level_l21n1_28w_df,
    # 'phreatic_level_l21n2_28w_df': phreatic_level_l21n2_28w_df,
    # 'phreatic_level_l21n3_28w_df': phreatic_level_l21n3_28w_df,
    # 'phreatic_level_l21n4_28w_df': phreatic_level_l21n4_28w_df,
    # 'phreatic_level_l21n5_28w_df': phreatic_level_l21n5_28w_df,
    # 'phreatic_level_l24n1_28w_df': phreatic_level_l24n1_28w_df,
    # 'phreatic_level_l24n2_28w_df': phreatic_level_l24n2_28w_df  

}
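At this point I iterate over the dictionary, for now only over the L2N1 phreatic-level DataFrame.

This gives me the daily L2N1 averages (phreatic_level_l2_n1_average_per_day).

Up to here, my approach works.

The trouble starts when I apply this solution, which is most likely not the best one, to the other DataFrames in the dfs dictionary: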

dfs = {
        'phreatic_level_l2n1_28w_df': phreatic_level_l2n1_28w_df,
        'phreatic_level_l2n2_28w_df': phreatic_level_l2n2_28w_df,
        # I've added the L2N2  phreatic_level_l2n2_28w_df dataframe item       
    }
I run the iteration again.

This time the column PORVL2N2 cannot be found:

----------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-161-fbe6eaf8a824> in <module>()
     11             print(phreatic_level_l2_n1_average_per_day.tail())
     12             # To N2
---> 13             phreatic_level_l2_n2_average_per_day = (df.groupby(pd.Grouper(key='Fecha', freq='D'))['PORVL{}N{}'.format(i,i)].mean().reset_index())
     14             phreatic_level_l2_n2_average_per_day.to_csv('L{}N{}_average_per-day.csv'.format(i,i), sep=',', header=True, index=False)
     15 

~/anaconda3/envs/sioma/lib/python3.6/site-packages/pandas/core/base.py in __getitem__(self, key)
    265         else:
    266             if key not in self.obj:
--> 267                 raise KeyError("Column not found: {key}".format(key=key))
    268             return self._gotitem(key, ndim=1)
    269 

KeyError: 'Column not found: PORVL2N2'

Is it possible that I am overwriting the DataFrame during my iteration, or is something else going on?

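Your DataFrames seem to have a clean, consistent structure, so you can get the name of the column you want to average (PORVLxNy) from df.columns by taking its last element, [-1]. Then, to save the result to a CSV file with the right name, keep the part of the column name that follows the PORV prefix (col[4:]). Keeping only the last four characters would truncate longer names such as PORVL10N1 to 10N1:

import pandas as pd

for name, df in dfs.items():
    df['Fecha'] = pd.to_datetime(df['Fecha'])
    col = df.columns[-1]  # col is PORVLxNy, with the x and y that belong to this df
    # the inner for loop over i is no longer needed
    lx_ny_average_per_day = (df.groupby(pd.Grouper(key='Fecha', freq='D'))[col]
                               .mean().reset_index())
    # col[4:] drops the 'PORV' prefix, e.g. 'L2N1' or 'L10N1'
    lx_ny_average_per_day.to_csv('{}_average_per-day.csv'.format(col[4:]),
                                 sep=',', header=True, index=False)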
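
As a rough illustration of the loop above, here is a minimal sketch on made-up data. The two small DataFrames and the daily_means dict are assumptions for the example only (they are not the asker's data), and the results are collected in memory instead of being written to CSV.

```python
import pandas as pd

# Hypothetical sample data standing in for two of the real DataFrames.
phreatic_level_l2n1_28w_df = pd.DataFrame({
    'Fecha': ['2018-01-12', '2018-01-12', '2018-01-13'],
    'Hora': ['01:37:47', '02:37:45', '01:37:50'],
    'PORVL2N1': [0.65, 0.63, 0.61],
})
phreatic_level_l10n1_28w_df = pd.DataFrame({
    'Fecha': ['2018-01-12', '2018-01-13', '2018-01-13'],
    'Hora': ['01:28:49', '01:28:50', '02:28:52'],
    'PORVL10N1': [0.70, 0.80, 0.90],
})

dfs = {
    'phreatic_level_l2n1_28w_df': phreatic_level_l2n1_28w_df,
    'phreatic_level_l10n1_28w_df': phreatic_level_l10n1_28w_df,
}

daily_means = {}  # hypothetical container, keyed by 'LxNy'
for name, df in dfs.items():
    # Work on a copy with a real datetime column so the inputs stay untouched.
    df = df.assign(Fecha=pd.to_datetime(df['Fecha']))
    col = df.columns[-1]  # the measurement column is the last one
    daily = (df.groupby(pd.Grouper(key='Fecha', freq='D'))[col]
               .mean().reset_index())
    daily_means[col[4:]] = daily  # drop the 'PORV' prefix

for label, daily in daily_means.items():
    print(label)
    print(daily)
```

Each DataFrame is only ever asked for its own column, so the lookup that raised the KeyError in the question never happens.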

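I agree with @Ben.T about simply indexing with the last entry of the DataFrame's columns, df.columns[-1], assuming your DataFrames are structured in a way that allows it. If they are not, another option is to build the column name from the matching substring of the dict key:

'PORV{}'.format(name.split('_')[2].upper())
or simply

'PORV' + name.split('_')[2].upper()
However, in my view, if you extract the right column as a Series with Fecha (the date) as its index, you can also simplify the groupby part. That lets you use the resample function, which does exactly the time-based grouping you are after: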
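
To make the key-based lookup concrete, the following sketch applies that expression to a few of the dict keys from the question. The helper name column_from_key is hypothetical and only wraps the expression above; it assumes every key follows the phreatic_level_lXnY_28w_df pattern.

```python
def column_from_key(name):
    """Build the PORVLxNy column name from a key such as 'phreatic_level_l2n1_28w_df'."""
    # Splitting on '_' gives ['phreatic', 'level', 'l2n1', '28w', 'df'];
    # element 2 is the 'lXnY' part of the key.
    return 'PORV' + name.split('_')[2].upper()


keys = [
    'phreatic_level_l2n1_28w_df',
    'phreatic_level_l2n2_28w_df',
    'phreatic_level_l24n2_28w_df',
]
columns = [column_from_key(key) for key in keys]
print(columns)
```

This approach depends on the key naming staying consistent, whereas df.columns[-1] depends on the column order staying consistent.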

sr = df.set_index('Fecha')['PORVL2N1']   # pick the column name as described above
sr.index = pd.to_datetime(sr.index)
avg_per_day = sr.resample('D').mean()
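
The sketch below, on made-up readings, computes the daily mean both ways so the two can be compared: with resample on a Series indexed by Fecha, and with groupby plus pd.Grouper as in the other answer. The sample values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical readings: two per day over two consecutive days.
df = pd.DataFrame({
    'Fecha': ['2018-01-12', '2018-01-12', '2018-01-13', '2018-01-13'],
    'Hora': ['01:37:47', '02:37:45', '01:37:50', '02:37:44'],
    'PORVL2N1': [0.65, 0.63, 0.61, 0.59],
})
df['Fecha'] = pd.to_datetime(df['Fecha'])

# Variant 1: Series indexed by date, then resample into daily bins.
via_resample = df.set_index('Fecha')['PORVL2N1'].resample('D').mean()

# Variant 2: groupby with a daily Grouper on the Fecha column.
via_groupby = df.groupby(pd.Grouper(key='Fecha', freq='D'))['PORVL2N1'].mean()

print(via_resample)
print(via_groupby)
```

Both variants bin the rows by calendar day. With freq='D', a day inside the range that has no readings shows up with a NaN mean instead of being dropped.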

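The problem is that in the first iteration of the loop for name, df in ..., when df is phreatic_level_l2n1_28w_df, you look up the column PORVL2N1, which works, but then you also look up PORVL2N2 in that same df, because nothing prevents the line phreatic_level_l2_n2_average_per_day = ... from running while df is phreatic_level_l2n1_28w_df.

With col = df.columns[-1], am I walking through the columns of df in the right sense? Is this similar to the slicing concept?

@bgarcial Yes, it is the same idea: the minus sign means you count from the end.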
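
A small sketch of both points from the comments, using a made-up two-row DataFrame: negative indexing on df.columns, and the KeyError that appears when a grouped DataFrame is asked for a column it does not have. The variable names last_col and lookup_failed are assumptions for the example.

```python
import pandas as pd

# Hypothetical stand-in for phreatic_level_l2n1_28w_df.
df = pd.DataFrame({
    'Fecha': pd.to_datetime(['2018-01-12', '2018-01-12']),
    'Hora': ['01:37:47', '02:37:45'],
    'PORVL2N1': [0.65, 0.63],
})

# Negative indices count from the end, as with list indexing and slicing.
last_col = df.columns[-1]
print(last_col)

grouped = df.groupby(pd.Grouper(key='Fecha', freq='D'))
try:
    # This DataFrame only has PORVL2N1, so asking for PORVL2N2 fails.
    grouped['PORVL2N2']
    lookup_failed = False
except KeyError as exc:
    lookup_failed = True
    print('KeyError:', exc)
```

Nothing is overwritten in the original loop; the error comes only from requesting the N2 column on the N1 DataFrame.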