Python: How do I resample a DataFrame with mixed types?


I use the following Python code to generate a DataFrame df3 with mixed types (floats and strings):

df1 = pd.DataFrame(np.random.randn(dates.shape[0],2),index=dates,columns=list('AB'))
df1['C'] = 'A'
df1['D'] = 'Pickles'
df2 = pd.DataFrame(np.random.randn(dates.shape[0], 2),index=dates,columns=list('AB'))
df2['C'] = 'B'
df2['D'] = 'Ham'
df3 = pd.concat([df1, df2], axis=0)
When I resample df3 to a higher frequency, the frame is not resampled to the higher rate. Instead, how is ignored and I only get missing values:

df4 = df3.groupby(['C']).resample('M',  how={'A': 'mean', 'B': 'mean',  'D': 'ffill'})
df4.head()
Result:

                      B          A        D
C                                          
A 2014-03-31 -0.4640906 -0.2435414  Pickles
  2014-04-30        NaN        NaN      NaN
  2014-05-31        NaN        NaN      NaN
  2014-06-30 -0.5626360  0.6679614  Pickles
  2014-07-31        NaN        NaN      NaN
When I resample df3 to a lower frequency, I get no resampling at all:

df5 = df3.groupby(['C']).resample('A',  how={'A': np.mean, 'B': np.mean,  'D': 'ffill'})
df5.head()
Result:

                      B          A        D
C                                          
A 2014-03-31        NaN        NaN  Pickles
  2014-06-30        NaN        NaN  Pickles
  2014-09-30        NaN        NaN  Pickles
  2014-12-31 -0.7429617 -0.1065645  Pickles
  2015-03-31        NaN        NaN  Pickles
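I am fairly sure this has to do with the mixed types, because if I redo the annual down-sampling with only the numeric columns, everything works as expected: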

df5b = df3[['A', 'B', 'C']].groupby(['C']).resample('A',  how={'A': np.mean, 'B': np.mean})
df5b.head()
Result:

                     B          A
  C                                 
  A 2014-12-31 -0.7429617 -0.1065645
    2015-12-31 -0.6245030 -0.3101057
  B 2014-12-31  0.4213621 -0.0708263
    2015-12-31 -0.0607028  0.0110456
However, even when I switch to numeric types only, resampling to a higher frequency still does not work as I expect:

df4b = df3[['A', 'B', 'C']].groupby(['C']).resample('M',  how={'A': 'mean', 'B': 'mean'})
df4b.head()
Result:

                      B          A
C                                 
A 2014-03-31 -0.4640906 -0.2435414
  2014-04-30        NaN        NaN
  2014-05-31        NaN        NaN
  2014-06-30 -0.5626360  0.6679614
  2014-07-31        NaN        NaN
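This leaves me with two questions:

  • What is the correct way to resample a DataFrame with mixed types?
  • When resampling from a lower frequency to a higher one, what is the correct way to resample so that new values are interpolated?

Even if you cannot give a complete answer to both parts, partial solutions or an answer to either question are welcome.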

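    When resampling from a lower frequency to a higher one, I realized that I was specifying how when I actually wanted to specify fill_method. Once I did that, things seemed to work: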

    df4c = df3.groupby(['C']).resample('M',  fill_method='ffill')
    df4c.head()
                         A          B        D
    C                                          
    A 2014-03-31 -0.2435414 -0.4640906  Pickles
      2014-04-30 -0.2435414 -0.4640906  Pickles
      2014-05-31 -0.2435414 -0.4640906  Pickles
      2014-06-30  0.6679614 -0.5626360  Pickles
      2014-07-31  0.6679614 -0.5626360  Pickles
    
    You get a very limited set of interpolation choices, but it does handle the mixed types.
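For readers on a recent pandas release, where resample no longer takes how or fill_method as far as I know, the same upsampling can be sketched with asfreq() followed by a per-column fill. This is only an illustration under assumptions: the quarter-end dates and the values in column A are made up, and a single group is shown, so grouped data would need the same steps applied per group.

```python
import pandas as pd

# Assumption: quarter-end timestamps, similar to the dates in the printed results.
dates = pd.DatetimeIndex(["2014-03-31", "2014-06-30", "2014-09-30", "2014-12-31"])
df = pd.DataFrame({"A": [1.0, 4.0, 7.0, 10.0], "D": "Pickles"}, index=dates)

# Upsample to month-end bins; asfreq() leaves the newly created rows empty (NaN).
monthly = df.resample(pd.offsets.MonthEnd()).asfreq()

# Fill each column with a method that suits its type.
monthly["A"] = monthly["A"].interpolate(method="time")  # numeric: linear in elapsed time
monthly["D"] = monthly["D"].ffill()                     # string: carry the last label forward

print(monthly)
```

Passing the offset object pd.offsets.MonthEnd() instead of a string alias is a deliberate choice here, because the spelling of the month-end alias differs between pandas versions.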

    When resampling to a lower frequency with no how option (I believe it defaults to mean), down-sampling does work:

    df5c = df3.groupby(['C']).resample('A')
    df5c.head()
                      A          B
    C                                 
    A 2014-12-31 -0.1065645 -0.7429617
      2015-12-31 -0.3101057 -0.6245030
    B 2014-12-31 -0.0708263  0.4213621
      2015-12-31  0.0110456 -0.0607028
    
    So the problem seems to lie in passing the how dictionary, or in one of the options inside it, probably ffill, but I am not sure.
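One plausible reading of the observation above is that ffill is a fill method rather than a reduction, so it has no clear meaning when many rows are collapsed into one bin. A minimal sketch of group-wise down-sampling with one reducer per column follows. It uses pd.Grouper in place of the groupby(...).resample(...) chain from the question, which is a substitution on my part, and the dates and values are invented for illustration.

```python
import pandas as pd

# Assumption: quarter-end timestamps and made-up values for two groups.
dates = pd.DatetimeIndex(["2014-03-31", "2014-06-30", "2014-09-30", "2014-12-31"])
df_a = pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0], "C": "A", "D": "Pickles"}, index=dates)
df_b = pd.DataFrame({"A": [10.0, 20.0, 30.0, 40.0], "C": "B", "D": "Ham"}, index=dates)
df = pd.concat([df_a, df_b])

# Every column gets a reducer that maps many rows to a single value.
rules = {"A": "mean", "D": "last"}

# Group by the label column and by year-end bins of the datetime index.
yearly = df.groupby(["C", pd.Grouper(freq=pd.offsets.YearEnd())]).agg(rules)

print(yearly)
```

Using "last" (or "first") for the string column keeps one representative label per bin, which is what ffill was presumably meant to achieve in the original call.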

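    Use resample together with agg, as of pandas-1.0.0. Also, the resample method itself now works for this.

    The solution is to define aggregation rules that associate a function or a function name with each column:

    df.resample(period).agg(aggregation_rules)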
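Since an aggregation rule may be a function as well as a function name, here is a small sketch with callables as rules. The helper most_common, the three-day bins, and the sample data are all hypothetical and chosen only to keep the example short.

```python
import pandas as pd

# Made-up daily data with one numeric and one string column.
dates = pd.date_range("2021-02-09", periods=6, freq="D")
df = pd.DataFrame(
    {
        "A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
        "D": ["Ham", "Ham", "Pickles", "Ham", "Pickles", "Pickles"],
    },
    index=dates,
)


def most_common(values):
    """Return the most frequent value in a bin (smallest in sort order on ties)."""
    return values.mode().iloc[0]


rules = {
    "A": lambda s: s.max() - s.min(),  # callable: value range within each bin
    "D": most_common,                  # callable: majority label within each bin
}

# Two three-day bins: 2021-02-09..11 and 2021-02-12..14.
three_day = df.resample("3D").agg(rules)

print(three_day)
```

A callable like most_common assumes every bin contains at least one row; empty bins would need explicit handling.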

    Working example. Prepare the test data:

    import numpy as np
    import pandas as pd

    dates = pd.date_range("2021-02-09", "2021-04-09", freq="1D")
    df1 = pd.DataFrame(np.random.randn(dates.shape[0], 2), index=dates, columns=list('AB'))
    df1['C'] = 'A'
    df1['D'] = 'Pickles'
    df2 = pd.DataFrame(np.random.randn(dates.shape[0], 2), index=dates, columns=list('AB'))
    df2['C'] = 'B'
    df2['D'] = 'Ham'
    df3 = pd.concat([df1, df2], axis=0)
    print(df3)
    
    Output:

                       A         B  C        D
    2021-02-09  2.591285  2.455686  A  Pickles
    2021-02-10  0.753461 -0.072643  A  Pickles
    2021-02-11 -0.351667 -0.025511  A  Pickles
    2021-02-12 -0.896730  0.004512  A  Pickles
    2021-02-13 -0.493139 -0.770514  A  Pickles
    ...              ...       ... ..      ...
    2021-04-05  1.615935  1.152517  B      Ham
    2021-04-06 -0.067654 -0.858186  B      Ham
    2021-04-07  0.085587 -0.848542  B      Ham
    2021-04-08 -0.371983  0.088441  B      Ham
    2021-04-09  0.681501  0.235328  B      Ham
    
    [120 rows x 4 columns]
    
    
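    Monthly resampling:

    agg_rules = {"A": "mean", "B": "sum", "C": "first", "D": "last"}
    # "M" is the month-end alias; newer pandas releases spell it "ME"
    df4 = df3.resample("M").agg(agg_rules)
    print(df4)
    
    Output: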

    
                       A         B  C    D
    2021-02-28  0.025987  3.886781  A  Ham
    2021-03-31  0.081423 -5.492928  A  Ham
    2021-04-30  0.239309 -3.344334  A  Ham