Python: rolling EWMA on a MultiIndex for each sample type

python pandas


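Suppose I want to measure the length of an organ, say the stomach, in several animals sorted by type. I built a multi-index DataFrame from a .csv file with repeated values. I take one sample every day, which makes my measurements noisy.

How can I apply a rolling EWMA to the last 60 samples of each species in the multi-index DataFrame?

Example data: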

import numpy as np
import pandas as pd

# Three parallel lists of equal length, one per index level:
# animal type, species, and measurement date.
arrays = [['mamal', 'mamal','mamal', 'mamal', 'mamal', 'mamal', 'mamal','mamal', 'mamal', 'mamal','bird', 'bird','bird', 'bird', 'reptile', 'reptile'],
          ['whale','whale','whale','whale', 'dolphin', 'dolphin', 'dolphin', 'dolphin', 'cat', 'cat', 'canary', 'canary', 'eagle', 'eagle', 'boa', 'turtle'],
          ['2017-03-01','2017-03-02','2017-03-03','2017-03-04','2017-03-01','2017-03-02','2017-03-03','2017-03-04','2017-03-03','2017-03-04','2017-03-01','2017-03-02','2017-03-03','2017-03-01','2017-03-02','2017-03-03']]

tuples = list(zip(*arrays))

# One name per level: first = type, second = species, third = measurement date.
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second', 'third'])

# One random value per index entry stands in for the noisy measurement.
s = pd.Series(np.random.randn(len(index)), index=index)
print(s)

first    second   third
mamal    whale    2017-03-01      0.913916
                  2017-03-02      0.860045
                  2017-03-03      1.166217
                  2017-03-04     -0.439948
         dolphin  2017-03-01      0.590208
                  2017-03-02      0.297475
                  2017-03-03      0.067966
                  2017-03-04     -0.477495
         cat      2017-03-03     -1.261023
                  2017-03-04     -0.931671
bird     canary   2017-03-01     -1.367815
                  2017-03-02     -0.820792
         eagle    2017-03-03     -0.532935
                  2017-03-01     -0.152090
reptile  boa      2017-03-02     -2.070819
         turtle   2017-03-03      1.329004
dtype: float64

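Suppose I now have a longer measurement history, still with one measurement per day. What is the syntax for a rolling EWMA computed separately for each species? I don't want to roll over all the measurements together, only over the dolphin's samples or only over the whale's.

I tried: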

b = s.groupby(level=2,group_keys=False).apply(lambda x: pd.ewma(x,ignore_na=True,min_periods=2,adjust=True,com=0.030927835051546))

But this rolls over all species together and does not distinguish between them:

first    second   third
mamal    whale    2017-03-01           NaN
                  2017-03-02           NaN
                  2017-03-03           NaN
                  2017-03-04           NaN
         dolphin  2017-03-01      0.599637
                  2017-03-02      0.313861
                  2017-03-03      0.099954
                  2017-03-04     -0.476401
         cat      2017-03-03     -1.220229
                  2017-03-04     -0.918025
bird     canary   2017-03-01     -1.308843
                  2017-03-02     -0.786782
         eagle    2017-03-03     -0.553554
                  2017-03-01     -0.186791
reptile  boa      2017-03-02     -2.032299
         turtle   2017-03-03      1.272527
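
One way to see why this attempt mixes species is to look at what `level=2` actually groups on. The sketch below is only an illustration. It assumes the three-level layout from the question (type, species, date), uses a small made-up series, and borrows the level names `first`, `second`, `third` from the question's code.

```python
import numpy as np
import pandas as pd

# Small made-up sample with the same three-level layout as the question.
index = pd.MultiIndex.from_tuples(
    [
        ("mamal", "whale", "2017-03-01"),
        ("mamal", "whale", "2017-03-02"),
        ("mamal", "dolphin", "2017-03-01"),
        ("mamal", "dolphin", "2017-03-02"),
        ("bird", "canary", "2017-03-01"),
    ],
    names=["first", "second", "third"],
)
s = pd.Series(np.arange(5, dtype=float), index=index)

# level=2 is the date: each group holds every species measured that day.
by_date = s.groupby(level=2).size()

# level=1 is the species: each group holds one animal's time series.
by_species = s.groupby(level=1).size()

print(by_date)
print(by_species)
```

Grouping on the date level puts whale, dolphin, and canary values for the same day into one group, so the smoothing runs across animals instead of along each animal's history.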

As @ScottBoston pointed out, you just need to correct the level:

s.groupby(level="second").apply(lambda x: pd.ewma(x,ignore_na=True,min_periods=2,adjust=True,com=0.030927835051546))

first    second   third     
mamal    whale    2017-03-01         NaN
                  2017-03-02    0.661551
                  2017-03-03   -0.726871
                  2017-03-04   -1.873301
         dolphin  2017-03-01         NaN
                  2017-03-02    0.242347
                  2017-03-03    0.276082
                  2017-03-04    0.071822
         cat      2017-03-03         NaN
                  2017-03-04    0.441826
bird     canary   2017-03-01         NaN
                  2017-03-02    1.426628
         eagle    2017-03-03         NaN
                  2017-03-01    0.382538
reptile  boa      2017-03-02         NaN
         turtle   2017-03-03         NaN
dtype: float64
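
The top-level `pd.ewma` function used above belongs to older pandas releases. In current pandas the same computation is usually written with the `.ewm()` method. The sketch below is one possible modern equivalent, not the answer author's code. The sample data is made up, and using `tail(60)` is only one reading of "the last 60 samples of each species".

```python
import numpy as np
import pandas as pd

COM = 0.030927835051546  # same centre of mass as above; alpha = 1 / (1 + COM), about 0.97

# Made-up data: two species with four daily samples each, one with a single sample.
dates = ["2017-03-01", "2017-03-02", "2017-03-03", "2017-03-04"]
tuples = [("mamal", sp, d) for sp in ("whale", "dolphin") for d in dates]
tuples.append(("reptile", "boa", dates[0]))
index = pd.MultiIndex.from_tuples(tuples, names=["first", "second", "third"])

rng = np.random.default_rng(0)
s = pd.Series(rng.standard_normal(len(index)), index=index)

# Keep at most the last 60 rows of each species (here that is every row).
recent = s.groupby(level="second").tail(60)

# Smooth each species independently; transform keeps the original index.
smoothed = recent.groupby(level="second").transform(
    lambda x: x.ewm(com=COM, min_periods=2, adjust=True, ignore_na=True).mean()
)
print(smoothed)
```

With `min_periods=2`, the first sample of every species is NaN, and a species with a single sample (such as `boa` here) stays NaN throughout, which matches the pattern in the output above.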


Multi-level indexes are numbered from 0 at the outermost level and count inward, so I think you want level 1 for species. In this example, the measurement date is level 2, the species is level 1, and the type is level 0.

Thank you very much, @ScottBoston.

Hi, thanks a lot for your answer! I do mean ewma, the one that ships with pd. It is an exponentially weighted moving average, which I use to try to remove noise from the measurements. Several parts don't work for me, or I don't understand them, possibly because of my pandas version: when I run it, I get the following error: AttributeError: 'DataFrame' object has no attribute 'rolling'

@BillyBobJocko2223 I finally got it ;-)
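
The level numbering described in the comments can be checked directly. The sketch below is an illustration with made-up data: it shows that a level's position and its name select the same grouping, and it tests whether the installed pandas has the method-style `.rolling()` / `.ewm()` API. That API was added in pandas 0.18, and its absence is the usual cause of the AttributeError quoted above.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [
        ("mamal", "whale", "2017-03-01"),
        ("mamal", "whale", "2017-03-02"),
        ("bird", "canary", "2017-03-01"),
        ("bird", "canary", "2017-03-02"),
    ],
    names=["first", "second", "third"],
)
s = pd.Series([1.0, 2.0, 3.0, 4.0], index=index)

# Levels are numbered from the outermost (0) inward.
level_positions = {name: pos for pos, name in enumerate(s.index.names)}
print(level_positions)

# Grouping by position 1 and by the name "second" gives the same result.
by_position = s.groupby(level=1).sum()
by_name = s.groupby(level="second").sum()

# False on pandas releases that predate the .rolling()/.ewm() methods.
has_method_api = hasattr(pd.Series, "ewm") and hasattr(pd.DataFrame, "rolling")
print(pd.__version__, has_method_api)
```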