Python: create separate columns for daily resampled quantile values

python pandas

If I have some data:

import numpy as np
import pandas as pd

np.random.seed(11)  # for reproducibility

rows, cols = 50000, 2
data = np.random.rand(rows, cols)
tidx = pd.date_range('2019-01-01', periods=rows, freq='min')  # one row per minute
df = pd.DataFrame(data, columns=['Temperature', 'Value'], index=tidx)

r = df['Temperature'].resample('D')

print (r.apply(lambda x: x.quantile(0.95)))
print (r.apply(lambda x: x.quantile(0.05)))

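Is there a simple way to create a separate DataFrame with one column for the daily resampled 95th (upper) percentile and another column for the daily resampled 5th (lower) percentile?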

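Use Resampler.agg to compute both percentiles, reshape the result with unstack, and rename the columns. A custom function with f-strings can build the new column names: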

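# The lambda returns the 0.05 and 0.95 quantiles of each day; unstack moves them into columns
out = (df['Temperature'].resample('D')
                        .agg(lambda x: x.quantile([0.05, 0.95]))
                        .unstack()
                        .rename(columns=lambda x: f'Q_{str(x)[2:]}'))  # 0.05 -> 'Q_05'

print(out.head())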
                Q_05      Q_95
2019-01-01  0.052827  0.938153
2019-01-02  0.047346  0.945900
2019-01-03  0.051418  0.940610
2019-01-04  0.042772  0.954205
2019-01-05  0.047322  0.947836
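
The str(x)[2:] trick depends on how each float prints: str(0.5) is '0.5', so a median column would be labelled 'Q_5'. A minimal sketch of an alternative, assuming the same synthetic data as the question: it swaps the lambda for groupby(...).quantile with a list of probabilities, assumes pd.Grouper(freq='D') yields the same daily bins as resample('D') on this gap-free minute index, and builds each label from the number itself (valid for whole-percent probabilities such as 0.05, 0.5 or 0.95).

```python
import numpy as np
import pandas as pd

# Same synthetic minute-level data as in the question (seed 11).
np.random.seed(11)
rows, cols = 50000, 2
tidx = pd.date_range('2019-01-01', periods=rows, freq='min')
df = pd.DataFrame(np.random.rand(rows, cols),
                  columns=['Temperature', 'Value'], index=tidx)

probs = [0.05, 0.95]

# quantile() with a list returns a Series indexed by (day, probability);
# unstack() moves the probability level into the columns.
out = (df['Temperature']
       .groupby(pd.Grouper(freq='D'))
       .quantile(probs)
       .unstack()
       # Label from the number, not its string form: 0.05 -> 'Q_05', 0.95 -> 'Q_95'.
       .rename(columns=lambda q: f'Q_{int(round(q * 100)):02d}'))

print(out.head())
```

Changing probs to [0.05, 0.5, 0.95] would add a 'Q_50' median column without touching the renaming code.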

You can aggregate with a dict of named functions and then unstack the result:

result = r.agg({'Q05': lambda x: x.quantile(0.05),
                'Q95': lambda x: x.quantile(0.95)}).unstack(level=0)

which gives:

                 Q05       Q95
2019-01-01  0.052827  0.938153
2019-01-02  0.047346  0.945900
2019-01-03  0.051418  0.940610
2019-01-04  0.042772  0.954205
2019-01-05  0.047322  0.947836
2019-01-06  0.045774  0.945841
2019-01-07  0.051923  0.953116
2019-01-08  0.053432  0.940223
2019-01-09  0.053840  0.956237
2019-01-10  0.047259  0.951156
2019-01-11  0.041143  0.951512
2019-01-12  0.041922  0.947111
2019-01-13  0.055583  0.956318
2019-01-14  0.052682  0.955975
2019-01-15  0.058370  0.957171
2019-01-16  0.056496  0.948921
2019-01-17  0.046263  0.948594
2019-01-18  0.045747  0.951233
2019-01-19  0.064161  0.952095
2019-01-20  0.048360  0.943699
2019-01-21  0.042276  0.953994
2019-01-22  0.053079  0.948949
2019-01-23  0.048329  0.949080
2019-01-24  0.049742  0.956341
2019-01-25  0.043853  0.952652
2019-01-26  0.046229  0.957636
2019-01-27  0.042207  0.948257
2019-01-28  0.058255  0.938285
2019-01-29  0.057104  0.951995
2019-01-30  0.052108  0.942105
2019-01-31  0.051148  0.952520
2019-02-01  0.048918  0.954099
2019-02-02  0.046793  0.948523
2019-02-03  0.051317  0.947725
2019-02-04  0.048069  0.949687
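
Strictly speaking, pandas uses the term named aggregation for the keyword-argument form of agg, where each keyword becomes an output column, so no unstack step is needed. A sketch of that form, assuming the same data as the question and assuming pd.Grouper(freq='D') gives the same daily bins as resample('D') here; q05 and q95 are hypothetical helper names, used instead of two anonymous lambdas so each output column has a distinct, readable function behind it.

```python
import numpy as np
import pandas as pd

# Same synthetic minute-level data as in the question (seed 11).
np.random.seed(11)
rows, cols = 50000, 2
tidx = pd.date_range('2019-01-01', periods=rows, freq='min')
df = pd.DataFrame(np.random.rand(rows, cols),
                  columns=['Temperature', 'Value'], index=tidx)


def q05(x):
    """5th percentile of one day's readings (hypothetical helper)."""
    return x.quantile(0.05)


def q95(x):
    """95th percentile of one day's readings (hypothetical helper)."""
    return x.quantile(0.95)


# Keyword arguments to SeriesGroupBy.agg: output column name = keyword.
daily = (df['Temperature']
         .groupby(pd.Grouper(freq='D'))
         .agg(Q05=q05, Q95=q95))

print(daily.head())
```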

I think you can also use groupby:

df['Temperature'].groupby(df.index.floor('D')).quantile(0.05)

2019-01-01    0.052827
2019-01-02    0.047346
2019-01-03    0.051418
2019-01-04    0.042772
2019-01-05    0.047322
2019-01-06    0.045774
2019-01-07    0.051923
2019-01-08    0.053432
2019-01-09    0.053840
   ...
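
The groupby line above returns only the 5% quantile. A sketch that extends it to both bounds, assuming the same df as in the question: compute each quantile per day from one groupby object, then let pd.concat with a dict turn the dict keys into column names.

```python
import numpy as np
import pandas as pd

# Same synthetic minute-level data as in the question (seed 11).
np.random.seed(11)
rows, cols = 50000, 2
tidx = pd.date_range('2019-01-01', periods=rows, freq='min')
df = pd.DataFrame(np.random.rand(rows, cols),
                  columns=['Temperature', 'Value'], index=tidx)

# Group the minute readings by calendar day (timestamps floored to midnight).
g = df['Temperature'].groupby(df.index.floor('D'))

# Concatenating a dict of Series along axis=1 uses the keys as column names.
bands = pd.concat({'Q05': g.quantile(0.05), 'Q95': g.quantile(0.95)}, axis=1)

print(bands.head())
```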