Python Pandas: aggregating values with a variable-length rolling window

The following DataFrame is used as input:

from io import StringIO

import pandas as pd
import numpy as np

json_string = '{"datetime":{"0":1528955662000,"1":1528959255000,"2":1528965487000,"3":1528966204000,"4":1528966289000,"5":1528971637000,"6":1528974438000,"7":1528975251000,"8":1528982200000,"9":1528992569000,"10":1528994282000},"hit":{"0":1,"1":0,"2":0,"3":0,"4":0,"5":1,"6":1,"7":0,"8":1,"9":0,"10":1}}'
# Wrap the literal in StringIO; recent pandas versions deprecate passing a raw JSON string.
df = pd.read_json(StringIO(json_string))
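The exercise asks you to compute, at each moment (`datetime`), the mean of the `hit` column. The current observation, however, must not be included in the mean. For example, the first observation (index=0) gets `np.NaN`, because there are no observations other than the one being evaluated. The second observation (index=1) gets 1, since 1/1 = 1 (the second observation's own 0 is excluded). The third observation (index=2) gets 0.5, since (1+0)/2 = 0.5.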
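To make the expected values concrete, here is a minimal brute-force sketch of the rule described above. It uses the `hit` values from the sample input (which are already in ascending `datetime` order); the helper name `prior_mean_reference` is made up for illustration and is not part of pandas.

```python
import numpy as np
import pandas as pd

# The `hit` values from the sample input, already in ascending `datetime` order.
hit = pd.Series([1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1], name="hit")

def prior_mean_reference(values):
    """Mean of all earlier values at each position; NaN where there are none.

    Hypothetical helper, written as a plain loop purely as a reference.
    """
    out = []
    for i in range(len(values)):
        earlier = values.iloc[:i]  # rows 0..i-1, i.e. the current row is excluded
        out.append(earlier.mean() if len(earlier) else np.nan)
    return pd.Series(out, index=values.index, name="mean_hr")

expected = prior_mean_reference(hit)
print(expected.head(3).tolist())  # [nan, 1.0, 0.5]
```

A loop like this is quadratic in the number of rows, so it is only useful as a reference to check a vectorized solution against.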

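My code gives the correct answer (numerically), but it is not elegant. I would like to know whether the exercise can be solved in a different way. Is it possible to use `pandas.api.indexers.VariableOffsetWindowIndexer` or `pandas.api.indexers.BaseIndexer`, and then the `get_window_bounds()` method?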

My solution:

def add_hr(df):
    """
    Generate a feature `mean_hr` which represents the average hit rate
    at the moment of making the offer (`datetime`).

    Parameters
    ----------
    df : pandas.DataFrame
        The `hit` column must be present. Ascending/descending order in the `datetime`
        column is not assumed.

        hit : int
        datetime : string (format='%Y-%m-%d %H:%M:%S')

    Returns
    ----------
    df_expanded : pandas.DataFrame
        A (deep) copy of the input pandas.DataFrame.
    """

    df_expanded = df.copy(deep=True)

    df_expanded.sort_values(by=['datetime'], ascending=True, inplace=True)

    df_expanded['mean_hr'] = df_expanded['hit'].expanding().mean()

    srs = df_expanded['mean_hr']

    srs = srs[:len(srs)-1]
    srs = pd.concat([pd.Series([np.nan]), srs])
    df_expanded['mean_hr'] = srs.tolist()

    return df_expanded

Full disclosure: this was part of a recruitment process a month ago. The recruitment is now closed and I can no longer submit code.

It seems this can be solved by subclassing the `BaseIndexer` class:

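from pandas.api.indexers import BaseIndexer

class CustomIndexer(BaseIndexer):

    # `step` is passed by pandas 1.5 and later; defaulting it keeps older versions working too.
    def get_window_bounds(self, num_values=0, min_periods=None, center=None, closed=None, step=None):

        # The window for row i is the half-open range [0, i):
        # every earlier row, with row i itself excluded.
        start = np.zeros(num_values, dtype='int64')
        end = np.arange(0, num_values, dtype='int64')

        return start, end

indexer = CustomIndexer(window_size=0)

df_expanded = df.sort_values(by=['datetime']).copy(deep=True)

# Apply the rolling mean to `hit` only; the datetime column is not numeric.
df_expanded['mean_hr'] = df_expanded['hit'].rolling(indexer).mean()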
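As a rough check of what such an indexer does, the sketch below prints the `(start, end)` pairs it returns for four rows and compares the rolling result with a shifted expanding mean. The class name `PriorRowsIndexer` is hypothetical, and `min_periods=1` is passed explicitly here as an assumption so that the empty first window yields NaN regardless of how the default is derived.

```python
import numpy as np
import pandas as pd
from pandas.api.indexers import BaseIndexer

class PriorRowsIndexer(BaseIndexer):
    """Hypothetical name: the window for row i is the half-open range [0, i)."""

    def get_window_bounds(self, num_values=0, min_periods=None,
                          center=None, closed=None, step=None):
        start = np.zeros(num_values, dtype="int64")
        end = np.arange(num_values, dtype="int64")
        return start, end

# The `hit` values from the sample input, in ascending `datetime` order.
hit = pd.Series([1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1], name="hit")

# Inspect the bounds directly: row 0 has an empty window, row 3 sees rows 0..2.
start, end = PriorRowsIndexer(window_size=0).get_window_bounds(num_values=4)
print(list(zip(start.tolist(), end.tolist())))  # [(0, 0), (0, 1), (0, 2), (0, 3)]

# Compare the custom indexer with the shifted expanding mean.
via_indexer = hit.rolling(PriorRowsIndexer(window_size=0), min_periods=1).mean()
via_shift = hit.expanding().mean().shift(1)
print(pd.DataFrame({"via_indexer": via_indexer, "via_shift": via_shift}))
```

The indexer is more general than this task needs, since any per-row `start` and `end` arrays can be returned, which is where a custom `BaseIndexer` becomes useful for genuinely variable-length windows.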

A simpler way to get what you are after is to shift the expanding mean by one row, as follows:

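df.sort_values(by=['datetime'], inplace=True)
# Take the expanding mean of `hit` only, then shift it down one row
# so that each row sees the mean of the rows before it.
df['mean_hit'] = df['hit'].expanding().mean().shift(1)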
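Because the shift works on row positions, the sort has to happen first. The following sketch shows this on a made-up four-row frame whose rows are deliberately out of chronological order; the data is hypothetical and is not taken from the question.

```python
import pandas as pd

# Hypothetical four-row sample, deliberately out of chronological order.
df = pd.DataFrame({
    "datetime": pd.to_datetime(["2018-06-14 08:00", "2018-06-14 06:00",
                                "2018-06-14 09:00", "2018-06-14 07:00"]),
    "hit": [0, 1, 1, 0],
})

# Sort first: expanding() and shift() both follow the current row order.
df = df.sort_values(by="datetime")

# expanding().mean() includes the current row; shift(1) moves each value down
# one row, so every row ends up with the mean of the rows that precede it.
df["mean_hit"] = df["hit"].expanding().mean().shift(1)
print(df)
```

The original index labels are kept after sorting, so the result can be put back into the input order with `df.sort_index()` if that is needed.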