Python: combine rolling and cumulative z-score functions into one


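I have two functions:

  • the first (z_score) calculates rolling z-score values for a given df column;
  • the second (z_score_cum) calculates the cumulative z-score without look-ahead bias.

I would like to combine these two into a single z-score function so that, for whatever time windows I pass as an argument, it computes a rolling z-score for each window and, in addition, includes a column with the cumulative z-score. Currently I create a list of time windows (here in days), loop over it calling the function once per window, and then join the cumulative column separately. I don't think this is the best way to do it.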
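
In formula terms, the two functions described above compute the following for a series $x_1, \dots, x_T$ and a window of $w$ rows. This is a sketch that mirrors the author's final method: pandas' `rolling().std()` defaults to the sample standard deviation (`ddof=1`), while `np.std` defaults to the population one (`ddof=0`).

$$
z^{(w)}_t = \frac{x_t - \bar{x}^{(w)}_t}{s^{(w)}_t}, \qquad
\bar{x}^{(w)}_t = \frac{1}{w}\sum_{i=t-w+1}^{t} x_i, \qquad
s^{(w)}_t = \sqrt{\frac{1}{w-1}\sum_{i=t-w+1}^{t}\bigl(x_i - \bar{x}^{(w)}_t\bigr)^2}
$$

$$
z^{\mathrm{cum}}_t = \frac{x_t - \bar{x}_{1:t}}{\sigma_{1:t}}, \qquad
\bar{x}_{1:t} = \frac{1}{t}\sum_{i=1}^{t} x_i, \qquad
\sigma_{1:t} = \sqrt{\frac{1}{t}\sum_{i=1}^{t}\bigl(x_i - \bar{x}_{1:t}\bigr)^2}
$$

Both use only $x_1, \dots, x_t$, which is what keeps them free of look-ahead bias; $z^{(w)}_t$ is undefined for $t < w$, and $z^{\mathrm{cum}}_1$ is $0/0$ (NaN in pandas).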

    d_list = [n * 21 for n in range(1,13)]
    
    df_zscore = df.copy()
    for i in d_list:
        df_zscore = z_score(df_zscore, i)

    df_zscore_cum = z_score_cum(df)
    df_z_scores = pd.concat([df_zscore, df_zscore_cum], axis=1)
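
The question doesn't show `z_score` or `z_score_cum`. To make the loop runnable, here is a minimal sketch of what they might look like. Both bodies are hypothetical reconstructions, assuming the values are in the first column of `df`, windows are row counts, and the output columns follow the `zscore_<window>D` / `zscore_cum` names used in the final method.

```python
import numpy as np
import pandas as pd


def z_score(df: pd.DataFrame, window: int) -> pd.DataFrame:
    """Hypothetical helper: append a rolling z-score column for the first column."""
    col = df.columns[0]
    roll = df[col].rolling(window=window)
    out = df.copy()
    # rolling mean/std over the last `window` rows (std uses ddof=1 by default)
    out[f"zscore_{window}D"] = (df[col] - roll.mean()) / roll.std()
    return out


def z_score_cum(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: cumulative z-score of the first column.

    expanding() only uses rows up to and including the current one,
    so no future values leak into each z-score.
    """
    col = df.columns[0]
    exp = df[col].expanding()
    return ((df[col] - exp.mean()) / exp.std(ddof=0)).to_frame("zscore_cum")
```

With these, `df_z_scores` holds the original value column, one `zscore_<n>D` column per window, and the `zscore_cum` column.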
    

In the end, I did this:

    import numpy as np


    def calculate_z_scores(self, list_of_windows, freq_flag='D'):
        """
        Calculates rolling z-scores and cumulative z-scores based on a given
        list of time windows.

        Parameters
        ----------
        list_of_windows : list of int
            window lengths, in number of rows (e.g. trading days).
        freq_flag : str
            frequency suffix used in the output column names.
            The default is 'D' (daily).

        Returns
        -------
        pandas.DataFrame
            a data frame with the rolling and cumulative z-scores.
        """
        z_scores_data_frame = self.original_data_frame.copy()
        # get the column with values (1st column)
        val_column = z_scores_data_frame.columns[0]
        len_ = len(z_scores_data_frame)
        # cumulative z-score: statistics over rows 0..lv only (no look-ahead);
        # np.std uses ddof=0 (population standard deviation)
        z_scores_data_frame['mean_past'] = [np.mean(z_scores_data_frame[val_column].iloc[0:lv + 1]) for lv in range(len_)]
        z_scores_data_frame['std_past'] = [np.std(z_scores_data_frame[val_column].iloc[0:lv + 1]) for lv in range(len_)]
        z_scores_data_frame['zscore_cum'] = (z_scores_data_frame[val_column] - z_scores_data_frame['mean_past']) / z_scores_data_frame['std_past']
        # rolling z-scores, one column per window;
        # rolling().std() uses ddof=1 (sample standard deviation)
        for i in list_of_windows:
            col_mean = z_scores_data_frame[val_column].rolling(window=i).mean()
            col_std = z_scores_data_frame[val_column].rolling(window=i).std()
            z_scores_data_frame[f'zscore_{i}{freq_flag}'] = (z_scores_data_frame[val_column] - col_mean) / col_std
        # keep only the z-score columns (drops the value and helper columns)
        cols_to_leave = [c for c in z_scores_data_frame.columns if 'zscore' in c]
        self.z_scores_data_frame = z_scores_data_frame[cols_to_leave]
        return self.z_scores_data_frame
    
Just a side note: this is a method of my class, but with a few small modifications it can be used as a standalone function.
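
Following up on the side note, a standalone version might look like the sketch below. It is a rewrite under stated assumptions, not the author's code: it takes the DataFrame as an argument instead of reading `self.original_data_frame`, and it replaces the two O(n²) list comprehensions with pandas' `Series.expanding()`, which computes the same running mean and population standard deviation (`ddof=0`, the `np.std` default) in one pass. The `cum_ddof` parameter is a hypothetical addition.

```python
from collections.abc import Sequence

import pandas as pd


def calculate_z_scores(
    data: pd.DataFrame,
    list_of_windows: Sequence[int],
    freq_flag: str = "D",
    cum_ddof: int = 0,  # hypothetical parameter; 0 reproduces np.std
) -> pd.DataFrame:
    """Rolling z-scores for each window plus a cumulative (expanding) z-score.

    Like the class method, it works on the first column of `data`;
    window lengths are numbers of rows.
    """
    values = data[data.columns[0]]

    # Cumulative z-score: expanding() only uses rows up to and including the
    # current one, so there is no look-ahead.
    expanding = values.expanding()
    columns = {
        "zscore_cum": (values - expanding.mean()) / expanding.std(ddof=cum_ddof)
    }

    # Rolling z-scores, one column per window; rolling().std() uses ddof=1.
    for window in list_of_windows:
        rolling = values.rolling(window=window)
        columns[f"zscore_{window}{freq_flag}"] = (
            values - rolling.mean()
        ) / rolling.std()

    # Dict order gives zscore_cum first, then the rolling columns.
    return pd.DataFrame(columns)
```

For example, `calculate_z_scores(df, [n * 21 for n in range(1, 13)])` returns `zscore_cum` followed by `zscore_21D` through `zscore_252D`. Note that the original method mixes `ddof=0` for the cumulative column with `ddof=1` for the rolling ones; passing `cum_ddof=1` makes them consistent but changes the cumulative values.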
