Python 计算预测区间_Python_Numpy_Machine Learning_Regression_Prediction

Python 计算预测区间

python numpy machine-learning

Python 计算预测区间,python,numpy,machine-learning,regression,prediction,Python,Numpy,Machine Learning,Regression,Prediction,数据帧示例： new_host split sequence expression FALSE train AQVPYGVS 0.039267878 FALSE train ASVPYGVSI 0.039267878 FALSE train STNLYGSGR 0.261456561 FALSE valid NLYGSGLVR 0.265188519 FALSE valid SLGP

数据帧示例：

new_host  split     sequence    expression
FALSE     train     AQVPYGVS    0.039267878
FALSE     train     ASVPYGVSI   0.039267878
FALSE     train     STNLYGSGR   0.261456561
FALSE     valid     NLYGSGLVR   0.265188519
FALSE     valid     SLGPSNLYG   0.419680588
FALSE     valid     ATSLGTTNG   0.145710993

我试图计算我的模型的保形预测（PLS回归模型），这是基于基于校准数据的保形函数计算预测间隔（目标是我序列的表达式）。我的算法基于以下内容：

基本上，我有：

划分我的数据集

使我的模型符合培训数据

将我的一致性函数定义为预测标签和真实标签之间的绝对误差

将一致性函数应用于我的校准数据集

从第4步开始计算我的伤口

现在我需要根据显著性水平确定预测间隔。我在计算数据集的间隔时遇到问题。我不断地遇到不同的numpy错误，我不确定如何继续在我上一个类方法conformal_predictions中发现的这种情况。我在哪里计算预测间隔

下面是我的代码片段，我希望这不是太模糊，如果需要，我仍然可以提供额外的信息

def data_split(df):
    train = df.loc[df['split'] == 'train']
    valid = df.loc[df['split'] == 'valid']
    X_test = valid.iloc[:,:-1]
    y_test = valid.iloc[:,-1] 
    X_train = train.iloc[:,:-1] 
    y_train = train.iloc[:,-1] 
    X_train, X_cal, y_train, y_cal = train_test_split(X_train, y_train, test_size =0.2)
    print("Data has been split")
    print("X_train and y_train shape: "+ str(X_train.shape) + str(y_train.shape))
    print("X_cal and y_cal shape: "+ str(X_cal.shape) + str(y_cal.shape))
    print('{} instances, {} features, {} classes'.format(y_train.size,
                                                   X_train.shape[1],
                                                   np.unique(y_train).size))
    return X_test, y_test, X_train, y_train, X_cal, y_cal

我的

不符合类

：

class NonConformist():

def __init__(self, model):
    self.model = model

def underlying_fit (self, X_train, y_train):
    '''
        Train underlying model on proper training data
        
        @Params
        X_train: has shape (n_train, n_features)
        y_train: has shape (n_train)
    '''
    
    self.model.fit(X_train,y_train)
    print("Model has been fitted")
    
def calibration_predictions(self, X_cal):
    '''
        Obtain predictions from the underlying model using X_cal data. 
        Returns an output of predicted real values as numpy.array of shape (n_test)
        
    @params
    X_cal: numpy array has shape (n_train, n_features)
    '''
    calibration_predictions = self.model.predict(X_cal)

    print("Calibration Predictions Established")
    return calibration_predictions
            
def test_predictions(self, X_test):
    '''
        Obtain predictions from the underlying model using X_test data. 
        Returns an output of predicted real values as numpy.array of shape (n_test)
        
    @params
    X_test: numpy array has shape (n_train, n_features)
    '''
    test_predictions = self.model.predict(X_test)

    print("Calibration Predictions Established")
    return test_predictions
  
def calibration_scores(calibration_predictions, y_cal):
    '''
    Calculates absolute error nonconformity for calibration set.
   For each correct output in ``y``, nonconformity is defined as
   math::
   | y_i (predicted labels) - y^_i (true labels)|     
   
   @params
    true_labels is a numpy array of (true) labels 
    predictions is a numpy array of predicted labels'''
           
    true_labels = np.array(y_cal)
    
    calibration_scores = np.abs(calibration_predictions - true_labels)
    calibration_scores = np.sort(calibration_scores)[::-1] #sort in descending order
    print("Calibration Scores Obtained")
    
    return calibration_scores

def partial_inverse(self, calibration_scores, significance):
    ''' 
    This function is the partial inverse of the nonconformity function (calibration_scores) in order to
    calculate the prediction intervals where:
    
    apply_inverse(...)[0] is subtracted from the prediction of the
            underlying model to create the lower boundary of the
            prediction interval
        apply_inverse(...)[1] is added to the prediction of the
            underlying model to create the upper boundary of the
            prediction interval
    
    @params
    Significance is a float between 0-1 (i.e. 0.05)
    '''
    
    border = int(np.floor(significance * (calibration_scores.size + 1))) - 1
    border = min(max(border, 0), calibration_scores.size - 1)
    
    return np.vstack([calibration_scores[border], calibration_scores[border]])

def conformal_predictions(self, X_test, calibration_scores, significance, test_predictions):
    """This function creates the prediction intervals based from a set of test examples.
    This takes the predictions for each test pattern with the underlying model
    and applies the conformity function to each prediction, resulting in 
    a final prediction interval for each test pattern. 
    
    Predicts the output of each test pattern using the underlying model,
    and applies the (partial) inverse nonconformity function to each
    prediction, resulting in a prediction interval for each test pattern.
    
    @params
    ----------
    X_test: consists of a numpy array of shape [n_samples, n_features]

    significance level : is a float between 0 and 1; determimned as 
    the maximum allowed error rate of predictions.
    
    Returns
    -------
    p : numpy array of shape [n_samples, 2] or [n_samples, 2, 99]
    If significance is ``None``, then p contains the interval (minimum
    and maximum boundaries) for each test pattern, and each significance
    level (0.01, 0.02, ..., 0.99). 
           
    If significance value is a float between
    0 and 1, then p contains the prediction intervals (minimum and
    maximum boundaries) for the set of test patterns at the chosen
    significance level.
    """
    n_test = X_test.shape[0]
    prediction = self.model.predict(x)
    norm = np.ones(n_test)

    if significance:
        intervals = np.zeros((x.shape[0], 2)) #creates empty 2D numpy array
        err_dist = self.err_func.partial_inverse(calibration_scores, significance)
        err_dist = np.hstack([err_dist] * n_test)
        err_dist *= norm

        intervals[:, 0] = test_predictions - err_dist[0, :]
        intervals[:, 1] = test_predictions + err_dist[1, :]

        return intervals
    
    else:
        significance = np.arange(0.01, 1.0, 0.01)
        intervals = np.zeros((x.shape[0], 2, significance.size))

        for i, s in enumerate(significance):
            err_dist = self.err_func.apply_inverse(nc, s)
            err_dist = np.hstack([err_dist] * n_test)
            err_dist *= norm

            intervals[:, 0, i] = prediction - err_dist[0, :]
            intervals[:, 1, i] = prediction + err_dist[0, :]

        return intervals

请注意，“我经常遇到不同的numpy错误”，而不提及任何具体错误，这对愿意提供帮助的潜在受访者来说几乎没有帮助。“请花点时间阅读，并说明原因。”谢谢，我会修改我的答案。我仍然是这个论坛的新手，试图找出它的用处。请注意，“我一直遇到各种各样的错误”，而不提及任何具体的错误，对于愿意提供帮助的潜在受访者来说几乎没有帮助。“请花点时间阅读，并说明原因。”谢谢，我会修改我的答案。我仍然是这个论坛的新手，并试图找出它的用途