Python 3.x: fitting models on multivariate time series data

python-3.x scikit-learn time-series classification multivariate-testing

My main question is how to shape my data so that models can be fit on multivariate time series data. My current code is shown below:

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, fbeta_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

MAX_RETRIES = 10  # maximum number of refits per model


def diff_stats_mod(X_train, X_test, y_train, y_test):
    """Return the F-beta score (beta=0.9, in percent) on the test set for each model."""
    score_dict = {}

    # Models to evaluate, keyed by the name used in score_dict.
    models = {
        'LR': LogisticRegression(),
        'DTC': DecisionTreeClassifier(),
        'SVM': SVC(),
        'RF': RandomForestClassifier(),
        'GBM': GradientBoostingClassifier(),
    }
    # Also try KNN for k = 1..10.
    for k in range(1, 11):
        models['KNN{}'.format(k)] = KNeighborsClassifier(n_neighbors=k)

    for name, model in models.items():
        model.fit(X_train, y_train)
        pred = model.predict(X_test)

        # Refit while every prediction falls in a single class, at most
        # MAX_RETRIES times per model (the counter is reset for each model).
        # Only estimators with a random component can change between refits.
        retries = 0
        while len(set(pred)) == 1 and retries < MAX_RETRIES:
            model.fit(X_train, y_train)
            pred = model.predict(X_test)
            retries += 1

        if len(set(pred)) > 1:
            # beta must be passed by keyword in recent scikit-learn versions.
            score_dict[name] = fbeta_score(y_test, pred, beta=0.9) * 100
            score_dict['{} confusion matrix'.format(name)] = confusion_matrix(y_test, pred)
        else:
            # Exclude models whose predictions are all of the same class.
            score_dict[name] = np.nan

    return score_dict

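Essentially, this function returns the F-beta score on the test set for each model. It retrains any model whose predictions all fall in the same class (up to ten times); if after ten attempts that model still predicts a single class, it is excluded.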
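For reference, the F-beta score combines precision P and recall R as (1 + β²)·P·R / (β²·P + R), so β = 0.9 weights precision slightly more than recall. Below is a minimal pure-Python sketch of that computation for binary labels. The helper name `fbeta_binary` and the toy labels are made up for illustration; the function above relies on scikit-learn's `fbeta_score` instead.

```python
def fbeta_binary(y_true, y_pred, beta=0.9, positive=1):
    """F-beta for binary labels, computed from raw counts (illustrative helper)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        # No true positives: treat the score as 0, which is also what
        # happens when a model predicts only the negative class.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Toy labels, invented for this example.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
score = fbeta_binary(y_true, y_pred)
```

Here precision and recall are both 2/3, so the score equals 2/3 regardless of β; the weighting only matters when the two differ.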

Here is a snippet of my data:

time_stamp          pxID    act                 hr
2015-06-06 17:00:00 7983    8.466666666666667   97.46555633544922
2015-06-06 17:30:00 7983    10.413333333333332  99.16444473266601
2015-06-06 18:00:00 7983    5.400000000000001   94.62666702270508
2015-06-06 18:30:00 7983    14.759999999999998  95.76777776082356
2015-06-06 19:00:00 7983    17.026666666666667  100.43111089070638
2015-08-04 10:30:00 8005    4.774020720186061   18.555715289243377
2015-08-04 11:00:00 8005    7.1056325549244574  20.01443100917877
2015-08-04 11:30:00 8005    9.088101464843694   24.019171214407546
2015-08-04 12:00:00 8005    4.32230745513258    20.9444548661983
2015-08-04 12:30:00 8005    4.464612178539353   18.433279992371574
2015-08-16 19:00:00 8026    1.4452551387583383  9.943809217794078
2015-08-16 19:30:00 8026    2.7265866427381216  13.206866297538518
2015-08-16 20:00:00 8026    2.2795014957992974  9.11883132666883
2015-08-16 20:30:00 8026    1.536946186246722   10.04255596582319
2015-08-16 21:00:00 8026    2.0673098515634667  9.219173212211949

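Essentially, there are many IDs and many observations per ID. When I try to feed this data into a model, I get an error about the dimensions of the data. I know that models such as logistic regression can accept multidimensional input, but I'm not sure how to format the input for this, nor which parameters need to be set in logistic regression and the other models so that they can handle multidimensional data. For this classification problem, I want to use the hr and act data.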
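One common cause of a dimension error is passing a 3D array: scikit-learn estimators such as `LogisticRegression` expect `X` to be 2D with shape `(n_samples, n_features)`. As a sketch, assuming the readings have already been arranged as `(n_ids, n_steps, n_vars)` with the same number of steps for every ID (an assumption, and the array below holds placeholder numbers only), each ID's block can be flattened into a single feature vector:

```python
import numpy as np

# Hypothetical layout: 3 IDs, 5 time steps each, 2 variables (act, hr).
n_ids, n_steps, n_vars = 3, 5, 2
X_3d = np.arange(n_ids * n_steps * n_vars, dtype=float).reshape(n_ids, n_steps, n_vars)

# Flatten each ID's (n_steps, n_vars) block into one row of features.
# Row layout: step0_act, step0_hr, step1_act, step1_hr, ...
X_2d = X_3d.reshape(n_ids, n_steps * n_vars)
```

No extra estimator parameters are involved in this approach; the model just sees ten ordinary feature columns per ID.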

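I'm confused about how to approach this because I'm used to working with data where each row reflects one observation. In this data, however, multiple rows reflect one observation.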
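One possible way to get back to one row per observation is to pivot the long table into a wide one, with a column for each variable at each time step. The sketch below uses pandas and rounded values from the snippet above. It assumes every pxID has the same number of readings; the `step` column and the `act_0`, `hr_0`, ... column names are introduced here for illustration.

```python
import pandas as pd

# Long-format data shaped like the snippet above (first 3 readings per ID, rounded).
df = pd.DataFrame({
    "time_stamp": pd.to_datetime([
        "2015-06-06 17:00:00", "2015-06-06 17:30:00", "2015-06-06 18:00:00",
        "2015-08-04 10:30:00", "2015-08-04 11:00:00", "2015-08-04 11:30:00",
        "2015-08-16 19:00:00", "2015-08-16 19:30:00", "2015-08-16 20:00:00",
    ]),
    "pxID": [7983, 7983, 7983, 8005, 8005, 8005, 8026, 8026, 8026],
    "act": [8.47, 10.41, 5.40, 4.77, 7.11, 9.09, 1.45, 2.73, 2.28],
    "hr": [97.47, 99.16, 94.63, 18.56, 20.01, 24.02, 9.94, 13.21, 9.12],
})

# Number each reading within its pxID in time order: 0, 1, 2, ...
df = df.sort_values(["pxID", "time_stamp"])
df["step"] = df.groupby("pxID").cumcount()

# One row per pxID; columns are (variable, step) pairs.
wide = df.pivot(index="pxID", columns="step", values=["act", "hr"])
wide.columns = ["{}_{}".format(var, step) for var, step in wide.columns]

# 2D array of shape (n_ids, n_steps * n_vars), usable as X in model.fit.
X = wide.to_numpy()
```

With this layout, `y` would need exactly one label per pxID, in the same order as `wide.index`. If the IDs have different numbers of readings, the pivot leaves missing values that would have to be truncated or filled first.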

My main question here is: how do I format the data for use as input to sklearn models?
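Another option, sketched here under the assumption that per-ID summary statistics carry enough signal for the classification, is to collapse each pxID's series into a fixed set of features. This also works when IDs have different numbers of readings. The choice of statistics and the column names are illustrative only, and the values are rounded from the snippet above.

```python
import pandas as pd

# Long-format data for two IDs (rounded values from the snippet).
df = pd.DataFrame({
    "pxID": [7983, 7983, 7983, 8005, 8005, 8005],
    "act": [8.47, 10.41, 5.40, 4.77, 7.11, 9.09],
    "hr": [97.47, 99.16, 94.63, 18.56, 20.01, 24.02],
})

# One row per pxID, one column per (variable, statistic) pair.
features = df.groupby("pxID")[["act", "hr"]].agg(["mean", "std", "min", "max"])
features.columns = ["{}_{}".format(var, stat) for var, stat in features.columns]

# 2D array of shape (n_ids, 8), usable as X in model.fit.
X = features.to_numpy()
```

The trade-off is that the ordering of readings within a series is lost, so this suits problems where the overall level or spread of hr and act matters more than their shape over time.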