Python: Reattaching an ID column to the data after it passes through an sklearn model

Tags: python, python-3.x, pandas, scikit-learn

I'm trying to build a predictive regression model that predicts the completion date for some orders.

My dataset looks like this:

| ORDER_NUMBER | Feature1 | Feature2 | Feature3 | Feature4 | Feature5 | Feature6 | TOTAL_DAYS_TO_COMPLETE | Feature8 | Feature9 | Feature10 | Feature11 | Feature12 | Feature13 | Feature14 | Feature15 | Feature16 | Feature17 | Feature18 | Feature19 | Feature20 | Feature21 | Feature22 | Feature23 | Feature24 | Feature25 | Feature26 | Feature27 | Feature28 | Feature29 | Feature30 | Feature31 |
|:------------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:----------------------:|:--------:|:--------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|
|   102203591  |    12    |   2014   |    10    |   2014   |     1    |   2015   |           760          |    50    |    83    |     5     |     6     |     12    |     18    |     31    |     8     |     0     |     1     |     0     |     1     |     16    |   131.29  |  24.3768  |   158.82  |    1.13   |    6.52   |     10    |     51    |     39    |     27    |     88    |  1084938  |
|   102231010  |     2    |   2015   |     1    |   2015   |     2    |   2015   |           706          |    35    |    34    |     2     |     1     |     4     |     3     |     3     |     3     |     0     |     0     |     0     |     1     |     2     |   11.95   |   5.162   |   17.83   |    1.14   |    3.45   |     1     |     4     |     20    |     16    |     25    |   367140  |
|   102251893  |     6    |   2015   |     4    |   2015   |     3    |   2015   |          1143          |    36    |    43    |     1     |     2     |     4     |     5     |     6     |     3     |     1     |     0     |     0     |     1     |     5     |    8.55   |   5.653   |   34.51   |    4.59   |    6.1    |     0     |     1     |     17    |     30    |     12    |   103906  |
|   102287793  |     4    |   2015   |     2    |   2015   |     4    |   2015   |           733          |    45    |    71    |     4     |     1     |     6     |     35    |    727    |     6     |     0     |     3     |     15    |     0     |     19    |   174.69  |   97.448  |   319.98  |    1.49   |    3.28   |     20    |    113    |     71    |     59    |     71    |  1005041  |
|   102288060  |     6    |   2015   |     5    |   2015   |     4    |   2015   |          1092          |    26    |    21    |     1     |     1     |     3     |     2     |     2     |     1     |     0     |     0     |     0     |     0     |     2     |    4.73   |   4.5363  |   18.85   |    3.11   |    4.16   |     0     |     1     |     16    |     8     |     16    |   69062   |
|   102308069  |     8    |   2015   |     6    |   2015   |     5    |   2015   |           676          |    41    |    34    |     2     |     0     |     3     |     2     |     2     |     1     |     0     |     0     |     0     |     0     |     2     |    2.98   |   6.1173  |    11.3   |    1.36   |    1.85   |     0     |     1     |     17    |     12    |     3     |   145887  |
|   102319918  |     8    |   2015   |     7    |   2015   |     6    |   2015   |           884          |    25    |    37    |     1     |     1     |     3     |     2     |     3     |     2     |     0     |     0     |     1     |     0     |     2     |    5.57   |   3.7083  |    9.18   |    0.97   |    2.48   |     0     |     1     |     14    |     5     |     7     |   45243   |
|   102327578  |     6    |   2015   |     4    |   2015   |     6    |   2015   |           595          |    49    |    68    |     3     |     5     |     9     |     11    |     13    |     5     |     4     |     2     |     0     |     1     |     10    |   55.41   |  24.3768  |   104.98  |    2.03   |    4.31   |     10    |     51    |     39    |     26    |     40    |   418266  |
|   102337989  |     7    |   2015   |     5    |   2015   |     7    |   2015   |           799          |    50    |    66    |     5     |     6     |     12    |     21    |     29    |     12    |     0     |     0     |     0     |     1     |     20    |   138.79  |  24.3768  |   172.56  |    1.39   |    7.08   |     10    |     51    |     39    |     34    |    101    |  1229299  |
|   102450069  |     8    |   2015   |     7    |   2015   |    11    |   2015   |           456          |    20    |    120   |     2     |     1     |     3     |     12    |     14    |     8     |     0     |     0     |     0     |     0     |     7     |    2.92   |   6.561   |    12.3   |    1.43   |    1.87   |     2     |     1     |     15    |     6     |     6     |   142805  |
|   102514564  |     5    |   2016   |     3    |   2016   |     2    |   2016   |           639          |    25    |    35    |     1     |     2     |     4     |     3     |     6     |     3     |     0     |     0     |     0     |     0     |     3     |    4.83   |   4.648   |   14.22   |    2.02   |    3.06   |     0     |     1     |     15    |     5     |     13    |   62941   |
|   102528121  |    10    |   2015   |     9    |   2015   |     3    |   2016   |           413          |    15    |    166   |     1     |     1     |     3     |     2     |     3     |     2     |     0     |     0     |     0     |     0     |     2     |    4.23   |   1.333   |   15.78   |    8.66   |   11.84   |     1     |     4     |     8     |     6     |     3     |   111752  |
|   102564376  |     1    |   2016   |    12    |   2015   |     4    |   2016   |           802          |    27    |    123   |     2     |     1     |     4     |     3     |     3     |     3     |     0     |     1     |     0     |     0     |     3     |    1.27   |   2.063   |    6.9    |    2.73   |    3.34   |     1     |     4     |     14    |     20    |     6     |   132403  |
|   102564472  |     1    |   2016   |    12    |   2015   |     4    |   2016   |           817          |    27    |    123   |     0     |     1     |     2     |     1     |     1     |     1     |     0     |     0     |     0     |     0     |     1     |    1.03   |   2.063   |    9.86   |    4.28   |    4.78   |     1     |     4     |     14    |     22    |     4     |   116907  |
|   102599569  |     2    |   2016   |    12    |   2015   |     5    |   2016   |           425          |    47    |    151   |     1     |     2     |     4     |     3     |     4     |     3     |     0     |     0     |     0     |     0     |     2     |   27.73   |  15.8993  |    60.5   |    2.06   |    3.81   |     12    |    108    |     34    |     24    |     20    |   119743  |
|   102599628  |     2    |   2016   |    12    |   2015   |     5    |   2016   |           425          |    47    |    151   |     3     |     4     |     8     |     8     |     9     |     7     |     0     |     0     |     0     |     2     |     8     |   39.28   |  14.8593  |   91.26   |    3.5    |    6.14   |     12    |    108    |     34    |     38    |     15    |   173001  |
|   102606421  |     3    |   2016   |    12    |   2015   |     5    |   2016   |           965          |    55    |    161   |     5     |     11    |     17    |     29    |     44    |     11    |     1     |     1     |     0     |     1     |     22    |   148.06  |  23.7983  |   195.69  |     2     |    8.22   |     10    |     51    |     39    |     47    |    112    |  1196097  |
|   102621293  |     7    |   2016   |     5    |   2016   |     6    |   2016   |           701          |    42    |    27    |     2     |     1     |     4     |     3     |     3     |     1     |     0     |     0     |     0     |     1     |     2     |    8.39   |   3.7455  |   13.93   |    1.48   |    3.72   |     1     |     5     |     14    |     14    |     20    |   258629  |
|   102632364  |     7    |   2016   |     6    |   2016   |     6    |   2016   |           982          |    41    |    26    |     4     |     2     |     7     |     6     |     6     |     2     |     0     |     0     |     0     |     1     |     4     |   26.07   |   2.818   |   37.12   |    3.92   |   13.17   |     1     |     5     |     14    |     22    |     10    |   167768  |
|   102643207  |     9    |   2016   |     9    |   2016   |     7    |   2016   |           255          |     9    |    73    |     3     |     1     |     5     |     4     |     4     |     2     |     0     |     0     |     0     |     0     |     0     |    2.17   |   0.188   |    4.98   |   14.95   |   26.49   |     1     |     4     |     2     |     11    |     1     |   49070   |
|   102656091  |     9    |   2016   |     8    |   2016   |     7    |   2016   |           356          |    21    |    35    |     1     |     0     |     2     |     1     |     1     |     1     |     0     |     0     |     0     |     0     |     1     |    1.45   |   2.0398  |    5.54   |    2.01   |    2.72   |     1     |     4     |     14    |     15    |     3     |   117107  |
|   102660407  |     9    |   2016   |     8    |   2016   |     7    |   2016   |           462          |    21    |    31    |     2     |     0     |     3     |     2     |     2     |     1     |     0     |     0     |     0     |     0     |     2     |    3.18   |   2.063   |    8.76   |    2.7    |    4.25   |     1     |     4     |     14    |     14    |     10    |   151272  |
|   102665666  |    10    |   2016   |     9    |   2016   |     7    |   2016   |           235          |     9    |    64    |     0     |     1     |     2     |     1     |     2     |     1     |     0     |     0     |     0     |     0     |     0     |     1     |   0.188   |    2.95   |   10.37   |   15.69   |     1     |     4     |     2     |     10    |     1     |   52578   |
|   102665667  |    10    |   2016   |     9    |   2016   |     7    |   2016   |           235          |     9    |    64    |     0     |     1     |     2     |     1     |     2     |     1     |     0     |     0     |     0     |     0     |     0     |    0.72   |   0.188   |    2.22   |    7.98   |   11.81   |     1     |     4     |     2     |     10    |     1     |   52578   |
|   102665668  |    10    |   2016   |     9    |   2016   |     7    |   2016   |           235          |     9    |    64    |     0     |     1     |     2     |     1     |     2     |     1     |     0     |     0     |     0     |     0     |     0     |    0.9    |   0.188   |    2.24   |    7.13   |   11.91   |     1     |     4     |     2     |     10    |     1     |   52578   |
|   102666306  |     7    |   2016   |     6    |   2016   |     7    |   2016   |           235          |    16    |    34    |     3     |     1     |     5     |     5     |     6     |     4     |     0     |     0     |     0     |     0     |     3     |   14.06   |   3.3235  |   31.27   |    5.18   |    9.41   |     1     |     1     |     16    |     5     |     18    |   246030  |
|   102668177  |     8    |   2016   |     6    |   2016   |     8    |   2016   |           233          |    36    |    32    |     0     |     1     |     2     |     1     |     1     |     1     |     0     |     0     |     0     |     0     |     1     |    2.5    |   5.2043  |    8.46   |    1.15   |    1.63   |     0     |     1     |     14    |     2     |     4     |   89059   |
|   102669909  |     6    |   2016   |     4    |   2016   |     8    |   2016   |           244          |    46    |    105   |     4     |     11    |     16    |     28    |     30    |     15    |     1     |     2     |     1     |     1     |     25    |   95.49   |   26.541  |   146.89  |    1.94   |    5.53   |     1     |     51    |     33    |     9     |     48    |   78488   |
|   102670188  |     5    |   2016   |     4    |   2016   |     8    |   2016   |           413          |    20    |    109   |     1     |     1     |     2     |     2     |     3     |     2     |     0     |     0     |     0     |     0     |     1     |    2.36   |   6.338   |    8.25   |    0.93   |    1.3    |     2     |     1     |     14    |     5     |     3     |   117137  |
|   102671063  |     8    |   2016   |     6    |   2016   |     8    |   2016   |           296          |    46    |    44    |     2     |     4     |     7     |     7     |    111    |     3     |     1     |     0     |     1     |     0     |     7     |   12.96   |   98.748  |   146.24  |    1.35   |    1.48   |     20    |    113    |     70    |     26    |     9     |   430192  |
|   102672475  |     8    |   2016   |     7    |   2016   |     8    |   2016   |           217          |    20    |    23    |     0     |     1     |     2     |     1     |     2     |     1     |     0     |     0     |     0     |     0     |     1     |    0.5    |   4.9093  |    5.37   |    0.99   |    1.09   |     0     |     1     |     16    |     0     |     1     |   116673  |
|   102672477  |    10    |   2016   |     9    |   2016   |     8    |   2016   |           194          |    20    |    36    |     1     |     0     |     2     |     1     |     1     |     1     |     0     |     0     |     0     |     0     |     1     |    0.61   |   5.1425  |    3.65   |    0.59   |    0.71   |     0     |     1     |     16    |     0     |     2     |   98750   |
|   102672513  |    10    |   2016   |     9    |   2016   |     8    |   2016   |           228          |    20    |    36    |     1     |     1     |     3     |     2     |     2     |     1     |     0     |     0     |     0     |     0     |     1     |    0.25   |   5.1425  |    6.48   |    1.21   |    1.26   |     0     |     1     |     16    |     0     |     2     |   116780  |
|   102682943  |     5    |   2016   |     4    |   2016   |     8    |   2016   |           417          |    20    |    113   |     0     |     1     |     1     |     1     |     1     |     1     |     0     |     0     |     0     |     0     |     1     |    0.64   |   6.338   |    5.53   |    0.77   |    0.87   |     2     |     1     |     14    |     5     |     2     |   100307  |
ORDER_NUMBER should not be a feature in the model. It's a unique identifier, just a random ID that I don't want the model to take into account, but I do want it included in the final dataset so I can tie the predicted and actual values back to an order.

Currently, my code looks like this:

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
import pandas as pd
import numpy as np

def get_feature_importances(cols, importances):
    feats = {}
    for feature, importance in zip(cols, importances):
        feats[feature] = importance

    importances = pd.DataFrame.from_dict(feats, orient='index').rename(columns={0: 'Gini-importance'})

    return importances.sort_values(by='Gini-importance', ascending = False)

def compare_values(arr1, arr2):
    thediff = 0
    thediffs = []
    for thing1, thing2 in zip(arr1, arr2):
        thediff = abs(thing1 - thing2)
        thediffs.append(thediff)

    return thediffs

def print_to_file(filepath, arr):
    with open(filepath, 'w') as f:
        for item in arr:
            f.write("%s\n" % item)

# READ IN THE DATA TABLE ABOVE        
data = pd.read_csv('test.csv')

# create the labels, or field we are trying to estimate
label = data['TOTAL_DAYS_TO_COMPLETE']
# remove the header
label = label[1:]

# create the data, or the data that is to be estimated
data = data.drop('TOTAL_DAYS_TO_COMPLETE', axis=1)

# Remove the order number since we don't need it
data = data.drop('ORDER_NUMBER', axis=1)

# remove the header
data = data[1:]

# # split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(data, label, test_size = 0.2)

rf = RandomForestRegressor(
    bootstrap = True,
    max_depth = None,
    max_features = 'sqrt',
    min_samples_leaf = 1,
    min_samples_split = 2,
    n_estimators  = 5000
)
rf.fit(X_train, y_train)
rf_predictions = rf.predict(X_test)
rf_differences = compare_values(y_test, rf_predictions)
rf_Avg = np.average(rf_differences)
print("#################################################")
print("DATA FOR RANDOM FORESTS")
print(rf_Avg)
importances = get_feature_importances(X_test.columns, rf.feature_importances_)
print()
print(importances)
If I run print(y_test) and print(rf_predictions), I get results like this:

**print(y_test)**
7
155
84
64
49
41
200
168
43
111
64
46
96
47
50
27
216
..

**print(rf_predictions)**
34.496
77.366
69.6105
61.6825
80.8495
79.8785
177.5465
129.014
70.0405
97.3975
82.4435
57.9575
108.018
57.5515
..
It works. If I print out y_test and rf_predictions, I get the labels for the test data and the predicted label values.

However, I want to see the order number associated with each y_test value and each rf_predictions value. How can I keep that information and build a dataframe that ties the order number to the actual and predicted values?

I've tried searching but couldn't find a solution. I did try print(y_test, rf_predictions), but that doesn't get me anywhere, since I .drop() the ORDER_NUMBER field.

When you work with pandas DataFrames, the index is preserved across all of the X/y train/test datasets, so you can reassemble everything after applying the model. We just need to save the order numbers before dropping the column: order_numbers = data['ORDER_NUMBER']. The predictions rf_predictions are returned by rf.predict(X_test) in the same order as the input data, i.e. rf_predictions[i] belongs to X_test.iloc[i].

This creates the desired result dataset:

res = y_test.to_frame('Actual Value')
res.insert(0, 'Predicted Value', rf_predictions)
res = order_numbers.to_frame().join(res, how='inner')
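
If you want to convince yourself that the index really does survive the split, here is a minimal sketch with a toy frame (the data is made up purely for illustration):

import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.DataFrame({'x': range(6)}, index=[10, 11, 12, 13, 14, 15])
y = df['x'] * 2
X_train, X_test, y_train, y_test = train_test_split(df, y, test_size=0.5, random_state=1)

# The rows are shuffled, but X_test and y_test keep the same index labels
# in the same order, so each prediction can be traced back to a source row.
print(X_test.index.equals(y_test.index))  # True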

By the way, data = data[1:] does not remove the header, it removes the first data row. When working with a DataFrame the header is not stored as a row, so there is nothing you need to remove.
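
A quick sketch of this behavior (toy data, for illustration only):

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
print(df[1:])
#    a  b
# 1  2  5
# 2  3  6
# The column names are intact; only the first row (index 0) is dropped.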

So the final program would be:

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
import pandas as pd
import numpy as np

def get_feature_importances(cols, importances):
    feats = {}
    for feature, importance in zip(cols, importances):
        feats[feature] = importance

    importances = pd.DataFrame.from_dict(feats, orient='index').rename(columns={0: 'Gini-importance'})

    return importances.sort_values(by='Gini-importance', ascending = False)

def compare_values(arr1, arr2):
    thediff = 0
    thediffs = []
    for thing1, thing2 in zip(arr1, arr2):
        thediff = abs(thing1 - thing2)
        thediffs.append(thediff)

    return thediffs

def print_to_file(filepath, arr):
    with open(filepath, 'w') as f:
        for item in arr:
            f.write("%s\n" % item)

# READ IN THE DATA TABLE ABOVE        
data = pd.read_csv('test.csv')

# create the labels, or field we are trying to estimate
label = data['TOTAL_DAYS_TO_COMPLETE']

# create the data, or the data that is to be estimated
data = data.drop('TOTAL_DAYS_TO_COMPLETE', axis=1)

# Remove the order number since we don't need it
order_numbers = data['ORDER_NUMBER']
data = data.drop('ORDER_NUMBER', axis=1)

# # split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(data, label, test_size = 0.2)

rf = RandomForestRegressor(
    bootstrap = True,
    max_depth = None,
    max_features = 'sqrt',
    min_samples_leaf = 1,
    min_samples_split = 2,
    n_estimators  = 5000
)
rf.fit(X_train, y_train)
rf_predictions = rf.predict(X_test)
rf_differences = compare_values(y_test, rf_predictions)
rf_Avg = np.average(rf_differences)
print("#################################################")
print("DATA FOR RANDOM FORESTS")
print(rf_Avg)
importances = get_feature_importances(X_test.columns, rf.feature_importances_)
print()
print(importances)

res = y_test.to_frame('Actual Value')
res.insert(0, 'Predicted Value', rf_predictions)
res = order_numbers.to_frame().join(res, how='inner')
print(res)
With the sample data above (using random_state=1 in train_test_split), res ties each test-set ORDER_NUMBER to its predicted and actual value.
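
As an aside, since the index is preserved, an equivalent way to assemble the result (a sketch, not part of the original answer) is to look up the saved order numbers by the test index directly:

res = pd.DataFrame({
    'ORDER_NUMBER': order_numbers.loc[y_test.index],
    'Predicted Value': rf_predictions,
    'Actual Value': y_test
})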


Are the indices the same? That is, does order_number[0] correspond to predicted_value[0] and actual_value[0]? If so, join them, perhaps with something like pd.DataFrame([order_number, predicted_value, actual_value], columns=['order number', 'predicted value', 'actual value']).

I don't know whether they are. My understanding is that sklearn.model_selection.train_test_split randomizes the data, so I don't think so.

Thanks @Stef. I'm confused. Doesn't the random_state parameter of train_test_split randomize the dataset? How can appending the order numbers back on be accurate, then? The ORDER_NUMBER column isn't included in the shuffle.

random_state is just the seed for the random number generator; I only included it to make the example reproducible. Without this parameter you would get different rows from train_test_split on each run, which is fine and irrelevant to your question. How this works is explained in the first paragraph of the answer: it's true that ORDER_NUMBER is not included in the shuffle, but the DataFrame's index is. Just run the code and print X_train, X_test, y_train and y_test to see for yourself.
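
For reference, that check can be as simple as (a sketch using the variables from the answer's code):

print(X_test.index)
print(y_test.index)
# Both show the same index labels in the same order, so each prediction can
# be traced back to its original row and, via order_numbers, to its ORDER_NUMBER.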