Different accuracy results between train_test_split() and slicing the DataFrame by index

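I split the dataset with train_test_split(), normalize the data, and run a regression. On the same dataset, I also split the data by index slicing (without train_test_split()), normalize it, and run the same regression. The two approaches give different accuracy results. Can you explain why?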

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.23, random_state=42)

#Normalise training data
from sklearn import preprocessing

normalizer = preprocessing.Normalizer()
normalized_train_X = normalizer.fit_transform(X_train)
normalized_train_X

#Normalize testing data
normalized_test_X = normalizer.transform(X_test)
normalized_test_X

from sklearn.linear_model import LogisticRegression
logreg = LogisticRegression()
logreg.fit(normalized_train_X, y_train)

accuracy = logreg.score(normalized_test_X, y_test)
Accuracy = 60.71428571428571%
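One difference worth checking is the scaling step itself. `preprocessing.Normalizer` rescales each row (sample) to unit norm, which is not the same operation as the column-wise (x - mean) / std formula used in the second approach; `StandardScaler` is the scikit-learn class that corresponds to that formula. The sketch below uses a small made-up array (the values are purely illustrative, not the insurance data) to show the two behaving differently.

```python
import numpy as np
from sklearn.preprocessing import Normalizer, StandardScaler

# Hypothetical toy data: 4 samples, 2 features on very different scales.
X = np.array([[1.0, 100.0],
              [2.0, 200.0],
              [3.0, 300.0],
              [4.0, 400.0]])

# Normalizer works row-wise: each sample is divided by its own L2 norm.
row_normalized = Normalizer().fit_transform(X)

# StandardScaler works column-wise: (x - column mean) / column std.
col_standardized = StandardScaler().fit_transform(X)

# The manual formula from the question, applied to the same array.
manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(np.linalg.norm(row_normalized, axis=1))   # every row has norm 1
print(np.allclose(col_standardized, manual))    # StandardScaler matches the manual formula
print(np.allclose(row_normalized, manual))      # Normalizer does not
```

If the intent is to standardize features, fitting `StandardScaler` on the training split and calling `transform` on the test split would mirror the manual means/stds code.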

Without train_test_split():

import numpy as np

# Split by row position: first 1000 rows for training, next 300 for testing
dfTrain = df[:1000]
dfTest = df[1000:1300]
dfCheck = df[1300:]
trainLabel = np.asarray(dfTrain['insuranceclaim'])
trainData = np.asarray(dfTrain.drop('insuranceclaim', axis=1))
testLabel = np.asarray(dfTest['insuranceclaim'])
testData = np.asarray(dfTest.drop('insuranceclaim', axis=1))

means = np.mean(trainData, axis=0)
stds = np.std(trainData, axis=0)

trainData = (trainData - means)/stds
testData = (testData - means)/stds

insuranceCheck = LogisticRegression()
insuranceCheck.fit(trainData, trainLabel)

accuracy = insuranceCheck.score(testData, testLabel)
Accuracy = 86.0%

I also tried train_test_split() and then normalized the variables with the manual formula:

trainData = (trainData - means)/stds
testData = (testData - means)/stds
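Again I got accuracy = 60.71428571428571%, so the difference seems to be related to train_test_split(). When I normalize after using train_test_split(), I can see that the arrays contain different values than when the dataset is sliced by index.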
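A likely reason the arrays differ is that train_test_split() shuffles the rows by default, while index slicing keeps them in their original order, so the two approaches train and test on different rows. The split sizes in the question also differ (test_size=0.23 versus rows 1000 to 1300), so the two accuracy figures are measured on different test sets. The sketch below uses made-up arrays (not the insurance data) to show that passing shuffle=False makes train_test_split() behave like a positional slice.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 10 samples whose label equals their row position.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Default behaviour: rows are shuffled before splitting.
X_tr_shuf, X_te_shuf, y_tr_shuf, y_te_shuf = train_test_split(
    X, y, test_size=3, random_state=42)

# shuffle=False: the first 7 rows become train, the last 3 become test.
X_tr_ord, X_te_ord, y_tr_ord, y_te_ord = train_test_split(
    X, y, test_size=3, shuffle=False)

print(y_te_shuf)                          # three row positions picked after shuffling
print(y_te_ord)                           # the last three row positions, in order
print(np.array_equal(X_tr_ord, X[:7]))    # same as slicing by index
print(np.array_equal(X_te_ord, X[7:]))
```

To compare the two pipelines fairly, one option is to use shuffle=False with the same split sizes as the slices, or conversely to shuffle the DataFrame before slicing; if the rows are ordered in some meaningful way, the unshuffled score may not reflect performance on a random sample.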