Machine learning: why is my training accuracy 1.0 and my test accuracy 0.99 with an LR model?

I am working on a landslide classification project. I removed the null values and the unneeded columns, but I am getting a training and test accuracy of 1.

df = pd.read_csv("full_dataset_v1.csv")
df.head()

# filter by severity. na is for non-landslide data
df = df[df['severity'].isin(["medium", "small", "large", "very_large", "na"])]
df = shuffle(df)
df.reset_index(inplace=True, drop=True)
print(len(df))
X = df.copy()
df_col_length = len(df.columns) 
X.drop(X.columns[[0]], axis = 1, inplace = True)
def generate_labels(binary = False):
    y = []
    idx_to_severity = [ "large","medium","na", "small","very_large"]
    for severity in X.severity:
        y.append(idx_to_severity.index(severity))
    X.drop(X.columns[[-1]], axis = 1, inplace = True)
    print(y.count(1))
    return y
y = generate_labels(False)
X.drop(X.columns[[0,1]],axis = 1, inplace = True)
df = X
def cat(string):
    df[string] = df[string].astype('category')

cat('country')
cat('type')
cat('trigger')
cat('location')
cat('severity')

from sklearn import preprocessing
le = preprocessing.LabelEncoder()
def label_encode(string):
    le.fit(df[string])
    df[string] = le.transform(df[string])

label_encode('country')
label_encode('type')
label_encode('trigger')
label_encode('location')
label_encode('severity')

df.dropna(axis='columns')
df.fillna(X.mean(), inplace=True)
df.head()
X = df

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.33,random_state=0)

scaler = StandardScaler()
scaler.fit(X_train)

X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

from sklearn.linear_model import LogisticRegression
model = LogisticRegression()
model.fit(X_train,y_train)
print("Train accuracy",model.score(X_train,y_train))
print("Test acuracy",model.score(X_test,y_test))

The dataset is 4396 rows x 193 columns. Is my code wrong, or is there anything I can do to correct my accuracy?

The accuracy you are getting looks fine, because your model was not trained on the test data. That is exactly why we perform train_test_split: to see how the model handles unseen data, i.e. the held-out test set. I hope that clarifies the concept.
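
If you want extra confidence that the score is not just a lucky split, here is a minimal sketch of a cross-validated check. It assumes the same X and y built in the question; cross_val_score, the cv=5 folds and max_iter=1000 are my own illustrative choices, not part of the original code:

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Put the scaler inside a pipeline so each fold is scaled using only its own training part
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Illustrative 5-fold cross-validation on the full X, y from the question
scores = cross_val_score(clf, X, y, cv=5)
print("CV accuracy per fold:", scores)
print("Mean CV accuracy:", scores.mean())

If the fold scores all stay close to 0.99, the model really does separate the classes that well; if they swing a lot between folds, the single split in your code was simply favourable.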