Python runs very slowly on just one line of code

python python-3.x scikit-learn

I am running the code below:

import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder
import random
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import GradientBoostingClassifier

train=pd.read_csv('C:\\path_here\\train.csv')
test=pd.read_csv('C:\\path_here\\test.csv')
train['Type']='Train' #Create a flag for Train and Test Data set
test['Type']='Test'
fullData = pd.concat([train,test],axis=0) #Combined both Train and Test Data set


fullData.columns # This will show all the column names
fullData.head(10) # Show first 10 records of dataframe
fullData.describe() #You can look at summary of numerical fields by using describe() function


ID_col = ['REF_NO']
target_col = ['Status']
cat_cols = ['children','age_band','status','occupation','occupation_partner','home_status','family_income','self_employed', 'self_employed_partner','year_last_moved','TVarea','post_code','post_area','gender','region']

other_col=['Type'] #Test and Train Data set identifier
num_cols= list(set(fullData.columns)-set(cat_cols)-set(ID_col)-set(target_col)-set(other_col)) #Treat the remaining columns as numerical


fullData.isnull().any()#Will return the feature with True or False,True means have missing value else False

num_cat_cols = num_cols+cat_cols # Combined numerical and Categorical variables

#Create a new variable for each variable having missing value with VariableName_NA 
# and flag missing value with 1 and other with 0

for var in num_cat_cols:
    if fullData[var].isnull().any():
        fullData[var+'_NA']=fullData[var].isnull()*1 


#Impute numerical missing values with mean
#(no inplace=True here: with inplace=True, fillna returns None and the assignment would overwrite the columns)
fullData[num_cols] = fullData[num_cols].fillna(fullData[num_cols].mean())

#Impute categorical missing values with 0
fullData[cat_cols] = fullData[cat_cols].fillna(value = 0)


#create label encoders for categorical features
for var in cat_cols:
    number = LabelEncoder()
    fullData[var] = number.fit_transform(fullData[var].astype('str'))

#Target variable is also a categorical so convert it
fullData["Account.Status"] = number.fit_transform(fullData["Account.Status"].astype('str'))

train=fullData[fullData['Type']=='Train']
test=fullData[fullData['Type']=='Test']

train['is_train'] = np.random.uniform(0, 1, len(train)) <= .75
Train, Validate = train[train['is_train']==True], train[train['is_train']==False]


features=list(set(list(fullData.columns))-set(ID_col)-set(target_col)-set(other_col))

x_train = Train[list(features)].values
y_train = Train["Account.Status"].values
x_validate = Validate[list(features)].values
y_validate = Validate["Account.Status"].values
x_test=test[list(features)].values

random.seed(100)
rf = RandomForestClassifier(n_estimators=1000)
rf.fit(x_train, y_train)

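I can't get past that point (the rf.fit call). How can I see what is happening in the background? Is there any way to watch the work being done? Thanks.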

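One way to check how far the code gets is to add print statements. For example, you could add a print statement just before the label encoder (the snippet is reproduced at the end of this page).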

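Then add another print statement after that block of code. The console will show exactly where the code gets stuck, and you can debug that specific line.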
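The print-statement approach can be sketched with a small helper that also reports how long each stage took. This is only an illustration: the name `step` is hypothetical and not part of the original code, and the summation is a stand-in for the real work.

```python
import time
from contextlib import contextmanager


@contextmanager
def step(name):
    """Print when a block starts and, when it ends, how long it took."""
    # flush=True so the message shows up immediately, even if output is buffered
    print(f"[start] {name}", flush=True)
    started = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - started
        print(f"[done]  {name} ({elapsed:.2f}s)", flush=True)


# Wrap each stage of the script; the last "[start]" line without a matching
# "[done]" line is the stage the script is stuck in.
with step("label encoding"):
    total = sum(range(1_000_000))  # stand-in for the real work
```

Wrapping the rf.fit call the same way would show whether the script is really waiting on training or on an earlier step.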

How do you know it is this line? Can you provide sample data?

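n_estimators=1000 seems a bit excessive. Without knowing anything about the data, training could well take a long time. Try a smaller n_estimators and see whether that is really the cause.

@everyone... thanks!!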
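A minimal sketch of that suggestion, assuming synthetic data from make_classification in place of the asker's x_train and y_train (which are not available here). It times a small forest first, then uses the verbose parameter of RandomForestClassifier so that scikit-learn reports progress while fitting. The sizes are arbitrary and the extrapolation is only a rough guide.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for x_train / y_train (assumption: not the real data)
X, y = make_classification(n_samples=500, n_features=20, random_state=100)

# Time a small forest first; fit time grows roughly in proportion to n_estimators
small = RandomForestClassifier(n_estimators=20, random_state=100)
started = time.perf_counter()
small.fit(X, y)
elapsed = time.perf_counter() - started
print(f"20 trees took {elapsed:.2f}s; 1000 trees would take very roughly {elapsed * 50:.1f}s")

# verbose=1 prints progress messages while the trees are built;
# n_jobs=-1 builds trees on all available cores.
# random_state (not random.seed) is what makes scikit-learn's results repeatable.
rf = RandomForestClassifier(n_estimators=100, random_state=100, n_jobs=-1, verbose=1)
rf.fit(X, y)
```

If the small forest already takes a long time on the real data, the number of rows and features is the more likely cause, not n_estimators alone.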

fullData[cat_cols] = fullData[cat_cols].fillna(value = 0)

print("Code got before label encoder")