Python: How to calculate which independent variable has the greatest impact on the dependent variable?

python pandas dataframe

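I have a DataFrame with 5 independent variables and 1 dependent variable. All of my variables are continuous, including the dependent variable. Is there a way to calculate in Python which independent variable has the greatest impact on the dependent variable? Is there an algorithm I can run for this?

I tried the information gain method, but it is a classification method, so I had to convert my dependent variable with LabelEncoder. After splitting the dataset into a training set and a test set, I used the following code: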

import numpy as np
import matplotlib.pyplot as plt
from sklearn import preprocessing
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# Encode the dependent variable (each distinct continuous value becomes its own class)
lab_enc = preprocessing.LabelEncoder()
training_scores_encoded = lab_enc.fit_transform(y_train)

# By default, SelectFromModel keeps the features whose importance is greater than
# the mean importance of all features; the threshold can be changed if needed.
# First, specify the random forest instance, indicating the number of trees.
# Then use the SelectFromModel object from sklearn to select the features automatically.
sel = SelectFromModel(RandomForestClassifier(n_estimators=100))
sel.fit(X_train, training_scores_encoded)

# List and count the selected features.
selected_feat = X_train.columns[sel.get_support()]
print(len(selected_feat))

# View the importances, sorted from highest to lowest
importances = sel.estimator_.feature_importances_
indices = np.argsort(importances)[::-1]
# X_train is the training data used to fit the model
plt.figure()
plt.title("Feature importances")
plt.bar(range(X_train.shape[1]), importances[indices],
        color="r", align="center")
# Label each bar with its column name rather than its positional index
plt.xticks(range(X_train.shape[1]), X_train.columns[indices])
plt.xlim([-1, X_train.shape[1]])
plt.show()

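Although I got a result, I am not sure about it, because I had to encode the (continuous) dependent variable. Is this the right approach? If not, what can I do?

Thanks in advance for your help.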

You can use the SelectKBest class from the scikit-learn module. Check the original documentation.
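As a minimal sketch of how SelectKBest could be applied to a continuous target: the example below assumes `f_regression` as the scoring function (the answer does not say which one to use), and the data, the column names `x1` to `x5`, and the choice `k=2` are all made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_regression

# Hypothetical stand-in data: 5 continuous features and 1 continuous target.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 5)),
                 columns=["x1", "x2", "x3", "x4", "x5"])
y = 3.0 * X["x1"] + 1.5 * X["x3"] + rng.normal(scale=0.1, size=200)

# f_regression scores each feature with a univariate F-test against a
# continuous target, so the target does not need to be label-encoded.
selector = SelectKBest(score_func=f_regression, k=2)
selector.fit(X, y)

# Rank all features by score, then list the k features that were kept.
scores = pd.Series(selector.scores_, index=X.columns).sort_values(ascending=False)
selected = list(X.columns[selector.get_support()])
print(scores)
print(selected)
```

Because the scores are univariate, each feature is judged on its own; interactions between features are not taken into account.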

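This technique is called feature selection.

You can also select the features that have the highest correlation with the response: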
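The SelectFromModel approach from the question is also a form of feature selection. A rough sketch of how it might look without encoding the target, assuming a RandomForestRegressor is swapped in for the classifier (this swap is my assumption, not something stated in the thread; the data and column names are invented):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel

# Hypothetical training data: 5 continuous features and a continuous target.
rng = np.random.default_rng(42)
X_train = pd.DataFrame(rng.normal(size=(300, 5)),
                       columns=[f"x{i}" for i in range(1, 6)])
y_train = 4.0 * X_train["x2"] + 1.0 * X_train["x5"] + rng.normal(scale=0.2, size=300)

# A regressor accepts the continuous target directly, so no LabelEncoder is needed.
sel = SelectFromModel(RandomForestRegressor(n_estimators=100, random_state=0))
sel.fit(X_train, y_train)

# Impurity-based importances of the fitted forest, highest first.
importances = pd.Series(sel.estimator_.feature_importances_,
                        index=X_train.columns).sort_values(ascending=False)
selected = list(X_train.columns[sel.get_support()])
print(importances)
print(selected)
```

The importances sum to 1, and by default SelectFromModel keeps the features whose importance is above the mean.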

print([(feature, abs(df[response].corr(df[feature]))) for feature in features])

This uses the values from Tamarie's comment:

for feature in feature_cols:
    print(f'feature: {feature} correlation: {abs(target_v.corr(df[feature]))}')
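For a sorted ranking rather than a printed list, something along these lines might work. It is only a sketch: the DataFrame, the column names `a`, `b`, `c`, and `target` are hypothetical, and `features` is assumed to mean every column except the response.

```python
import numpy as np
import pandas as pd

# Hypothetical data: three continuous features and a continuous response.
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.normal(size=(300, 3)), columns=["a", "b", "c"])
df["target"] = 2.0 * df["a"] - 1.0 * df["b"] + rng.normal(scale=0.5, size=300)

response = "target"
# Features: every column except the response column.
features = [col for col in df.columns if col != response]

# Absolute Pearson correlation of each feature with the response, highest first.
ranking = df[features].corrwith(df[response]).abs().sort_values(ascending=False)
print(ranking)
```

Pearson correlation only measures linear association; passing `method="spearman"` to `corrwith` would rank by monotonic association instead.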

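Can you show me this? What are the features and the response in this case?

A feature is a measurable property or characteristic of an observed phenomenon. In other words: all columns except the target column. The response is the target column itself.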