Python: Implausible variable importance from a GBM survival model: constant differences in importance

Tags: python, python-3.x, survival-analysis, gbm, scikit-survival

I have a question about GBM survival analysis. I am trying to quantify variable importance for 453 variables in a dataset of 3614 individuals. The resulting variable importance plot looks suspiciously ordered. I have fitted GBMs before, but I have never seen this kind of progressive pattern in the importances. The gaps between the importance bars usually vary; here the importances appear to differ by a constant amount. My DataFrame is called df. Because the data are sensitive, I cannot upload a sample. My question is therefore about the plausibility of obtaining these variable importances.

import numpy as np
from sksurv.ensemble import GradientBoostingSurvivalAnalysis
from sklearn import metrics, model_selection
from sklearn.model_selection import GridSearchCV

import matplotlib.pylab as plt
%matplotlib inline
from matplotlib.pylab import rcParams
rcParams['figure.figsize'] = 12, 4

from sklearn.datasets import make_regression
predictors = [x for x in df.columns if x not in ['death', 'surv_death']]
target = ['death','surv_death']
df_X=df[predictors]
df_y=df[target]
X=df_X.values
arr_y=df_y.values

n = len(df)
y = np.zeros((n,), dtype=[('death', 'bool'), ('surv_death', 'f8')])
y['death'] = arr_y[:, 0]        # event indicator
y['surv_death'] = arr_y[:, 1]   # survival time
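# A minimal alternative sketch, assuming a scikit-survival version that
# provides sksurv.util.Surv: the structured array can also be built with
# the library helper instead of by hand.
# from sksurv.util import Surv
# y = Surv.from_dataframe('death', 'surv_death', df)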

gbm0 = GradientBoostingSurvivalAnalysis(
    criterion='friedman_mse', dropout_rate=0.0, learning_rate=0.01,
    loss='coxph', max_depth=100, max_features=None, max_leaf_nodes=None,
    min_impurity_decrease=0.0, min_impurity_split=None,
    min_samples_leaf=10, min_samples_split=20,
    min_weight_fraction_leaf=0.0, n_estimators=1000, random_state=10,
    subsample=1.0, verbose=0)

gbm0.fit(X, y)

feature_importance = gbm0.feature_importances_

feature_importance = 100.0 * (feature_importance  /feature_importance.max())
sorted_idx = np.argsort(feature_importance)
preds=np.array(predictors)[sorted_idx]

pos = np.arange(sorted_idx.shape[0]) + .5
plt.figure(figsize=(10, 100))
plt.subplot(1, 1, 1)
# bar lengths should be the importance values, not the bar positions
plt.barh(pos, feature_importance[sorted_idx], align='center')
plt.yticks(pos, preds)

plt.xlabel('Relative Importance')
plt.title('Variable Importance')
plt.savefig("df.png")
plt.show()
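One way to sanity-check the impurity-based importances is permutation importance, which scikit-survival estimators support through scikit-learn's inspection module (the estimator's default score is the concordance index). The sketch below is a minimal illustration, assuming the X, y and predictors defined above; the hold-out split, the reduced hyperparameter set of gbm_check and n_repeats are illustrative choices, not part of the original code.

from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# hold out part of the data so the check is not done on the training rows
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=10)

gbm_check = GradientBoostingSurvivalAnalysis(
    loss='coxph', learning_rate=0.01, n_estimators=1000, random_state=10)
gbm_check.fit(X_train, y_train)

# shuffling one feature at a time and re-scoring (concordance index) shows how
# much the model relies on it; unimportant features should barely move the score
perm = permutation_importance(gbm_check, X_test, y_test,
                              n_repeats=5, random_state=10)
for idx in perm.importances_mean.argsort()[::-1][:20]:
    print(predictors[idx], perm.importances_mean[idx])

If the permutation ranking disagrees strongly with the impurity-based ranking, or if the corrected bar plot above no longer shows evenly spaced bars, the original pattern was an artifact of the plotting call rather than a property of the model.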