How do I select features for logistic regression from scratch in Python?
Tags: python, machine-learning, logistic-regression, feature-selection

I have been trying to code logistic regression from scratch, which I have done, but I am using all of the features in my breast cancer dataset. I would like to select only some of them (specifically, the ones scikit-learn picked when I ran its feature selection on the same data for comparison). However, I am not sure where in my code I should do this. What I currently have is:
X_train = ['texture_mean', 'smoothness_mean', 'compactness_mean', 'symmetry_mean', 'radius_se', 'symmetry_se',
           'fractal_dimension_se', 'radius_worst', 'texture_worst', 'area_worst', 'smoothness_worst', 'compactness_worst']
X_test = ['texture_mean', 'smoothness_mean', 'compactness_mean', 'symmetry_mean', 'radius_se', 'symmetry_se',
          'fractal_dimension_se', 'radius_worst', 'texture_worst', 'area_worst', 'smoothness_worst', 'compactness_worst']
import numpy as np

def Sigmoid(z):
    return 1/(1 + np.exp(-z))

def Hypothesis(theta, X):
    # Predicted probability for every row of X
    return Sigmoid(X @ theta)

def Cost_Function(X,Y,theta,m):
    # Mean cross-entropy loss over the m training examples
    hi = Hypothesis(theta, X)
    _y = Y.reshape(-1, 1)
    J = 1/float(m) * np.sum(-_y * np.log(hi) - (1-_y) * np.log(1-hi))
    return J

def Cost_Function_Derivative(X,Y,theta,m,alpha):
    # Gradient of the cost, already scaled by the learning rate alpha
    hi = Hypothesis(theta,X)
    _y = Y.reshape(-1, 1)
    J = alpha/float(m) * X.T @ (hi - _y)
    return J

def Gradient_Descent(X,Y,theta,m,alpha):
    new_theta = theta - Cost_Function_Derivative(X,Y,theta,m,alpha)
    return new_theta

def Accuracy(theta):
    # Uses the global X_test and Y_test
    correct = 0
    length = len(X_test)
    prediction = (Hypothesis(theta, X_test) > 0.5)
    _y = Y_test.reshape(-1, 1)
    correct = prediction == _y
    my_accuracy = (np.sum(correct) / length)*100
    print('LR Accuracy: ', my_accuracy, "%")

def Logistic_Regression(X,Y,alpha,theta,num_iters):
    m = len(Y)
    for x in range(num_iters):
        new_theta = Gradient_Descent(X,Y,theta,m,alpha)
        theta = new_theta
        if x % 100 == 0:
            # Progress output, left disabled as in the original:
            # print('theta: ', theta)
            # print('cost: ', Cost_Function(X,Y,theta,m))
            pass
    Accuracy(theta)

ep = .012
initial_theta = np.random.rand(X_train.shape[1],1) * 2 * ep - ep
alpha = 0.5
iterations = 10000
Logistic_Regression(X_train,Y_train,alpha,initial_theta,iterations)
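I assumed this would work if I manually changed which features X_train and X_test contain, but I get an error at the initial_theta line: AttributeError: 'list' object has no attribute 'shape'. Any help in the right direction would be appreciated.

The problem is that X_train is a list, and shape is only available on array-like objects such as NumPy arrays and pandas DataFrames, not on plain lists. You can:
- keep the list, but use len(X_train) instead, or
- change X_train to a pandas DataFrame: pandas.DataFrame(X_train).shape[0]

Thank you. Changing X_train to a pandas DataFrame gives a new error at the final Logistic_Regression line: TypeError: Cannot cast array data from dtype('float64') to dtype('

Not 100% sure about this, but according to this, it seems you need to change the type of your input. Is type(Y_train) float?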
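One likely reason for both errors is that X_train only ever holds the column names, not the data in those columns, so wrapping the list in a DataFrame produces a table of strings rather than numbers. A minimal sketch of selecting features by name is below. It assumes the dataset is already in a pandas DataFrame; the names df, selected, and the diagnosis label column are hypothetical, and the random values are a stand-in for the real data, which the question does not show.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the breast cancer data; the real DataFrame would
# come from however the dataset is loaded (not shown in the question).
rng = np.random.default_rng(0)
all_columns = ['radius_mean', 'texture_mean', 'smoothness_mean', 'area_worst']
df = pd.DataFrame(rng.random((6, len(all_columns))), columns=all_columns)
df['diagnosis'] = [1, 0, 1, 0, 1, 0]  # assumed 0/1 label column

# The feature names stay in an ordinary list ...
selected = ['texture_mean', 'smoothness_mean', 'area_worst']

# ... and the list is used to index the DataFrame, which yields the numbers.
X = df[selected].to_numpy(dtype=float)
Y = df['diagnosis'].to_numpy(dtype=float)

# Simple split for illustration only: first four rows train, last two test.
X_train, X_test = X[:4], X[4:]
Y_train, Y_test = Y[:4], Y[4:]

# X_train is now a float64 array, so .shape works and theta gets one row
# per selected feature.
ep = .012
initial_theta = np.random.rand(X_train.shape[1], 1) * 2 * ep - ep

print(X_train.shape, X_train.dtype, initial_theta.shape)
```

With arrays built this way, the rest of the code in the question can stay as it is, since every function only needs X and Y to be numeric NumPy arrays. Changing the set of features then means editing the selected list and nothing else.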