Cross-validation in Python logistic regression

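I want to perform cross-validation on a logistic regression, using the arr returned by the load_data function as input. I have an outline of the code here. The function runs but produces no output.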

import pandas as pd
import numpy as np
# sklearn.cross_validation was removed from scikit-learn; these now live in sklearn.model_selection
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

def load_data(filename):
    df = pd.read_csv(filename)
    arr = df.values
    print(arr[:3])
    return arr
# load_data("data.csv")

def fit_logistic_cv(arr, cv=5):
    # All columns except the last are features; the last column is the label
    X = arr[:, :-1]
    y = arr[:, -1]
    print(y)
    # KFold now takes the number of folds (n_splits), not the number of samples.
    # Use KFold(n_splits=cv, shuffle=True, random_state=4) to shuffle before splitting.
    kf_total = KFold(n_splits=cv)
    lr = LogisticRegression()
    # Not needed for the scores below: cross_val_score fits a fresh clone of lr on each fold
    lr.fit(X, y)
    precisions = cross_val_score(lr, X, y, cv=kf_total, scoring='precision')
    print('Precision', np.mean(precisions), precisions)
    recalls = cross_val_score(lr, X, y, cv=kf_total, scoring='recall')
    print('Recalls', np.mean(recalls), recalls)
    f1s = cross_val_score(lr, X, y, cv=kf_total, scoring='f1')
    print('F1', np.mean(f1s), f1s)


def test_logistic_cv():  # testing above function 
    data_filename = "data.csv"
    fit_logistic_cv(load_data(data_filename))
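One likely reason for the missing output is that nothing in the outline actually calls test_logistic_cv(), and the load_data("data.csv") call is commented out. As a minimal sketch (not the asker's method), the three separate cross_val_score calls can also be collapsed into one cross_validate call so all metrics are computed on the same folds. It assumes the last column of arr is a binary 0/1 label; the make_classification data is a synthetic stand-in for data.csv, and max_iter=1000 is an arbitrary choice to avoid convergence warnings.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_validate


def fit_logistic_cv(arr, cv=5):
    """Cross-validate a logistic regression on an array whose last column is the label."""
    X = arr[:, :-1]  # feature columns
    y = arr[:, -1]   # assumed binary 0/1 label column
    kf = KFold(n_splits=cv, shuffle=True, random_state=4)
    # One call evaluates all three metrics on the same train/test splits
    scores = cross_validate(
        LogisticRegression(max_iter=1000), X, y, cv=kf,
        scoring=["precision", "recall", "f1"],
    )
    results = {}
    for metric in ("precision", "recall", "f1"):
        fold_scores = scores["test_" + metric]  # one score per fold
        results[metric] = (np.mean(fold_scores), fold_scores)
        print(metric, np.mean(fold_scores), fold_scores)
    return results


if __name__ == "__main__":
    # Synthetic stand-in for data.csv: 4 feature columns plus a 0/1 label column
    X_demo, y_demo = make_classification(n_samples=200, n_features=4, random_state=0)
    arr_demo = np.column_stack([X_demo, y_demo])
    fit_logistic_cv(arr_demo)
```

With real data, the same function would be called as fit_logistic_cv(load_data("data.csv")).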

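It's not clear why you are extracting a numpy array from the pandas df. pandas DataFrames are compatible with sklearn methods; you just index the columns and pass them as arguments, e.g.
classifier.fit(df['X_train_vals'], df['y_train_vals'])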
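This is just an indicative example; I don't know what your columns actually are, but the point is that you only need to index them and pass them as arguments. There is plenty of example code on the interweb about this.

@EdChum, this is the approach I am required to use. I am having trouble getting the X and y columns out of the numpy array.

You need to explain better: please edit the error into your question and add any additional code.

@EdChum, I used k-fold for the cross-validation, passing len(arr). I'd like to know whether that is correct.

It will still work; what it returns are the indices used to slice the rows of the df. I think you need to stick with DataFrames more, because at this stage you have many basic problems and errors that are not very useful to answer here.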
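The comments above make two points: DataFrame columns can be passed straight to scikit-learn, and k-fold produces row indices used to slice the data. A minimal sketch of both ideas, assuming a DataFrame with hypothetical feature columns f0 to f3 and a hypothetical 0/1 column named label (placeholders, not the asker's real columns). Note that in current scikit-learn, KFold.split yields positional index arrays, so they are used with iloc.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score


def cv_from_dataframe(df, feature_cols, label_col, cv=5):
    """Index DataFrame columns and pass them directly; no .values needed."""
    X = df[feature_cols]
    y = df[label_col]
    kf = KFold(n_splits=cv, shuffle=True, random_state=4)
    return cross_val_score(LogisticRegression(), X, y, cv=kf, scoring="f1")


def manual_folds(df, feature_cols, label_col, cv=5):
    """Loop over KFold.split yourself and slice rows with the returned indices."""
    kf = KFold(n_splits=cv, shuffle=True, random_state=4)
    results = []
    for train_idx, test_idx in kf.split(df):
        # train_idx / test_idx are positional row indices, so use iloc
        train, test = df.iloc[train_idx], df.iloc[test_idx]
        model = LogisticRegression().fit(train[feature_cols], train[label_col])
        accuracy = model.score(test[feature_cols], test[label_col])
        results.append((len(train), len(test), accuracy))
    return results


if __name__ == "__main__":
    # Synthetic stand-in data with hypothetical column names
    X_demo, y_demo = make_classification(n_samples=100, n_features=4, random_state=0)
    cols = ["f0", "f1", "f2", "f3"]
    df_demo = pd.DataFrame(X_demo, columns=cols)
    df_demo["label"] = y_demo
    print(cv_from_dataframe(df_demo, cols, "label"))
    print(manual_folds(df_demo, cols, "label"))
```

The first function is usually enough; the manual loop is only useful when you need to inspect or transform each fold yourself.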