Python: Are the indices returned by the KFold split method for a DataFrame iloc or loc?


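When we use KFold.split(X), where X is a DataFrame, are the indices generated for splitting the data into training and test sets iloc indices (purely integer-location based indexing, for selection by position) or loc indices (selection of rows and columns by label)?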

You need to select the rows by position, with DataFrame.iloc:

Sample:

import numpy as np
import pandas as pd

np.random.seed(100)
df = pd.DataFrame(np.random.random((10, 5)), columns=list('ABCDE'))
# change the default index values so labels no longer match positions
df.index = df.index * 10
print(df)
           A         B         C         D         E
0   0.543405  0.278369  0.424518  0.844776  0.004719
10  0.121569  0.670749  0.825853  0.136707  0.575093
20  0.891322  0.209202  0.185328  0.108377  0.219697
30  0.978624  0.811683  0.171941  0.816225  0.274074
40  0.431704  0.940030  0.817649  0.336112  0.175410
50  0.372832  0.005689  0.252426  0.795663  0.015255
60  0.598843  0.603805  0.105148  0.381943  0.036476
70  0.890412  0.980921  0.059942  0.890546  0.576901
80  0.742480  0.630184  0.581842  0.020439  0.210027
90  0.544685  0.769115  0.250695  0.285896  0.852395


Thank you very much. So you mean the returned indices are iloc positions and not loc labels?
@TempO'rary - Yes, exactly.
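To make the point about positions concrete, here is a minimal sketch that is not part of the original answer. It assumes a small hypothetical frame with string labels, so that positions and labels cannot be confused, and it uses KFold without shuffling so the folds are consecutive.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold

# Hypothetical frame: string labels make it obvious that split() ignores them
df_labels = pd.DataFrame({"A": np.arange(6)}, index=list("uvwxyz"))

# Without shuffling, the test folds are consecutive blocks of positions
folds = list(KFold(n_splits=3).split(df_labels))

for train_pos, test_pos in folds:
    # test_pos holds integer positions; the index maps them back to labels
    print(test_pos, df_labels.index[test_pos].tolist())
# [0 1] ['u', 'v']
# [2 3] ['w', 'x']
# [4 5] ['y', 'z']
```

The arrays contain integers in range(len(df_labels)) even though no such labels exist in the index, which is why iloc is the matching accessor.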
from sklearn.model_selection import KFold

# shuffle and random_state added so the split is reproducible
kf = KFold(n_splits=5, shuffle=True, random_state=2)
# take the first (train, test) pair of position arrays
result = next(kf.split(df), None)
print(result)
(array([0, 2, 3, 5, 6, 7, 8, 9]), array([1, 4]))

# result holds positions, not labels, so select with iloc
train = df.iloc[result[0]]
test = df.iloc[result[1]]

print (train)
           A         B         C         D         E
0   0.543405  0.278369  0.424518  0.844776  0.004719
20  0.891322  0.209202  0.185328  0.108377  0.219697
30  0.978624  0.811683  0.171941  0.816225  0.274074
50  0.372832  0.005689  0.252426  0.795663  0.015255
60  0.598843  0.603805  0.105148  0.381943  0.036476
70  0.890412  0.980921  0.059942  0.890546  0.576901
80  0.742480  0.630184  0.581842  0.020439  0.210027
90  0.544685  0.769115  0.250695  0.285896  0.852395

print (test)
           A         B         C         D         E
10  0.121569  0.670749  0.825853  0.136707  0.575093
40  0.431704  0.940030  0.817649  0.336112  0.175410
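As a follow-up sketch (an illustration, not something from the original answer), the same frame can be used to loop over every fold and to see what goes wrong when the positions are handed to loc. The names train_pos, test_pos and loc_failed are made up for this example; the positions [1, 4] are the test array printed above.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold

np.random.seed(100)
df = pd.DataFrame(np.random.random((10, 5)), columns=list("ABCDE"))
df.index = df.index * 10  # labels 0, 10, ..., 90; positions stay 0..9

kf = KFold(n_splits=5, shuffle=True, random_state=2)
for train_pos, test_pos in kf.split(df):
    # positional selection works for any index
    train, test = df.iloc[train_pos], df.iloc[test_pos]
    # label-based equivalent: translate positions to labels first
    assert test.equals(df.loc[df.index[test_pos]])

# Passing positions straight to loc treats them as labels.
# Labels 1 and 4 do not exist in this index, so pandas raises KeyError.
try:
    df.loc[[1, 4]]
    loc_failed = False
except KeyError:
    loc_failed = True

print(loc_failed)  # True
```

With a default RangeIndex the two accessors happen to agree, which hides the difference; it only shows up once the index has been changed, filtered, or re-sorted.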