Python 3.x: scikit-learn StratifiedShuffleSplit KeyError with index

python-3.x pandas scikit-learn

This is my pandas DataFrame lots_not_preprocessed_usd:

<class 'pandas.core.frame.DataFrame'>
Index: 78718 entries, 2017-09-12T18-38-38-076065 to 2017-10-02T07-29-40-245031
Data columns (total 20 columns):
created_year              78718 non-null float64
price                     78718 non-null float64
........
decade                    78718 non-null int64
dtypes: float64(8), int64(1), object(11)
memory usage: 12.6+ MB

My script:

from sklearn.model_selection import StratifiedShuffleSplit

split = StratifiedShuffleSplit(n_splits=1, test_size =0.2, random_state=42)
for train_index, test_index  in split.split(lots_not_preprocessed_usd, lots_not_preprocessed_usd['decade']):
    strat_train_set = lots_not_preprocessed_usd.loc[train_index]
    strat_test_set  = lots_not_preprocessed_usd.loc[test_index]

I get this error message:

KeyError                                  Traceback (most recent call last)
<ipython-input-224-cee2389254f2> in <module>()
      3 split = StratifiedShuffleSplit(n_splits=1, test_size =0.2, random_state=42)
      4 for train_index, test_index  in split.split(lots_not_preprocessed_usd, lots_not_preprocessed_usd['decade']):
----> 5     strat_train_set = lots_not_preprocessed_usd.loc[train_index]
      6     strat_test_set  = lots_not_preprocessed_usd.loc[test_index]

......

KeyError: 'None of [[32199 67509 69003 ..., 44204  2809 56726]] are in the [index]'

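When you use .loc, the row indexer must contain labels that exist in the DataFrame's index. If you want to select rows by raw integer position instead, use .iloc rather than .loc. In your for loop, train_index and test_index are not datetime labels, because split.split(X, y) returns arrays of shuffled integer positions.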

...
for train_index, test_index  in split.split(lots_not_preprocessed_usd, lots_not_preprocessed_usd['decade']):
    strat_train_set = lots_not_preprocessed_usd.iloc[train_index]
    strat_test_set  = lots_not_preprocessed_usd.iloc[test_index]
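
To make the difference between labels and positions concrete, here is a minimal sketch on a toy frame. The frame df, its price column, and the row-NN string labels are hypothetical stand-ins for the timestamp-labelled data in the question. It also shows one possible label-based alternative, translating positions to labels through df.index before calling .loc.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedShuffleSplit

# Hypothetical toy frame with a non-integer (string) index, standing in for
# the timestamp-labelled lots_not_preprocessed_usd from the question.
df = pd.DataFrame(
    {"price": np.arange(20, dtype=float), "decade": [1980, 1990] * 10},
    index=[f"row-{i:02d}" for i in range(20)],
)

split = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_index, test_index = next(split.split(df, df["decade"]))

# The splitter yields integer positions, not index labels.
positions_are_ints = np.issubdtype(train_index.dtype, np.integer)

# Label-based lookup fails: none of the integers is a label in the index.
try:
    df.loc[train_index]
    loc_raised = False
except KeyError:
    loc_raised = True

# Position-based lookup works.
train_by_position = df.iloc[train_index]

# Equivalent label-based form: map positions to labels first.
train_by_label = df.loc[df.index[train_index]]
```

Both selections should return the same rows; .iloc is simply the shorter way to write it.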

Example:

import numpy as np
import pandas as pd

# Toy frame with a DatetimeIndex; split is the StratifiedShuffleSplit defined above
lots_not_preprocessed_usd = pd.DataFrame({'some': np.random.randint(5, 10, 100), 'decade': np.random.randint(5, 10, 100)}, index=pd.date_range('5-10-15', periods=100))

for train_index, test_index in split.split(lots_not_preprocessed_usd, lots_not_preprocessed_usd['decade']):
    # Select rows by integer position, not by label
    strat_train_set = lots_not_preprocessed_usd.iloc[train_index]
    strat_test_set = lots_not_preprocessed_usd.iloc[test_index]

Sample output:

strat_train_set.head()

            decade  some
2015-08-02       6     7
2015-06-14       7     6
2015-08-14       7     9
2015-06-25       9     5
2015-05-15       7     9
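
Once the split runs with .iloc, it can be worth checking that the stratification did what was intended. The sketch below is one way to do that, assuming a synthetic frame with fixed class sizes for decade (the data and the proportions table are illustrative, not taken from the question). It compares the share of each decade value in the full data, the training set, and the test set.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedShuffleSplit

# Synthetic data (an assumption for illustration): five decade classes
# with deliberately unequal sizes, on a DatetimeIndex.
rng = np.random.default_rng(0)
df = pd.DataFrame(
    {
        "some": rng.integers(5, 10, 1000),
        "decade": np.repeat([5, 6, 7, 8, 9], [100, 150, 200, 250, 300]),
    },
    index=pd.date_range("2015-05-10", periods=1000),
)

split = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_index, test_index = next(split.split(df, df["decade"]))

# Select by position, as in the answer above.
strat_train_set = df.iloc[train_index]
strat_test_set = df.iloc[test_index]

# Share of each decade value overall, in the training set, and in the test set.
proportions = pd.DataFrame(
    {
        "overall": df["decade"].value_counts(normalize=True),
        "train": strat_train_set["decade"].value_counts(normalize=True),
        "test": strat_test_set["decade"].value_counts(normalize=True),
    }
).sort_index()
print(proportions)
```

If the three columns are close to each other, the split preserved the class distribution of decade.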

Please add the output of lots_not_preprocessed_usd.head() for more information.
