Python ValueError: where to put 'replace=True' when the error mentions 'replace=False'
I am trying to replicate some existing code. I split a dataset:
from sklearn.model_selection import train_test_split

# Split data, stratifying on class_name so both parts keep the class proportions
raw_train_df, valid_df = train_test_split(
    image_df, test_size=0.25, random_state=12345, stratify=image_df['class_name']
)
# Print results
print(raw_train_df.shape, 'Training data')
print(valid_df.shape, 'Validation data')
(11250, 10) Training data
(3750, 10) Validation data
Now I am trying to balance the training set:
import matplotlib.pyplot as plt

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(20, 10))
raw_train_df.groupby('class_name').size().plot.bar(ax=ax1)
# Here I put 15 instead of 3, because I have 15 classes
train_df = (
    raw_train_df.groupby('class_name')
    .apply(lambda x: x.sample(TRAIN_SAMPLES // 15))
    .reset_index(drop=True)
)
train_df.groupby('class_name').size().plot.bar(ax=ax2)
print(train_df.shape[0], 'new training size')
I get an error:
ValueError Traceback (most recent call last)
<ipython-input-16-3b4d2b82246c> in <module>()
1 fig, (ax1, ax2) = plt.subplots(1, 2, figsize = (20, 10))
2 raw_train_df.groupby('class_name').size().plot.bar(ax = ax1)
----> 3 train_df = raw_train_df.groupby('class_name'). apply(lambda x: x.sample(TRAIN_SAMPLES//15)). reset_index(drop=True)
4 train_df.groupby('class_name').size().plot.bar(ax=ax2)
5 print(train_df.shape[0], 'new training size')
4 frames
/usr/local/lib/python3.6/dist-packages/pandas/core/generic.py in sample(self, n, frac, replace, weights, random_state, axis)
4993 )
4994
-> 4995 locs = rs.choice(axis_length, size=n, replace=replace, p=weights)
4996 return self.take(locs, axis=axis)
4997
mtrand.pyx in numpy.random.mtrand.RandomState.choice()
ValueError: Cannot take a larger sample than population when 'replace=False'
The image looks like this: [image not available]
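This is a common error where you need to put replace=True somewhere, but I am not sure exactly where.

The error is in the call to x.sample(TRAIN_SAMPLES//15), on the line that sets train_df.

This can be traced as follows:

The error traceback points to the line that assigns a value to train_df (as shown by the arrow ----> 3 train_df).

None of the function calls on this line takes a replace=True/False parameter except x.sample(). That is, groupby(), apply() and reset_index() have no replace parameter.

So replace=True belongs in the sample call, i.e. x.sample(TRAIN_SAMPLES//15, replace=True). The error itself means that at least one class has fewer rows than TRAIN_SAMPLES//15, so that many rows cannot be drawn from it without repeating some.

If necessary, you can refer to the pandas API guide for more hints about the error.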
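To make the point above concrete, here is a minimal sketch that reproduces the error on a small made-up DataFrame and then avoids it by passing replace=True to sample(). The names toy_df, N_PER_CLASS and sample_per_class are assumptions for illustration only and are not part of the original code. The helper loops over the groups instead of using groupby().apply(), but it calls sample() on each class in the same way.

```python
import pandas as pd

# Hypothetical toy data: class "a" has 5 rows, class "b" has only 2.
toy_df = pd.DataFrame({
    "class_name": ["a"] * 5 + ["b"] * 2,
    "value": range(7),
})
N_PER_CLASS = 3  # stands in for TRAIN_SAMPLES // 15


def sample_per_class(df, n, replace):
    """Draw n rows from every class_name group and stack the results."""
    parts = [
        group.sample(n, replace=replace, random_state=0)
        for _, group in df.groupby("class_name")
    ]
    return pd.concat(parts, ignore_index=True)


# Without replacement, the 2-row class cannot supply 3 rows.
try:
    sample_per_class(toy_df, N_PER_CLASS, replace=False)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# With replacement, small classes are oversampled (some rows repeat).
balanced_df = sample_per_class(toy_df, N_PER_CLASS, replace=True)
print(balanced_df.groupby("class_name").size())
```

Sampling with replacement duplicates rows in the small classes. If duplicates are not wanted, one alternative would be to draw min(len(group), n) rows per class instead, at the cost of classes that are no longer the same size.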