Python ValueError: where to put 'replace=True' (Cannot take a larger sample than population when 'replace=False')

python pandas replace

I am trying to replicate an existing example.

I split a dataset:

# Split data
raw_train_df, valid_df = train_test_split(image_df, test_size = 0.25, random_state = 12345, stratify = image_df['class_name'])

# Print results
print(raw_train_df.shape, 'Training data')
print(valid_df.shape, 'Validation data')

(11250, 10) Training data
(3750, 10) Validation data

Now I am trying to balance the training set:

fig, (ax1, ax2) = plt.subplots(1, 2, figsize = (20, 10))
raw_train_df.groupby('class_name').size().plot.bar(ax = ax1)
# Here I put 15 instead of 3, because I have 15 classes
train_df = raw_train_df.groupby('class_name').\
    apply(lambda x: x.sample(TRAIN_SAMPLES//15)).\
    reset_index(drop=True)
train_df.groupby('class_name').size().plot.bar(ax=ax2) 
print(train_df.shape[0], 'new training size')

I get an error:

ValueError                                Traceback (most recent call last)
<ipython-input-16-3b4d2b82246c> in <module>()
  1 fig, (ax1, ax2) = plt.subplots(1, 2, figsize = (20, 10))
  2 raw_train_df.groupby('class_name').size().plot.bar(ax = ax1)
----> 3 train_df = raw_train_df.groupby('class_name').    apply(lambda x: x.sample(TRAIN_SAMPLES//15)).    reset_index(drop=True)
  4 train_df.groupby('class_name').size().plot.bar(ax=ax2)
  5 print(train_df.shape[0], 'new training size')

4 frames
/usr/local/lib/python3.6/dist-packages/pandas/core/generic.py in sample(self, n, frac, replace, weights, random_state, axis)
4993             )
4994 
-> 4995         locs = rs.choice(axis_length, size=n, replace=replace, p=weights)
4996         return self.take(locs, axis=axis)
4997 

mtrand.pyx in numpy.random.mtrand.RandomState.choice()

ValueError: Cannot take a larger sample than population when 'replace=False'

The plot looks like this: [image]

This is a common error that comes up when you need to put replace=True somewhere, but I am not sure exactly where.

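The error is in the call to x.sample(TRAIN_SAMPLES//15), on the line that sets train_df.

This can be traced as follows:

The traceback points to the line that assigns a value to train_df (as shown by the arrow ----> 3 train_df).

None of the function calls on this line take a replace=True/False parameter except the x.sample() call. That is, groupby(), apply(), and reset_index() have no replace parameter, so replace=True has to go inside x.sample().

If necessary, you can refer to the pandas API guide for more hints about the error.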
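To make the placement concrete, here is a minimal sketch on a tiny made-up frame. The data, the PER_CLASS constant (standing in for TRAIN_SAMPLES//15), and the variable names oversampled and capped are assumptions for illustration, not the asker's real values. It shows two possible ways around the error: sampling with replace=True, or capping the sample size at the class size. In the original lambda, the same change would read x.sample(TRAIN_SAMPLES//15, replace=True).

```python
import pandas as pd

# Hypothetical stand-in for raw_train_df: class "a" has 5 rows, class "b" only 2.
raw_train_df = pd.DataFrame({
    "class_name": ["a"] * 5 + ["b"] * 2,
    "value": range(7),
})
PER_CLASS = 3  # assumed value; plays the role of TRAIN_SAMPLES // 15

# Option 1: sample with replacement. Every class yields exactly PER_CLASS rows,
# and rows of classes smaller than PER_CLASS get repeated.
parts = [
    group.sample(PER_CLASS, replace=True, random_state=0)
    for _, group in raw_train_df.groupby("class_name")
]
oversampled = pd.concat(parts, ignore_index=True)

# Option 2: sample without replacement, but never request more rows
# than the class actually contains.
parts = [
    group.sample(min(len(group), PER_CLASS), random_state=0)
    for _, group in raw_train_df.groupby("class_name")
]
capped = pd.concat(parts, ignore_index=True)

print(oversampled.groupby("class_name").size())
print(capped.groupby("class_name").size())
```

Option 1 keeps the classes perfectly balanced at the cost of duplicated rows in small classes; Option 2 avoids duplicates but leaves small classes under-represented. Which one fits depends on how the balanced set is used afterwards.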
