Python: where to put 'replace=True' when you get "ValueError: Cannot take a larger sample than population when 'replace=False'"

I am trying to replicate an existing example.

I split a dataset:

from sklearn.model_selection import train_test_split

# Split data
raw_train_df, valid_df = train_test_split(image_df, test_size=0.25, random_state=12345,
                                          stratify=image_df['class_name'])

# Print results
print(raw_train_df.shape, 'Training data')
print(valid_df.shape, 'Validation data')

(11250, 10) Training data
(3750, 10) Validation data

Now I try to balance the training set:

fig, (ax1, ax2) = plt.subplots(1, 2, figsize = (20, 10))
raw_train_df.groupby('class_name').size().plot.bar(ax = ax1)
train_df = raw_train_df.groupby('class_name').\
    apply(lambda x: x.sample(TRAIN_SAMPLES//15)).\
    reset_index(drop=True)  # Here I put 15 instead of 3, because I have 15 classes
train_df.groupby('class_name').size().plot.bar(ax=ax2) 
print(train_df.shape[0], 'new training size')

I get an error:

ValueError                                Traceback (most recent call last)
<ipython-input-16-3b4d2b82246c> in <module>()
  1 fig, (ax1, ax2) = plt.subplots(1, 2, figsize = (20, 10))
  2 raw_train_df.groupby('class_name').size().plot.bar(ax = ax1)
----> 3 train_df = raw_train_df.groupby('class_name').    apply(lambda x: x.sample(TRAIN_SAMPLES//15)).    reset_index(drop=True)
  4 train_df.groupby('class_name').size().plot.bar(ax=ax2)
  5 print(train_df.shape[0], 'new training size')

4 frames
/usr/local/lib/python3.6/dist-packages/pandas/core/generic.py in sample(self, n, frac, replace, weights, random_state, axis)
4993             )
4994 
-> 4995         locs = rs.choice(axis_length, size=n, replace=replace, p=weights)
4996         return self.take(locs, axis=axis)
4997 

mtrand.pyx in numpy.random.mtrand.RandomState.choice()

ValueError: Cannot take a larger sample than population when 'replace=False'

The image looks like this: [image not available]


This is a common error that comes up when you need to put replace=True somewhere, but I am not sure exactly where.

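The error is in the call to x.sample(TRAIN_SAMPLES//15), on the line that sets train_df, so that is where replace=True belongs: x.sample(TRAIN_SAMPLES//15, replace=True).

This can be traced as follows:

  • The traceback points to the line that assigns a value to train_df, as shown by the arrow (----> 3 train_df).
  • None of the other function calls on that line takes a replace=True/False parameter; only x.sample() does. That is, groupby(), apply() and reset_index() have no replace parameter.
  • If necessary, you can consult the pandas API reference for further hints about the error.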
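
As a rough illustration of the points above, here is a minimal sketch on a toy DataFrame. The data and the name SAMPLES_PER_CLASS are made up for the example (it stands in for TRAIN_SAMPLES // 15), and the per-class loop with pd.concat is just one way to write it, not the code from the question. It shows two possible ways around the error: sampling with replacement, or capping the sample size at the class size.

```python
import pandas as pd

# Toy stand-in for raw_train_df: class "a" has 5 rows, class "b" only 2.
raw_train_df = pd.DataFrame({
    "class_name": ["a"] * 5 + ["b"] * 2,
    "value": range(7),
})
SAMPLES_PER_CLASS = 3  # hypothetical stand-in for TRAIN_SAMPLES // 15

# Option 1: sample with replacement, so classes smaller than the
# requested size are oversampled (some rows of "b" are repeated).
parts = [
    group.sample(SAMPLES_PER_CLASS, replace=True, random_state=0)
    for _, group in raw_train_df.groupby("class_name")
]
oversampled = pd.concat(parts).reset_index(drop=True)

# Option 2: keep the default replace=False, but never ask for more
# rows than the class actually has.
parts = [
    group.sample(min(len(group), SAMPLES_PER_CLASS), random_state=0)
    for _, group in raw_train_df.groupby("class_name")
]
capped = pd.concat(parts).reset_index(drop=True)

print(oversampled.groupby("class_name").size().to_dict())  # {'a': 3, 'b': 3}
print(capped.groupby("class_name").size().to_dict())       # {'a': 3, 'b': 2}
```

Option 1 yields perfectly balanced classes at the cost of duplicated rows in small classes; option 2 avoids duplicates but leaves small classes under-represented. Which one fits depends on how the training set is used afterwards.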
