Python: Keep getting ValueError: Shape of passed values is (4474, 10), indices imply (14084, 10)


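First off, thanks in advance if you can help with this! I'm trying to balance some customer data for my model. My targets are all 1s and 0s, and the 0s are far more abundant. So I created a counter that starts removing 0 rows once the 0 rows outnumber the 1 rows. But at the end of my code, when I call np.delete to take those extra rows out of the dataset, I keep getting this error.

I really don't know what to try, because I don't even understand what the error is telling me.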

import pandas as pd 
import numpy as np 
from sklearn import preprocessing
#%%
#Loading the Raw Data
raw_csv_data= pd.read_csv('Audiobooks-data_raw.csv')
print(display(raw_csv_data.head(20)))
#%%
df=raw_csv_data.copy()
print(display(df.head(20)))
#%%
print(df.info())
#%%
#Separate the Targets from the dataset
inputs_all= df.loc[:,'Book length (mins)_overall':'Last visited minus Purchase date']
targets_all= df['Targets']
print(display(inputs_all.head()))
print(display(targets_all.head()))
#%%
#Shuffling the Data to prep for balancing
shuffled_indices= np.arange(inputs_all.shape[0])
np.random.shuffle(shuffled_indices)
shuffled_inputs= inputs_all.iloc[shuffled_indices]
shuffled_targets= targets_all[shuffled_indices]
#%%
#Balance the Dataset
#There are significantly more 0's than 1's in our target.
#We want a good accurate model
print(inputs_all.shape)
print(targets_all.shape)
#%%
num_one_targets= int(np.sum(targets_all))
zero_targets_counter= 0
indices_to_remove= []
print(num_one_targets)
#%%
for i in range(targets_all.shape[0]):
    if targets_all[i]==0:
        zero_targets_counter +=1
        if zero_targets_counter> num_one_targets:
            indices_to_remove.append(i)

#%%

inputs_all_balanced= np.delete(inputs_all, indices_to_remove, axis=0)
targets_all_balanced= np.delete(targets_all, indices_to_remove, axis=0)
Everything works except the step where I balance the dataset and remove the extra 0 rows. Here is the error:

ValueError                                Traceback (most recent call last)
~\Anaconda3\lib\site-packages\pandas\core\internals\managers.py in create_block_manager_from_blocks(blocks, axes)
   1652 
-> 1653         mgr = BlockManager(blocks, axes)
   1654         mgr._consolidate_inplace()

~\Anaconda3\lib\site-packages\pandas\core\internals\managers.py in __init__(self, blocks, axes, do_integrity_check)
    113         if do_integrity_check:
--> 114             self._verify_integrity()
    115 

~\Anaconda3\lib\site-packages\pandas\core\internals\managers.py in _verify_integrity(self)
    310             if block._verify_integrity and block.shape[1:] != mgr_shape[1:]:
--> 311                 construction_error(tot_items, block.shape[1:], self.axes)
    312         if len(self.items) != tot_items:

~\Anaconda3\lib\site-packages\pandas\core\internals\managers.py in construction_error(tot_items, block_shape, axes, e)
   1690     raise ValueError("Shape of passed values is {0}, indices imply {1}".format(
-> 1691         passed, implied))
   1692 

ValueError: Shape of passed values is (4474, 10), indices imply (14084, 10)

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
 in 
----> 1 inputs_all_balanced= np.delete(inputs_all, indices_to_remove, axis=0)
      2 targets_all_balanced= np.delete(targets_all, indices_to_remove, axis=0)

~\Anaconda3\lib\site-packages\numpy\lib\function_base.py in delete(arr, obj, axis)
   4419 
   4420     if wrap:
-> 4421         return wrap(new)
   4422     else:
   4423         return new

~\Anaconda3\lib\site-packages\pandas\core\generic.py in __array_wrap__(self, result, context)
   1907     def __array_wrap__(self, result, context=None):
   1908         d = self._construct_axes_dict(self._AXIS_ORDERS, copy=False)
-> 1909         return self._constructor(result, **d).__finalize__(self)
   1910 
   1911     # ideally we would define this to avoid the getattr checks, but

~\Anaconda3\lib\site-packages\pandas\core\frame.py in __init__(self, data, index, columns, dtype, copy)
    422             else:
    423                 mgr = init_ndarray(data, index, columns, dtype=dtype,
--> 424                                    copy=copy)
    425 
    426         # For data is list-like, or Iterable (will consume into list)

~\Anaconda3\lib\site-packages\pandas\core\internals\construction.py in init_ndarray(values, index, columns, dtype, copy)
    165         values = maybe_infer_to_datetimelike(values)
    166 
--> 167     return create_block_manager_from_blocks([values], [columns, index])
    168 
    169 

~\Anaconda3\lib\site-packages\pandas\core\internals\managers.py in create_block_manager_from_blocks(blocks, axes)
   1658         blocks = [getattr(b, 'values', b) for b in blocks]
   1659         tot_items = sum(b.shape[0] for b in blocks)
-> 1660         construction_error(tot_items, blocks[0].shape[1:], axes, e)
   1661 
   1662 

~\Anaconda3\lib\site-packages\pandas\core\internals\managers.py in construction_error(tot_items, block_shape, axes, e)
   1689         raise ValueError("Empty data passed with indices specified.")
   1690     raise ValueError("Shape of passed values is {0}, indices imply {1}".format(
-> 1691         passed, implied))
   1692 
   1693 

ValueError: Shape of passed values is (4474, 10), indices imply (14084, 10)

Try removing the rows with the following instead:

inputs_all_balanced  = inputs_all.drop(indices_to_remove,axis=0)
targets_all_balanced = targets_all.drop(indices_to_remove,axis=0)
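
One detail worth keeping in mind: `drop` removes rows by index label, not by position. In the question the two coincide, because `pd.read_csv` gives the frame a default 0..n-1 index and the loop collects positions from `range(...)`. The following is a minimal sketch on made-up toy data (the values and the short names `inputs`, `targets`, `positions_to_remove` are assumptions for illustration, not the audiobook dataset) that converts positions to labels explicitly, so it should also hold up if the index is not the default one.

```python
import pandas as pd

# Toy stand-ins for inputs_all / targets_all (hypothetical values)
inputs = pd.DataFrame({"a": [10, 11, 12, 13, 14, 15],
                       "b": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]})
targets = pd.Series([0, 1, 0, 0, 1, 0], name="Targets")

# Same counting logic as in the question, using positional access
num_one_targets = int(targets.sum())
zero_targets_counter = 0
positions_to_remove = []
for i in range(targets.shape[0]):
    if targets.iloc[i] == 0:
        zero_targets_counter += 1
        if zero_targets_counter > num_one_targets:
            positions_to_remove.append(i)

# Translate positions into index labels, then drop by label
labels_to_remove = targets.index[positions_to_remove]
inputs_balanced = inputs.drop(labels_to_remove, axis=0)
targets_balanced = targets.drop(labels_to_remove)

print(inputs_balanced.shape, targets_balanced.shape)
```

Both results stay pandas objects and keep their original row labels, so the inputs and targets remain aligned with each other.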

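Wow @莫, that worked. Thank you so much. Can you tell me why it threw that error, and how you figured it out? I think it might have something to do with numpy vs pandas, but I don't understand it.

Your variables inputs and targets are both pandas objects (a DataFrame and a Series). Trying to delete rows from a pandas object with numpy is a bit risky, and it can break things. Because numpy arrays and pandas DataFrames are structured differently and pandas has its own standard drop method, it's better not to mix the two.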
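
To make the numpy vs pandas point a little more concrete: judging from the traceback, `np.delete` builds the shorter array and then hands it back to pandas through `__array_wrap__`, which tries to rebuild a DataFrame around the original index. That would explain the mismatch between the 4474 rows passed and the 14084 rows the index implies (the exact behavior may differ in other pandas versions). If NumPy arrays are what you want downstream, one option is to convert first and delete afterwards. This is a rough sketch on toy data; the array names and values are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for inputs_all / targets_all (hypothetical values)
inputs = pd.DataFrame(np.arange(12).reshape(6, 2), columns=["a", "b"])
targets = pd.Series([0, 1, 0, 0, 1, 0])
indices_to_remove = [3, 5]  # row positions, as produced by the loop in the question

# Convert to plain ndarrays first, so np.delete never has to
# rebuild a pandas object around the old index
inputs_arr = np.delete(inputs.to_numpy(), indices_to_remove, axis=0)
targets_arr = np.delete(targets.to_numpy(), indices_to_remove, axis=0)

print(inputs_arr.shape, targets_arr.shape)
```

Here `np.delete` works on positions, which matches how `indices_to_remove` was collected, but the results are plain arrays without column names or an index.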