Python: imputing data raises ValueError: Length of replacements must equal series length
I have a DataFrame with the following columns.
missing_df.columns.tolist()
returns the list:
['order_id',
'customer_id',
'date',
'nearest_warehouse',
'shopping_cart',
'order_price',
'delivery_charges',
'customer_lat',
'customer_long',
'coupon_discount',
'order_total',
'season',
'is_expedited_delivery',
'distance_to_nearest_warehouse',
'latest_customer_review',
'is_happy_customer',
'Autumn',
'Spring',
'Summer',
'Winter',
'exp_int']
If I try to run the following:
missing_df['delivery_charges'][missing_df['delivery_charges'].isnull()] = iimput_model.predict(missing_df.drop(['order_id','customer_id','date','nearest_warehouse','shopping_cart','order_price','delivery_charges','customer_lat','customer_long','coupon_discount','order_total','season','is_expedited_delivery','latest_customer_review'],1))
I get the following error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
~\AppData\Roaming\Python\Python37\site-packages\pandas\core\series.py in __setitem__(self, key, value)
999 try:
-> 1000 self._set_with_engine(key, value)
1001 except (KeyError, ValueError):
~\AppData\Roaming\Python\Python37\site-packages\pandas\core\series.py in _set_with_engine(self, key, value)
1032 # fails with AttributeError for IntervalIndex
-> 1033 loc = self.index._engine.get_loc(key)
1034 validate_numeric_casting(self.dtype, value)
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
TypeError: '0 False
1 False
2 False
3 True
4 False
...
495 False
496 False
497 False
498 False
499 False
Name: delivery_charges, Length: 500, dtype: bool' is an invalid key
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
<ipython-input-418-60aee038ac4a> in <module>
1 #dirty_df.reindex(columns=filtered_columns)
2 #missing_df['delivery_charges'][missing_df['delivery_charges'].isnull()] = iimput_model.predict(missing_df.drop(['order_id','customer_id','date','nearest_warehouse','shopping_cart','order_price','delivery_charges','customer_lat','customer_long','coupon_discount','order_total','season','is_expedited_delivery','latest_customer_review'],1))
----> 3 missing_df['delivery_charges'][missing_df['delivery_charges'].isnull()] = iimput_model.predict(missing_df.reindex(columns=filter_cols))
~\AppData\Roaming\Python\Python37\site-packages\pandas\core\series.py in __setitem__(self, key, value)
1018 key = np.asarray(key, dtype=bool)
1019 try:
-> 1020 self._where(~key, value, inplace=True)
1021 except InvalidIndexError:
1022 self.iloc[key] = value
~\AppData\Roaming\Python\Python37\site-packages\pandas\core\generic.py in _where(self, cond, other, inplace, axis, level, errors, try_cast)
8820 else:
8821 raise ValueError(
-> 8822 "Length of replacements must equal series length"
8823 )
8824
ValueError: Length of replacements must equal series length
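Can someone explain this error and how to fix it? I have spent some time on this and keep coming up short.
Any advice or suggestions would be appreciated.

This happens because the two sides of the assignment have different lengths. On the left you subset the series to the rows where delivery_charges is null, but on the right you do not: predict is called on the whole frame, so it returns one value per row (500 here) rather than one value per missing row. You just need to make sure both sides have the same length, for example by applying the same null mask to the rows you pass to predict.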
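
A minimal sketch of that fix, on made-up data. The six-row frame, the `feature_cols` list (standing in for `filter_cols`) and the `LstsqModel` class (standing in for `iimput_model`, which is not shown in the question) are assumptions for illustration only; any fitted model with a `predict` method should behave the same way. The point is that the same boolean mask selects the rows on both sides, and that a single `.loc` assignment replaces the chained `missing_df[...][...] = ...` form.

```python
import numpy as np
import pandas as pd


class LstsqModel:
    """Hypothetical stand-in for iimput_model: a plain least-squares fit."""

    def fit(self, X, y):
        # Append a column of ones so the fit includes an intercept.
        A = np.column_stack([X.to_numpy(dtype=float), np.ones(len(X))])
        self.coef_, *_ = np.linalg.lstsq(A, y.to_numpy(dtype=float), rcond=None)
        return self

    def predict(self, X):
        A = np.column_stack([X.to_numpy(dtype=float), np.ones(len(X))])
        return A @ self.coef_


# Toy data (assumed): two feature columns and a target with two gaps.
missing_df = pd.DataFrame({
    "distance_to_nearest_warehouse": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "exp_int": [0, 1, 0, 1, 0, 1],
    "delivery_charges": [10.0, 25.0, np.nan, 35.0, 30.0, np.nan],
})
feature_cols = ["distance_to_nearest_warehouse", "exp_int"]

# One mask, reused on both sides of the assignment.
mask = missing_df["delivery_charges"].isnull()

# Fit on the rows where the target is known.
model = LstsqModel().fit(
    missing_df.loc[~mask, feature_cols],
    missing_df.loc[~mask, "delivery_charges"],
)

# Keep the unfilled column around for the alternative shown further down.
original = missing_df["delivery_charges"].copy()

# Predict only for the rows that are missing, so the lengths match.
predictions = model.predict(missing_df.loc[mask, feature_cols])
missing_df.loc[mask, "delivery_charges"] = predictions

# Alternative: predict for every row, wrap the result in a Series that
# shares the frame's index, and let fillna pick the values it needs.
all_preds = pd.Series(model.predict(missing_df[feature_cols]), index=missing_df.index)
filled_alt = original.fillna(all_preds)
```

Both variants leave the observed values untouched. The first avoids computing predictions that are thrown away; the second relies on `fillna` aligning on the index, which is why the predictions are wrapped in a Series with `index=missing_df.index` rather than passed as a bare array.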