Python 使用pandas将字符串替换为其自身的较短版本

Python 使用pandas将字符串替换为其自身的较短版本,python,pandas,Python,Pandas,我有一个pandas数据框架,其中一列是模型变量,另一列是相应的统计数据。我已经进行了一些字符串操作,以获取一个派生的汇总表,从而从模型中加入汇总表。 lost_cost_final_table.loc[lost_cost_final_table['variable'].str.contains('class_cc',case=False),'variable']=lost_cost_final_table['variable'].str[:8] 完全回溯 -------------------

我有一个pandas数据框架,其中一列是模型变量,另一列是相应的统计数据。我已经进行了一些字符串操作,以获取一个派生的汇总表,从而从模型中加入汇总表。
lost_cost_final_table.loc[lost_cost_final_table['variable'].str.contains('class_cc',case=False),'variable']=lost_cost_final_table['variable'].str[:8]

完全回溯

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-229-1dbe5bd14d4b> in <module>
----> 1 lost_cost_final_table.loc[lost_cost_final_table['variable'].str.contains('class_cc', case = False), 'variable'] = lost_cost_final_table['variable'].str[:8]
      2 #lost_cost_final_table.loc[lost_cost_final_table['variable'].str.contains('class_v_age', case = False), 'variable'] = lost_cost_final_table['variable'].str[:11]
      3 #lost_cost_final_table.loc[lost_cost_final_table['variable'].str.contains('married_age', case = False), 'variable'] = lost_cost_final_table['variable'].str[:11]
      4 #lost_cost_final_table.loc[lost_cost_final_table['variable'].str.contains('state_model', case = False), 'variable'] = lost_cost_final_table['variable'].str[:11]
      5 

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexing.py in __setitem__(self, key, value)
    187             key = com._apply_if_callable(key, self.obj)
    188         indexer = self._get_setitem_indexer(key)
--> 189         self._setitem_with_indexer(indexer, value)
    190 
    191     def _validate_key(self, key, axis):

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexing.py in _setitem_with_indexer(self, indexer, value)
    467 
    468             if isinstance(value, ABCSeries):
--> 469                 value = self._align_series(indexer, value)
    470 
    471             info_idx = indexer[info_axis]

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexing.py in _align_series(self, indexer, ser, multiindex_indexer)
    732                         return ser._values.copy()
    733 
--> 734                     return ser.reindex(new_ix)._values
    735 
    736                 # 2 dims

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\series.py in reindex(self, index, **kwargs)
   3323     @Appender(generic._shared_docs['reindex'] % _shared_doc_kwargs)
   3324     def reindex(self, index=None, **kwargs):
-> 3325         return super(Series, self).reindex(index=index, **kwargs)
   3326 
   3327     def drop(self, labels=None, axis=0, index=None, columns=None,

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\generic.py in reindex(self, *args, **kwargs)
   3687         # perform the reindex on the axes
   3688         return self._reindex_axes(axes, level, limit, tolerance, method,
-> 3689                                   fill_value, copy).__finalize__(self)
   3690 
   3691     def _reindex_axes(self, axes, level, limit, tolerance, method, fill_value,

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\generic.py in _reindex_axes(self, axes, level, limit, tolerance, method, fill_value, copy)
   3705             obj = obj._reindex_with_indexers({axis: [new_index, indexer]},
   3706                                              fill_value=fill_value,
-> 3707                                              copy=copy, allow_dups=False)
   3708 
   3709         return obj

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\generic.py in _reindex_with_indexers(self, reindexers, fill_value, copy, allow_dups)
   3808                                                 fill_value=fill_value,
   3809                                                 allow_dups=allow_dups,
-> 3810                                                 copy=copy)
   3811 
   3812         if copy and new_data is self._data:

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\internals.py in reindex_indexer(self, new_axis, indexer, axis, fill_value, allow_dups, copy)
   4412         # some axes don't allow reindexing with dups
   4413         if not allow_dups:
-> 4414             self.axes[axis]._can_reindex(indexer)
   4415 
   4416         if axis >= self.ndim:

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in _can_reindex(self, indexer)
   3574         # trying to reindex on an axis with duplicates
   3575         if not self.is_unique and len(indexer):
-> 3576             raise ValueError("cannot reindex from a duplicate axis")
   3577 
   3578     def reindex(self, target, method=None, level=None, limit=None,

ValueError: cannot reindex from a duplicate axis

lost\u cost\u final\u表的索引不唯一,可以通过运行
reset\u index
来修复:

lost_cost_final_table.reset_index(inplace=True)

看起来
丢失的成本\u最终\u表的索引可能包含重复项。
lost\u cost\u final\u table.index.is\u unique的输出是什么。
lost\u cost\u final\u table.index.is\u unique
=
False
正常,尝试使用
lost\u cost\u final\u table.reset\u index(inplace=True)
重置索引,然后再次完整运行代码行。如果你想把它写进一个答案,我可以接受。非常感谢。
lost_cost_final_table.reset_index(inplace=True)