Pandas 重命名列后get-keyerror

Pandas 重命名列后get-keyerror,pandas,numpy,multiple-columns,rename,Pandas,Numpy,Multiple Columns,Rename,我有df: df = pd.DataFrame({'a':[7,8,9], 'b':[1,3,5], 'c':[5,3,6]}) print (df) a b c 0 7 1 5 1 8 3 3 2 9 5 6 然后将第一个值重命名为: 一切似乎都很美好: print (df) f b c 0 7 1 5 1 8 3 3 2 9 5 6 print (df.c

我有
df

df = pd.DataFrame({'a':[7,8,9],
                   'b':[1,3,5],
                   'c':[5,3,6]})

print (df)
   a  b  c
0  7  1  5
1  8  3  3
2  9  5  6
然后将第一个值重命名为:

一切似乎都很美好:

print (df)
   f  b  c
0  7  1  5
1  8  3  3
2  9  5  6

print (df.columns)
Index(['f', 'b', 'c'], dtype='object')

print (df.columns.values)
['f' 'b' 'c']
如果选择
b
,效果很好:

print (df['b'])
0    1
1    3
2    5
Name: b, dtype: int64
但是如果选择
a
它将返回列
f

print (df['a'])
0    7
1    8
2    9
Name: f, dtype: int64
如果选择
f
get key error

print (df['f'])
#KeyError: 'f'

print (df.info())
#KeyError: 'f'

问题是什么?有人能解释一下吗?还是bug?

您不需要更改
属性

尝试
df.columns.values=['a','b','c']
,您会得到:

这个故事的寓意是:以这种方式重命名专栏可能不是一个好主意


但是这个故事变得更奇怪了

df = pd.DataFrame({'a':[7,8,9],
                   'b':[1,3,5],
                   'c':[5,3,6]})

print(df)

df.columns.values[0] = 'f'

df['f'] = 1

df['f']

   f  f
0  7  1
1  8  1
2  9  1
def rename(self, *args, **kwargs):

    axes, kwargs = self._construct_axes_from_arguments(args, kwargs)
    copy = kwargs.pop('copy', True)
    inplace = kwargs.pop('inplace', False)

    if kwargs:
        raise TypeError('rename() got an unexpected keyword '
                        'argument "{0}"'.format(list(kwargs.keys())[0]))

    if com._count_not_none(*axes.values()) == 0:
        raise TypeError('must pass an index to rename')

    # renamer function if passed a dict
    def _get_rename_function(mapper):
        if isinstance(mapper, (dict, ABCSeries)):

            def f(x):
                if x in mapper:
                    return mapper[x]
                else:
                    return x
        else:
            f = mapper

        return f

    self._consolidate_inplace()
    result = self if inplace else self.copy(deep=copy)

    # start in the axis order to eliminate too many copies
    for axis in lrange(self._AXIS_LEN):
        v = axes.get(self._AXIS_NAMES[axis])
        if v is None:
            continue
        f = _get_rename_function(v)

        baxis = self._get_block_manager_axis(axis)
        result._data = result._data.rename_axis(f, axis=baxis, copy=copy)
        result._clear_item_cache()

    if inplace:
        self._update_inplace(result._data)
    else:
        return result.__finalize__(self)
这很好

df = pd.DataFrame({'a':[7,8,9],
                   'b':[1,3,5],
                   'c':[5,3,6]})

df.columns.values[0] = 'f'

df['f']

0    7
1    8
2    9
Name: f, dtype: int64
这不是好的

df = pd.DataFrame({'a':[7,8,9],
                   'b':[1,3,5],
                   'c':[5,3,6]})

print(df)

df.columns.values[0] = 'f'

df['f']
事实证明,我们可以在显示
df
之前修改
属性,它显然会在第一次显示
时运行所有初始化。如果在更改
属性之前显示它,它将出错

更奇怪

df = pd.DataFrame({'a':[7,8,9],
                   'b':[1,3,5],
                   'c':[5,3,6]})

print(df)

df.columns.values[0] = 'f'

df['f'] = 1

df['f']

   f  f
0  7  1
1  8  1
2  9  1
def rename(self, *args, **kwargs):

    axes, kwargs = self._construct_axes_from_arguments(args, kwargs)
    copy = kwargs.pop('copy', True)
    inplace = kwargs.pop('inplace', False)

    if kwargs:
        raise TypeError('rename() got an unexpected keyword '
                        'argument "{0}"'.format(list(kwargs.keys())[0]))

    if com._count_not_none(*axes.values()) == 0:
        raise TypeError('must pass an index to rename')

    # renamer function if passed a dict
    def _get_rename_function(mapper):
        if isinstance(mapper, (dict, ABCSeries)):

            def f(x):
                if x in mapper:
                    return mapper[x]
                else:
                    return x
        else:
            f = mapper

        return f

    self._consolidate_inplace()
    result = self if inplace else self.copy(deep=copy)

    # start in the axis order to eliminate too many copies
    for axis in lrange(self._AXIS_LEN):
        v = axes.get(self._AXIS_NAMES[axis])
        if v is None:
            continue
        f = _get_rename_function(v)

        baxis = self._get_block_manager_axis(axis)
        result._data = result._data.rename_axis(f, axis=baxis, copy=copy)
        result._clear_item_cache()

    if inplace:
        self._update_inplace(result._data)
    else:
        return result.__finalize__(self)
好像我们还不知道这是个坏主意


重命名的源代码

df = pd.DataFrame({'a':[7,8,9],
                   'b':[1,3,5],
                   'c':[5,3,6]})

print(df)

df.columns.values[0] = 'f'

df['f'] = 1

df['f']

   f  f
0  7  1
1  8  1
2  9  1
def rename(self, *args, **kwargs):

    axes, kwargs = self._construct_axes_from_arguments(args, kwargs)
    copy = kwargs.pop('copy', True)
    inplace = kwargs.pop('inplace', False)

    if kwargs:
        raise TypeError('rename() got an unexpected keyword '
                        'argument "{0}"'.format(list(kwargs.keys())[0]))

    if com._count_not_none(*axes.values()) == 0:
        raise TypeError('must pass an index to rename')

    # renamer function if passed a dict
    def _get_rename_function(mapper):
        if isinstance(mapper, (dict, ABCSeries)):

            def f(x):
                if x in mapper:
                    return mapper[x]
                else:
                    return x
        else:
            f = mapper

        return f

    self._consolidate_inplace()
    result = self if inplace else self.copy(deep=copy)

    # start in the axis order to eliminate too many copies
    for axis in lrange(self._AXIS_LEN):
        v = axes.get(self._AXIS_NAMES[axis])
        if v is None:
            continue
        f = _get_rename_function(v)

        baxis = self._get_block_manager_axis(axis)
        result._data = result._data.rename_axis(f, axis=baxis, copy=copy)
        result._clear_item_cache()

    if inplace:
        self._update_inplace(result._data)
    else:
        return result.__finalize__(self)

该委员会的评论中提到了这种行为。由于正在修改此索引对象的内部状态,因此可能无法将其传播到使用它的所有实例。我认为使用
df.rename(columns={'a':'f'})
是一种理想的方法。非常有趣的研究!我在想
print
如何造成这种差异。你知道为什么吗?以前从未见过它。@jezrael我的理论是,在第一次打印时会发生初始化。但这是错误,因为
打印的影响?我认为这是不可能的,但也许我错了。@jezrael调用print时,它调用repr方法。在这一点上,我猜pandas会运行一些缓存脚本,如果它们以前没有运行过的话。