Creating a mixin class for pandas DataFrame and native Python dict


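How can I create a mixin class for a pandas DataFrame and a native Python dict, so that DataFrame columns can be accessed like a nested dict?

Using the df.loc[] indexer is the usual way to access the desired rows, columns, or slices.

But the goal is to access the two-dimensional DataFrame with the same syntax as a native Python dict: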

>>> import pandas as pd
>>> df = pd.DataFrame([['x', 1,2,3,4,5], ['y', 6,7,8,9,10], ['z', 11,12,13,14,15]])
>>> df.columns = ['index', 'a', 'b', 'c', 'd', 'e']
>>> df = df.set_index(['index'])
>>> df
        a   b   c   d   e
index                    
x       1   2   3   4   5
y       6   7   8   9  10
z      11  12  13  14  15

>>> df['x']
[1, 2, 3, 4, 5]

>>> df['x']['a']
1

>>> df['x']['a', 'b']
(1, 2)

>>> df['x']['a', 'd', 'c']
(1, 4, 3)
I tried to create a mixin class as follows:

from pandas import DataFrame

class VegeTable(DataFrame, dict):
    def __init__(self, *args, **kwargs):
        DataFrame.__init__(self, *args, **kwargs)
    def __getitem__(self, row_key, column_key):
        if type(row_key) != list:
            row_key = [row_key]
        if type(column_key) != list:
            column_key = [column_key]
        return df.loc[row_key, column_key]
But I think something is missing: dictionary-style key access does not work, and dict.get returns a strange value:

>>> from pandas import DataFrame
>>> 
>>> 
>>> class VegeTable(DataFrame, dict):
...     def __init__(self, *args, **kwargs):
...         DataFrame.__init__(self, *args, **kwargs)
...     def __getitem__(self, row_key, column_key):
...         if type(row_key) != list:
...             row_key = [row_key]
...         if type(column_key) != list:
...             column_key = [column_key]
...         return df.loc[row_key, column_key]
... 
>>> 
>>> vt = VegeTable([['x', 1,2,3,4,5], ['y', 6,7,8,9,10], ['z', 11,12,13,14,15]])
>>> vt.columns = ['index', 'a', 'b', 'c', 'd', 'e']
>>> vt = vt.set_index(['index'])
>>> vt
        a   b   c   d   e
index                    
x       1   2   3   4   5
y       6   7   8   9  10
z      11  12  13  14  15
>>> vt['x']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/site-packages/pandas/core/frame.py", line 2062, in __getitem__
    return self._getitem_column(key)
  File "/usr/local/lib/python2.7/site-packages/pandas/core/frame.py", line 2069, in _getitem_column
    return self._get_item_cache(key)
  File "/usr/local/lib/python2.7/site-packages/pandas/core/generic.py", line 1534, in _get_item_cache
    values = self._data.get(item)
  File "/usr/local/lib/python2.7/site-packages/pandas/core/internals.py", line 3590, in get
    loc = self.items.get_loc(item)
  File "/usr/local/lib/python2.7/site-packages/pandas/core/indexes/base.py", line 2395, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc (pandas/_libs/index.c:5239)
  File "pandas/_libs/index.pyx", line 154, in pandas._libs.index.IndexEngine.get_loc (pandas/_libs/index.c:5085)
  File "pandas/_libs/hashtable_class_helper.pxi", line 1207, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas/_libs/hashtable.c:20405)
  File "pandas/_libs/hashtable_class_helper.pxi", line 1215, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas/_libs/hashtable.c:20359)
KeyError: 'x'
>>> vt.get(['x'])
>>> vt.get('x')
>>> vt.get('x', 'a')
'a'
>>> vt.get('x', ['a', 'b'])
['a', 'b']
>>> vt.get('x', ['a', 'b'])
How can I create a mixin class for a pandas DataFrame and a native Python dict so that DataFrame columns can be accessed like a nested dict? Is that even possible? If so, how?

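Errors in reasoning:

  • vt = vt.set_index(['index'])

    This rebinds vt to a plain pandas DataFrame (the traceback goes through DataFrame.__getitem__, not VegeTable.__getitem__).

    You have to override set_index, or cast the resulting DataFrame back to your class.

  • def __getitem__(self, row_key, column_key=None):

    Only one argument is ever passed to __getitem__(...).

    Multiple keys must go inside the [...], e.g. vt['x', ['a', 'b', 'c']], and they arrive as a single tuple.

  • If you accept this slightly different notation, the following implementation does what you want: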

    from pandas import DataFrame

    class DataFrame2(DataFrame):
        def __init__(self, *args, **kwargs):
            super().__init__(*args, **kwargs)

        def __getitem__(self, item):
            if isinstance(item, tuple):
                # df[row, [col, ...]]: look up the row, then pick the listed columns
                row = self.loc[item[0]]
                sub_item = item[1]
                if isinstance(sub_item, list):
                    r = [row.loc[key] for key in sub_item]
                    if len(r) == 1:
                        return r[0]
                    else:
                        return tuple(r)
                else:
                    # Only the form df['x', [list]] is supported
                    raise NotImplementedError('expected df[row, [columns]]')
            else:
                # df[row]: return the whole row as a tuple
                return tuple(self.loc[item])

        def set_index(self, index):
            # Re-wrap the result so that it stays a DataFrame2
            return DataFrame2(super().set_index(index))

    # Usage (data is the list of rows from the question):
    data = [['x', 1,2,3,4,5], ['y', 6,7,8,9,10], ['z', 11,12,13,14,15]]
    df = DataFrame2(data)
    df.columns = ['index', 'a', 'b', 'c', 'd', 'e']
    df = df.set_index(['index'])

    print("df['x']={}\n".format(df['x']))
    print("df['x']['a']={}\n".format(df['x', ['a']]))
    print("df['x']['a', 'b']={}\n".format(df['x', ['a', 'b']]))
    print("df['x']['a', 'b', 'c']={}\n".format(df['x', ['a', 'b', 'c']]))
    
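    Output:


    Tested with Python 3.4.2.

    I don't think creating a mixin class is a good idea. When you use pandas, you should think in the pandas way. I also doubt that a native Python nested dict can be indexed this way: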
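The second point above (only one argument reaches __getitem__) can be checked without pandas. The following is a minimal sketch; KeyEcho is a hypothetical helper class, not part of pandas or of the question's code, and it simply returns whatever key it receives:

```python
class KeyEcho:
    """Hypothetical helper: returns the key exactly as __getitem__ receives it."""

    def __getitem__(self, key):
        return key


echo = KeyEcho()

# A single key arrives unchanged.
single = echo['x']

# Comma-separated keys inside the brackets are packed into one tuple.
pair = echo['x', 'a']

# A list inside the brackets stays a list, nested in that tuple.
nested = echo['x', ['a', 'b', 'c']]

print(single, pair, nested)
```

This is why a signature such as __getitem__(self, row_key, column_key) can never receive two separate arguments from the obj[...] syntax: the method has to unpack the tuple itself.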

    In []: df['x']['a', 'b']
    
    However, if you insist, try the following code first:

    In []: df.T.to_dict()
    Out[]:
    {'x': {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5},
     'y': {'a': 6, 'b': 7, 'c': 8, 'd': 9, 'e': 10},
     'z': {'a': 11, 'b': 12, 'c': 13, 'd': 14, 'e': 15}}
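As a sketch of how the converted structure could then be used, the example below builds the same frame as in the question and converts it with to_dict(orient='index'), which should produce the same row-keyed nesting as df.T.to_dict(). The variable names (nested, picked) are illustrative only, and the tuple lookup is written as a plain generator expression because a nested dict has no built-in multi-key access:

```python
import pandas as pd

# Same data as in the question, with the row labels used as the index.
df = pd.DataFrame(
    [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15]],
    index=['x', 'y', 'z'],
    columns=['a', 'b', 'c', 'd', 'e'],
)

# Row-keyed nested dict: {row_label: {column_label: value}}.
nested = df.to_dict(orient='index')

# Ordinary nested-dict access: row first, then column.
row = nested['x']
value = nested['x']['a']

# Several columns at once have to be picked explicitly.
picked = tuple(nested['x'][key] for key in ('a', 'd', 'c'))

print(row, value, picked)
```

Note that the dict is a snapshot: later changes to df are not reflected in nested.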
    

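    If __getitem__ is used for row access instead of the current column access, how do you propose to do column access?

    The same way you would access a nested dict such as defaultdict(dict): vt[row_id, column_id] for a single cell, and vt[row_id] for row access.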
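The access pattern described in this comment can also be sketched with composition instead of inheritance, which avoids overriding DataFrame.__getitem__ at all. RowFirstTable below is a hypothetical wrapper, not a pandas API and not the code from either answer; it assumes unique row labels and a plain (non-MultiIndex) index:

```python
import pandas as pd


class RowFirstTable:
    """Hypothetical wrapper giving row-first, dict-like access to a DataFrame."""

    def __init__(self, frame):
        self._frame = frame

    def __getitem__(self, key):
        if isinstance(key, tuple):
            # vt[row_id, column_id] or vt[row_id, [column_id, ...]]
            row_id, column_id = key
            if isinstance(column_id, list):
                return tuple(self._frame.loc[row_id, column_id])
            return self._frame.loc[row_id, column_id]
        # vt[row_id]: return the row as a {column: value} dict,
        # so that vt[row_id][column_id] also works.
        return self._frame.loc[key].to_dict()


df = pd.DataFrame(
    [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15]],
    index=['x', 'y', 'z'],
    columns=['a', 'b', 'c', 'd', 'e'],
)
vt = RowFirstTable(df)

print(vt['x'])
print(vt['x', 'a'])
print(vt['x', ['a', 'd', 'c']])
print(vt['x']['a'])
```

Because the wrapper only delegates to .loc, the underlying DataFrame keeps its normal column-based indexing for any code that still needs it.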