Python: Groupby on a non-unique axis with a Pandas Panel

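I have a pandas Panel with a non-unique major axis, and I am trying to use groupby to sum over the non-unique rows, but I get an error saying that the major axis is not iterable. I have searched Stack Overflow and the message boards, but Panels seem to be less widely used than DataFrames.

Here is an example that produces the error: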

import pandas as pd
import datetime as dt
import dateutil.relativedelta as rd
import numpy as np

items = ['A','B']
minor_axis = ['x','y']

diff = rd.relativedelta(years=1)

major_axis = [dt.date(2013,1,1) + (diff * shift) for shift in xrange(4)] * 2

values = np.random.randn(2,8,2)

data = pd.Panel(data=values, major_axis=major_axis, minor_axis=minor_axis, items=items)

data.groupby(sum, axis='major')
Here is the stack trace:

    ---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-29-e30fb9b32fce> in <module>()
----> 1 data.groupby(sum, axis='major')

/home/brendan/python_dev/venv/local/lib/python2.7/site-packages/pandas/core/panel.pyc in groupby(self, function, axis)
   1084         from pandas.core.groupby import PanelGroupBy
   1085         axis = self._get_axis_number(axis)
-> 1086         return PanelGroupBy(self, function, axis=axis)
   1087 
   1088     def swapaxes(self, axis1='major', axis2='minor', copy=True):

/home/brendan/python_dev/venv/local/lib/python2.7/site-packages/pandas/core/groupby.pyc in __init__(self, obj, keys, axis, level, grouper, exclusions, selection, as_index, sort, group_keys, squeeze)
    195         if grouper is None:
    196             grouper, exclusions = _get_grouper(obj, keys, axis=axis,
--> 197                                                level=level, sort=sort)
    198 
    199         self.grouper = grouper

/home/brendan/python_dev/venv/local/lib/python2.7/site-packages/pandas/core/groupby.pyc in _get_grouper(obj, key, axis, level, sort)
   1323             raise AssertionError(errmsg)
   1324 
-> 1325         ping = Grouping(group_axis, gpr, name=name, level=level, sort=sort)
   1326         groupings.append(ping)
   1327 

/home/brendan/python_dev/venv/local/lib/python2.7/site-packages/pandas/core/groupby.pyc in __init__(self, index, grouper, name, level, sort)
   1197             # no level passed
   1198             if not isinstance(self.grouper, np.ndarray):
-> 1199                 self.grouper = self.index.map(self.grouper)
   1200                 if not (hasattr(self.grouper,"__len__") and \
   1201                    len(self.grouper) == len(self.index)):

/home/brendan/python_dev/venv/local/lib/python2.7/site-packages/pandas/core/index.pyc in map(self, mapper)
    856 
    857     def map(self, mapper):
--> 858         return self._arrmap(self.values, mapper)
    859 
    860     def isin(self, values):

/home/brendan/python_dev/venv/local/lib/python2.7/site-packages/pandas/algos.so in pandas.algos.arrmap_object (pandas/algos.c:62269)()

TypeError: 'datetime.date' object is not iterable
Any ideas on how to handle this situation?

Many thanks,

Brendan

In 0.12 you can try:

>>> data.groupby(np.sum, axis='major')
<pandas.core.groupby.PanelGroupBy object at 0x1a2ba50>

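@alko's answer does address your question, although I think you are misunderstanding groupby. You still need to apply a function or aggregation to the result of the groupby() call in order to sum all items in each group: data.groupby(..).sum()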
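
To illustrate the distinction between the grouping key and the aggregation, here is a minimal sketch using a DataFrame rather than a Panel (pd.Panel was deprecated in pandas 0.20 and removed in 0.25, so the original example no longer runs on current versions). It assumes Python 3 and a recent pandas; the names frame, grouped and totals are made up for the illustration. The traceback shows the function passed to groupby being mapped over the axis labels one at a time, which is why the builtin sum fails on a single datetime.date.

```python
import datetime as dt

import numpy as np
import pandas as pd

# Four yearly dates, repeated twice, so the index is non-unique.
dates = [dt.date(2013 + shift, 1, 1) for shift in range(4)] * 2
frame = pd.DataFrame(np.arange(16.0).reshape(8, 2), index=dates, columns=["x", "y"])

# The builtin sum() expects an iterable, so calling it on one label fails.
try:
    sum(dates[0])
    label_error = None
except TypeError as exc:
    label_error = str(exc)

# A key function receives one label at a time and returns its group key.
grouped = frame.groupby(lambda label: label)

# The aggregation is a separate step applied to the grouped object.
totals = grouped.sum()
```

Here label_error holds the same "not iterable" message as in the question, while totals has one row per distinct date.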

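However, I would suggest you consider whether you need a Panel at all. Of course I don't know your situation, but in many cases a MultiIndex can solve the problem.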

Your Panel and groupby would then look like this:

>>> items = ['A', 'A', 'B', 'B']
>>> minor_axis = ['x','y', 'x', 'y']
>>> diff = rd.relativedelta(years=1)
>>> major_axis = [dt.date(2013,1,1) + (diff * shift) for shift in xrange(4)] * 2
>>> values = np.random.randn(8,4)
>>> 
>>> data = pd.DataFrame(values, index=major_axis, columns=pd.MultiIndex.from_arrays([items, minor_axis]))
>>> data
                   A                   B          
                   x         y         x         y
2013-01-01 -1.063086  0.564123  0.128006 -0.658767
2014-01-01  2.182473 -0.851618  1.180264  0.165581
2015-01-01 -0.003941  0.590801 -1.616197 -2.270557
2016-01-01 -0.736524  0.172791  1.220589 -1.303294
2013-01-01 -1.052184 -1.171545 -0.473488 -0.140327
2014-01-01  0.021189  0.827241  0.775863 -0.882874
2015-01-01 -1.762289  0.705692  0.593365 -0.984109
2016-01-01 -1.946106 -1.108336 -1.691758 -0.088932

>>> data.groupby(data.index).sum()
                   A                   B          
                   x         y         x         y
2013-01-01 -2.115270 -0.607422 -0.345482 -0.799094
2014-01-01  2.203662 -0.024377  1.956127 -0.717293
2015-01-01 -1.766230  1.296492 -1.022832 -3.254667
2016-01-01 -2.682630 -0.935544 -0.471170 -1.392226
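
For readers on Python 3 and a current pandas, the same idea can be sketched starting from the 3-D array used in the question. This is only one possible layout, and the names flat, columns and summed are chosen for the illustration. It assumes the array is ordered (items, major, minor) as in the original Panel.

```python
import datetime as dt

import numpy as np
import pandas as pd

items = ["A", "B"]
minor_axis = ["x", "y"]
major_axis = [dt.date(2013 + shift, 1, 1) for shift in range(4)] * 2

# Same layout as the Panel in the question: (items, major, minor).
values = np.random.randn(2, 8, 2)

# Move the major axis first, then flatten (item, minor) into columns.
flat = values.transpose(1, 0, 2).reshape(len(major_axis), -1)
columns = pd.MultiIndex.from_product([items, minor_axis])
data = pd.DataFrame(flat, index=major_axis, columns=columns)

# Sum rows that share the same date.
summed = data.groupby(level=0).sum()
```

Grouping with level=0 is equivalent here to data.groupby(data.index), since the row index has a single level.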

Which version of pandas are you using? This seems to work in the development version.

This seems to work for me. How do I get a Panel back from the PanelGroupBy object?

Thanks, I think this may be the better solution.