Python: groupby on multiple columns (time-series index and a string column) fails

Tags: python, pandas, pandas-groupby

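I have time-series data that I am trying to group by month and then by service type, so essentially a groupby on multiple columns.

I can get each groupby to work on its own (see the two cases below). However, when I try to combine the two, it fails with the exception shown below.

The first groupby is on the 'service' column: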

df_service =df_mem[['service','amount']].groupby('service').agg(['sum','count'])


                                                   sum count
service                                                     
10-Class Pack - $170+HST                       1728.90     9
646 Blue T-shirt (stars)                         25.00     1
646 Foundations + 12 classes $210+HST           237.30     1
646 Foundations + 8 Classes - $159+HST          876.38     5
646 Klawkov Tee, Wonder Woman Tee, 2 Drop-Ins    96.05     1
Bronze (8/mth)                                 1830.60    12
Bronze (8/mth) $135+HST                        1121.07    10
Clothing - Sweatpants XL, Grey Hoodie L          94.27     2
Drop-In $20+HST                                 158.20     7
Gold (Unlimited) - $185+HST                    1604.56    19
Leather lifting straps                           25.00     1
Men's Dimas Tee, Large                           28.25     1
Open Gym                                        220.35     3
Open Gym - $65+HST                               83.07     3
Red 646 Raglan, Large                            33.90     1
Silver                                         2237.40    12
Silver (12/mth) $165+HST                       1294.28    13
Test                                              2.00     2
Thumb Tape                                        6.25     1
Unlimited Gold                                 6898.65    33
Women's Fleece Pants                             48.59     1
Wonder Woman muscle tank (2) +HST                56.50     1

The second groupby is on the TimeSeries index (frequency = month):
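
The snippet for this step is not shown here. As a rough sketch only, a month-frequency groupby on a DatetimeIndex might look like the following. The frame df_demo and its values are made up for illustration, and pd.offsets.MonthEnd() is used as the object form of the monthly frequency because the string alias differs between pandas versions.

```python
import pandas as pd

# Hypothetical stand-in for df_mem: a DatetimeIndex named "date" plus an
# "amount" column. The values are illustrative, not the real data.
df_demo = pd.DataFrame(
    {"amount": [209.05, 152.55, 73.45]},
    index=pd.DatetimeIndex(
        ["2017-07-01 20:18", "2017-07-01 18:50", "2017-08-03 16:23"],
        name="date",
    ),
)

# Group rows into calendar months using the index, then aggregate "amount".
monthly = (
    df_demo.groupby(pd.Grouper(freq=pd.offsets.MonthEnd()))["amount"]
    .agg(["sum", "count"])
)
print(monthly)
```

Each row of the result is labelled with the month-end date of its bin.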

When I combine the two, it fails:

df_montly_service = df_mem.groupby([pd.Grouper(freq='M'),'service']).agg(['sum','count'])

Traceback (most recent call last):
  File "C:/Users/Karunyan/PycharmProjects/646W/import_data.py", line 24, in <module>
    df_montly_service = df_mem.groupby([pd.Grouper(freq='M'),'service']).agg(['sum','count'])
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\generic.py", line 3991, in groupby
    **kwargs)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\groupby.py", line 1511, in groupby
    return klass(obj, by, **kwds)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\groupby.py", line 370, in __init__
    mutated=self.mutated)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\groupby.py", line 2484, in _get_grouper
    if not isinstance(gpr, Grouping) else gpr
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\groupby.py", line 2262, in __init__
    grouper = self.grouper._get_binner_for_grouping(self.obj)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\tseries\resample.py", line 1102, in _get_binner_for_grouping
    grouper = grouper.take(indexer)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\tseries\base.py", line 379, in take
    na_value=tslib.iNaT)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\indexes\base.py", line 1539, in _assert_take_fillable
    taken = values.take(indices)
IndexError: index 142 is out of bounds for size 142




df_mem.head(15)

                              member                   service                       option       method  amount  ccy
date                                                                                                                 
2017-07-01 20:18:00  Names_coded_out            Unlimited Gold         subscription payment  credit card  209.05  cad
2017-07-01 19:07:00  Names_coded_out            Unlimited Gold         subscription payment  credit card  209.05  cad
2017-07-01 18:50:00  Names_coded_out            Bronze (8/mth)         subscription payment  credit card  152.55  cad
2017-07-01 18:33:00  Names_coded_out            Unlimited Gold         subscription payment  credit card  209.05  cad
2017-07-01 18:15:00  Names_coded_out            Unlimited Gold         subscription payment  credit card  209.05  cad
2017-07-01 18:14:00  Names_coded_out            Bronze (8/mth)         subscription payment  credit card  152.55  cad
2017-07-01 16:50:00  Names_coded_out            Unlimited Gold         subscription payment  credit card  209.05  cad
2017-07-01 16:23:00  Names_coded_out                  Open Gym  subscription payment failed  credit card   73.45  cad
2017-07-01 16:09:00  Names_coded_out            Unlimited Gold         subscription payment  credit card  209.05  cad
2017-07-01 15:22:00  Names_coded_out  Silver (12/mth) $165+HST         subscription prorate  credit card  179.00  cad
2017-07-01 15:20:00  Names_coded_out                    Silver         subscription payment  credit card  186.45  cad
2017-07-01 14:36:00  Names_coded_out            Unlimited Gold         subscription payment  credit card  209.05  cad
2017-07-01 14:14:00  Names_coded_out            Unlimited Gold         subscription payment  credit card  209.05  cad
2017-07-01 14:06:00  Names_coded_out                    Silver         subscription payment  credit card  186.45  cad
2017-07-01 13:57:00  Names_coded_out            Unlimited Gold         subscription payment  credit card  209.05  cad

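Could you post df_mem.head()?

Updated to include the dataframe.

@Karun, I cannot reproduce the error with the posted dataset; everything works fine. What is your Pandas version?

Pandas v0.19.2. It looks like I can upgrade to 0.20.3. I'll give that a try. Fingers crossed.

I cannot reproduce it with 0.19.2.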
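
Since the comments did not settle on a cause, the following is only a sketch of how the combined grouping can be written against synthetic data shaped like df_mem. Sorting the index first and selecting 'amount' before aggregating are precautions assumed here, not a confirmed fix for the IndexError. The frame df_demo and its values are hypothetical. pd.offsets.MonthEnd() is the offset behind the 'M' alias from the question, passed as an object because newer pandas releases spell that alias 'ME'.

```python
import pandas as pd

# Hypothetical stand-in for df_mem: DatetimeIndex named "date" in descending
# order, with "service" and "amount" columns. The values are illustrative.
df_demo = pd.DataFrame(
    {
        "service": ["Unlimited Gold", "Silver", "Unlimited Gold", "Silver"],
        "amount": [209.05, 186.45, 209.05, 186.45],
    },
    index=pd.DatetimeIndex(
        ["2017-08-02 10:00", "2017-07-01 20:18",
         "2017-07-01 19:07", "2017-06-15 09:00"],
        name="date",
    ),
)

# Monthly bins taken from the index, combined with the "service" column.
month_end = pd.Grouper(freq=pd.offsets.MonthEnd())

monthly_service = (
    df_demo.sort_index()                   # assumption: sort as a precaution
    .groupby([month_end, "service"])["amount"]
    .agg(["sum", "count"])                 # aggregate only the numeric column
)
print(monthly_service)
```

The result is indexed by (month end, service), so one month's figures can be read with monthly_service.loc[pd.Timestamp("2017-07-31")].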