Python: groupby on multiple columns (time series, string column) fails
I have time series data that I am trying to group by month and then by service type, so essentially a groupby on multiple columns.

I can get each groupby to work on its own (see the two cases below). However, when I try to combine the two, it fails with the exception shown below.

The first groupby is on the 'service' column:
df_service = df_mem[['service', 'amount']].groupby('service').agg(['sum', 'count'])
sum count
service
10-Class Pack - $170+HST 1728.90 9
646 Blue T-shirt (stars) 25.00 1
646 Foundations + 12 classes $210+HST 237.30 1
646 Foundations + 8 Classes - $159+HST 876.38 5
646 Klawkov Tee, Wonder Woman Tee, 2 Drop-Ins 96.05 1
Bronze (8/mth) 1830.60 12
Bronze (8/mth) $135+HST 1121.07 10
Clothing - Sweatpants XL, Grey Hoodie L 94.27 2
Drop-In $20+HST 158.20 7
Gold (Unlimited) - $185+HST 1604.56 19
Leather lifting straps 25.00 1
Men's Dimas Tee, Large 28.25 1
Open Gym 220.35 3
Open Gym - $65+HST 83.07 3
Red 646 Raglan, Large 33.90 1
Silver 2237.40 12
Silver (12/mth) $165+HST 1294.28 13
Test 2.00 2
Thumb Tape 6.25 1
Unlimited Gold 6898.65 33
Women's Fleece Pants 48.59 1
Wonder Woman muscle tank (2) +HST 56.50 1
The second groupby is on the TimeSeries index (frequency = month):
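A month-level groupby on a DatetimeIndex might look like the following minimal sketch. The frame `toy` and its values are made up for illustration and only stand in for `df_mem`; the `"MS"` (month start) alias is my choice here because the month-end alias `"M"` was renamed to `"ME"` in newer pandas releases.

```python
import pandas as pd

# Hypothetical toy data standing in for df_mem; the values are made up.
toy = pd.DataFrame(
    {
        "service": ["Unlimited Gold", "Silver", "Unlimited Gold", "Open Gym"],
        "amount": [209.05, 186.45, 209.05, 73.45],
    },
    index=pd.to_datetime(
        ["2017-07-01 20:18", "2017-07-01 15:20", "2017-08-01 09:00", "2017-08-02 10:00"]
    ),
)
toy.index.name = "date"

# Group on the DatetimeIndex by calendar month and aggregate the amount column.
monthly = toy.groupby(pd.Grouper(freq="MS"))["amount"].agg(["sum", "count"])
print(monthly)
```

Selecting the `amount` column before aggregating keeps string columns such as `service` out of the `sum`.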
When I combine the two, it fails:
df_montly_service = df_mem.groupby([pd.Grouper(freq='M'),'service']).agg(['sum','count'])
Traceback (most recent call last):
File "C:/Users/Karunyan/PycharmProjects/646W/import_data.py", line 24, in <module>
df_montly_service = df_mem.groupby([pd.Grouper(freq='M'),'service']).agg(['sum','count'])
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\generic.py", line 3991, in groupby
**kwargs)
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\groupby.py", line 1511, in groupby
return klass(obj, by, **kwds)
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\groupby.py", line 370, in __init__
mutated=self.mutated)
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\groupby.py", line 2484, in _get_grouper
if not isinstance(gpr, Grouping) else gpr
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\groupby.py", line 2262, in __init__
grouper = self.grouper._get_binner_for_grouping(self.obj)
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\tseries\resample.py", line 1102, in _get_binner_for_grouping
grouper = grouper.take(indexer)
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\tseries\base.py", line 379, in take
na_value=tslib.iNaT)
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\indexes\base.py", line 1539, in _assert_take_fillable
taken = values.take(indices)
IndexError: index 142 is out of bounds for size 142
df_mem.head(15)
member service option method amount ccy
date
2017-07-01 20:18:00 Names_coded_out Unlimited Gold subscription payment credit card 209.05 cad
2017-07-01 19:07:00 Names_coded_out Unlimited Gold subscription payment credit card 209.05 cad
2017-07-01 18:50:00 Names_coded_out Bronze (8/mth) subscription payment credit card 152.55 cad
2017-07-01 18:33:00 Names_coded_out Unlimited Gold subscription payment credit card 209.05 cad
2017-07-01 18:15:00 Names_coded_out Unlimited Gold subscription payment credit card 209.05 cad
2017-07-01 18:14:00 Names_coded_out Bronze (8/mth) subscription payment credit card 152.55 cad
2017-07-01 16:50:00 Names_coded_out Unlimited Gold subscription payment credit card 209.05 cad
2017-07-01 16:23:00 Names_coded_out Open Gym subscription payment failed credit card 73.45 cad
2017-07-01 16:09:00 Names_coded_out Unlimited Gold subscription payment credit card 209.05 cad
2017-07-01 15:22:00 Names_coded_out Silver (12/mth) $165+HST subscription prorate credit card 179.00 cad
2017-07-01 15:20:00 Names_coded_out Silver subscription payment credit card 186.45 cad
2017-07-01 14:36:00 Names_coded_out Unlimited Gold subscription payment credit card 209.05 cad
2017-07-01 14:14:00 Names_coded_out Unlimited Gold subscription payment credit card 209.05 cad
2017-07-01 14:06:00 Names_coded_out Silver subscription payment credit card 186.45 cad
2017-07-01 13:57:00 Names_coded_out Unlimited Gold subscription payment credit card 209.05 cad
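Could you post df_mem.head()?
Updated to include the dataframe.
@Karun, I can't reproduce the error using the posted dataset; everything works fine... What is your Pandas version?
Pandas v.0.19.2. It looks like I can upgrade to 0.20.3. I'll give it a try. Fingers crossed.
I can't reproduce it with 0.19.2.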
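Since the posted sample does not reproduce the error, the full data may differ from its first 15 rows in some way. As a hedged sketch only: the code below checks a made-up frame for missing timestamps in the index, sorts it, and then groups by a monthly period key plus `service` instead of using `pd.Grouper`. The frame `toy`, its values, and the idea that missing timestamps could matter are all assumptions, not a confirmed cause of the `IndexError`.

```python
import pandas as pd

# Hypothetical toy data: unsorted, with one missing timestamp (assumed, not from the post).
toy = pd.DataFrame(
    {
        "service": ["Silver", "Unlimited Gold", "Test", "Unlimited Gold", "Silver"],
        "amount": [186.45, 209.05, 2.00, 209.05, 186.45],
    },
    index=pd.to_datetime(
        ["2017-08-03 10:00", "2017-07-01 20:18", None, "2017-07-01 19:07", "2017-07-01 15:20"]
    ),
)
toy.index.name = "date"

# Count rows whose timestamp is missing (NaT), then drop them and sort by date.
n_missing = int(toy.index.isna().sum())
clean = toy[toy.index.notna()].sort_index()

# Build a monthly key from the index and group by it together with the service column.
month = clean.index.to_period("M").rename("month")
monthly_service = clean.groupby([month, "service"])["amount"].agg(["sum", "count"])
print(n_missing)
print(monthly_service)
```

The result is indexed by (`month`, `service`), which matches the month-then-service breakdown the question is after.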