Syntax for multiple aggregations and column renaming in Python pandas


Could someone please explain why this agg() doesn't work for me:

import pandas as pd 

df = pd.DataFrame({'name':["alice","bob","charlene","alice","bob","charlene","alice","bob","charlene","edna" ],
                   'date':["2020-01-01","2020-01-01","2020-01-01","2020-01-01","2020-01-01","2020-01-01","2020-01-02","2020-01-01","2020-01-02","2020-01-01"],
                   'contribution': [5,5,10,20,30,1,5,5,10,100],
                   'payment-type': ["cash","transfer","cash","transfer","cash","transfer","cash","transfer","cash","transfer",]})
df['date'] = pd.to_datetime(df['date'])

daily_count = df.groupby(pd.Grouper(key='date', freq='1D')).agg({'name': 'value_counts', 'contribution': 'sum'}).rename(columns={'name': 'name_count', 'contribution': 'contribution_sum' }).reset_index()
I get the following error:

Traceback (most recent call last):
  File "/Users/andrew/git-analysis/some_pandas_test copy.py", line 11, in <module>
    daily_count = df.groupby(pd.Grouper(key='date', freq='1D')).agg({'name': ['value_counts'], 'contribution': ['sum']}).rename(columns={'name': 'name_count', 'contribution': 'contribution_sum' })
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/groupby/generic.py", line 928, in aggregate
    result, how = self._aggregate(func, *args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/base.py", line 443, in _aggregate
    return concat([result[k] for k in keys], keys=keys, axis=1), True
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/reshape/concat.py", line 271, in concat
    op = _Concatenator(
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/reshape/concat.py", line 452, in __init__
    self.new_axes = self._get_new_axes()
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/reshape/concat.py", line 515, in _get_new_axes
    return [
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/reshape/concat.py", line 516, in <listcomp>
    self._get_concat_axis() if i == self.axis else self._get_comb_axis(i)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/reshape/concat.py", line 522, in _get_comb_axis
    return get_objs_combined_axis(
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/indexes/api.py", line 90, in get_objs_combined_axis
    return _get_combined_index(obs_idxes, intersect=intersect, sort=sort)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/indexes/api.py", line 138, in _get_combined_index
    index = union_indexes(indexes, sort=sort)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/indexes/api.py", line 205, in union_indexes
    result = result.union(other)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/indexes/multi.py", line 3211, in union
    uniq_tuples = lib.fast_unique_multiple(
  File "pandas/_libs/lib.pyx", line 229, in pandas._libs.lib.fast_unique_multiple
ValueError: cannot include dtype 'M' in a buffer
If anyone knows of a good agg() cookbook, I'd be glad to see it. I find this function hard to get to grips with.


Ta, Andrew

I believe this is what you want:

>>> (df
     .groupby(df['date'])
     .agg({'name': 'count', 'contribution': 'sum'})
     .rename(columns={'name': 'name_count', 'contribution': 'contribution_sum' })
     .reset_index())
        date  name_count  contribution_sum
0 2020-01-01           8               176
1 2020-01-02           2                15
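
If your pandas is 0.25 or newer, named aggregation lets you aggregate and name the output columns in one step, so the separate rename() is not needed. A minimal sketch, assuming the same data as in the question (the 'payment-type' column is left out for brevity, and the variable name daily is just for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'name': ["alice", "bob", "charlene", "alice", "bob",
             "charlene", "alice", "bob", "charlene", "edna"],
    'date': pd.to_datetime(["2020-01-01"] * 6
                           + ["2020-01-02", "2020-01-01", "2020-01-02", "2020-01-01"]),
    'contribution': [5, 5, 10, 20, 30, 1, 5, 5, 10, 100],
})

# Each keyword becomes an output column: new_name=(source_column, function).
daily = (df
         .groupby(pd.Grouper(key='date', freq='1D'))
         .agg(name_count=('name', 'count'),
              contribution_sum=('contribution', 'sum'))
         .reset_index())
print(daily)
```

Both functions here reduce each group to a single value, which is why the two results line up on the same daily index.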

Have you looked at the section on aggregation in the docs? It helped me a lot when I was first learning this stuff.

As for your error, you can look at the two aggregations separately:

In [6]: df.resample('1D', on='date')['name'].value_counts()
Out[6]: 
date        name    
2020-01-01  bob         3
            alice       2
            charlene    2
            edna        1
2020-01-02  alice       1
            charlene    1
Name: name, dtype: int64
In [7]: df.resample('1D', on='date')['contribution'].sum()
Out[7]: 
date
2020-01-01    176
2020-01-02     15
Freq: D, Name: contribution, dtype: int64
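If all you are doing is grouping by frequency on a date column, resample() is simpler than using Grouper. I only use the latter when I need to group by something else and resample at the same time.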
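
To make the comparison concrete, here is a rough sketch, assuming the question's data trimmed to the columns used. It computes the daily totals both ways, then shows the case where Grouper earns its keep: grouping by day and by 'payment-type' together. The variable names are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'date': pd.to_datetime(["2020-01-01"] * 6
                           + ["2020-01-02", "2020-01-01", "2020-01-02", "2020-01-01"]),
    'contribution': [5, 5, 10, 20, 30, 1, 5, 5, 10, 100],
    'payment-type': ["cash", "transfer"] * 5,
})

# Grouping by frequency only: both spellings give one total per day.
by_resample = df.resample('1D', on='date')['contribution'].sum()
by_grouper = df.groupby(pd.Grouper(key='date', freq='1D'))['contribution'].sum()
print(by_resample)
print(by_grouper)

# Grouping by frequency AND another column: Grouper can sit in a list of keys.
by_day_and_type = (df
                   .groupby([pd.Grouper(key='date', freq='1D'), 'payment-type'])
                   ['contribution']
                   .sum())
print(by_day_and_type)
```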


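As you can see (and as was mentioned in the comments), value_counts() returns a Series with a MultiIndex of 'date' and 'name', while sum() returns a Series indexed only by 'date'.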
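
You can check the mismatch directly by inspecting the two indexes. A small sketch, assuming the question's data; it groups on the 'date' column as-is, since the sample dates already fall on whole days:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ["alice", "bob", "charlene", "alice", "bob",
             "charlene", "alice", "bob", "charlene", "edna"],
    'date': pd.to_datetime(["2020-01-01"] * 6
                           + ["2020-01-02", "2020-01-01", "2020-01-02", "2020-01-01"]),
    'contribution': [5, 5, 10, 20, 30, 1, 5, 5, 10, 100],
})

counts = df.groupby('date')['name'].value_counts()   # one row per (date, name)
totals = df.groupby('date')['contribution'].sum()    # one row per date

print(list(counts.index.names), counts.index.nlevels)
print(list(totals.index.names), totals.index.nlevels)
```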

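agg() is roughly shorthand for performing several aggregations and concatenating the results into one DataFrame, with one column per result. Because the indexes don't match, these two results cannot be combined. If I try your code on pandas 1.1, I get a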

NotImplementedError: Can only union MultiIndex with MultiIndex or Index of tuples, try mi.to_flat_index().union(other) instead.

which basically says that there is no way to concatenate Series whose indexes have different numbers of levels.
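
If you really do want the per-name counts and the daily totals in one frame, one possible workaround (only a sketch, and it assumes a wide layout with one column per name is acceptable) is to compute them separately and unstack the counts so that both are indexed by date alone:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ["alice", "bob", "charlene", "alice", "bob",
             "charlene", "alice", "bob", "charlene", "edna"],
    'date': pd.to_datetime(["2020-01-01"] * 6
                           + ["2020-01-02", "2020-01-01", "2020-01-02", "2020-01-01"]),
    'contribution': [5, 5, 10, 20, 30, 1, 5, 5, 10, 100],
})

# Move the 'name' index level into columns; names absent on a day get 0.
per_name = df.groupby('date')['name'].value_counts().unstack(fill_value=0)
totals = df.groupby('date')['contribution'].sum()

# Both objects are now indexed by date only, so they align.
combined = per_name.assign(contribution_sum=totals)
print(combined)
```

Whether this layout is useful depends on the result you were expecting. With many distinct names the frame gets very wide.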

value_counts() returns a Series, whereas sum() returns a single number. I suspect pandas has a hard time aligning them. What result did you expect?