Syntax for multiple aggregations and column renaming in Python pandas
Could someone please explain why this agg() isn't working for me:
import pandas as pd
df = pd.DataFrame({'name':["alice","bob","charlene","alice","bob","charlene","alice","bob","charlene","edna" ],
'date':["2020-01-01","2020-01-01","2020-01-01","2020-01-01","2020-01-01","2020-01-01","2020-01-02","2020-01-01","2020-01-02","2020-01-01"],
'contribution': [5,5,10,20,30,1,5,5,10,100],
'payment-type': ["cash","transfer","cash","transfer","cash","transfer","cash","transfer","cash","transfer",]})
df['date'] = pd.to_datetime(df['date'])
daily_count = df.groupby(pd.Grouper(key='date', freq='1D')).agg({'name': 'value_counts', 'contribution': 'sum'}).rename(columns={'name': 'name_count', 'contribution': 'contribution_sum' }).reset_index()
I see the following error:
Traceback (most recent call last):
File "/Users/andrew/git-analysis/some_pandas_test copy.py", line 11, in <module>
daily_count = df.groupby(pd.Grouper(key='date', freq='1D')).agg({'name': ['value_counts'], 'contribution': ['sum']}).rename(columns={'name': 'name_count', 'contribution': 'contribution_sum' })
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/groupby/generic.py", line 928, in aggregate
result, how = self._aggregate(func, *args, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/base.py", line 443, in _aggregate
return concat([result[k] for k in keys], keys=keys, axis=1), True
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/reshape/concat.py", line 271, in concat
op = _Concatenator(
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/reshape/concat.py", line 452, in __init__
self.new_axes = self._get_new_axes()
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/reshape/concat.py", line 515, in _get_new_axes
return [
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/reshape/concat.py", line 516, in <listcomp>
self._get_concat_axis() if i == self.axis else self._get_comb_axis(i)
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/reshape/concat.py", line 522, in _get_comb_axis
return get_objs_combined_axis(
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/indexes/api.py", line 90, in get_objs_combined_axis
return _get_combined_index(obs_idxes, intersect=intersect, sort=sort)
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/indexes/api.py", line 138, in _get_combined_index
index = union_indexes(indexes, sort=sort)
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/indexes/api.py", line 205, in union_indexes
result = result.union(other)
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/indexes/multi.py", line 3211, in union
uniq_tuples = lib.fast_unique_multiple(
File "pandas/_libs/lib.pyx", line 229, in pandas._libs.lib.fast_unique_multiple
ValueError: cannot include dtype 'M' in a buffer
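If anyone knows of a good agg() cookbook, I'd be glad to see it. I find this function hard to get to grips with.
Ta, Andrew

I believe this is what you're after: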
>>> (df
.groupby(df['date'])
.agg({'name': 'count', 'contribution': 'sum'})
.rename(columns={'name': 'name_count', 'contribution': 'contribution_sum' })
.reset_index())
date name_count contribution_sum
0 2020-01-01 8 176
1 2020-01-02 2 15
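
As an alternative to chaining agg() and rename(), pandas (0.25 and later) also accepts "named aggregation", where each keyword argument names an output column. The following is a minimal sketch on the question's sample data; the extra unique_names column is only an illustration and was not part of the original question.

```python
import pandas as pd

# Sample data from the question (payment-type omitted, it is not needed here).
df = pd.DataFrame({
    'name': ["alice", "bob", "charlene", "alice", "bob",
             "charlene", "alice", "bob", "charlene", "edna"],
    'date': pd.to_datetime(["2020-01-01"] * 6 +
                           ["2020-01-02", "2020-01-01", "2020-01-02", "2020-01-01"]),
    'contribution': [5, 5, 10, 20, 30, 1, 5, 5, 10, 100],
})

# Each keyword is an output column: new_name=(source_column, aggregation).
daily = (df
         .groupby('date')
         .agg(name_count=('name', 'count'),          # rows per day
              unique_names=('name', 'nunique'),      # distinct names per day (illustrative)
              contribution_sum=('contribution', 'sum'))
         .reset_index())

print(daily)
```

Every aggregation here returns one value per date, so the results share the same index and combine into a single DataFrame without a separate rename step.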
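Have you had a look at the section on aggregation in the pandas documentation? It helped me a lot when I was first learning this stuff.

Regarding your error, you can look at the two aggregations separately: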
In [6]: df.resample('1D', on='date')['name'].value_counts()
Out[6]:
date        name
2020-01-01  bob         3
            alice       2
            charlene    2
            edna        1
2020-01-02  alice       1
            charlene    1
Name: name, dtype: int64
In [7]: df.resample('1D', on='date')['contribution'].sum()
Out[7]:
date
2020-01-01 176
2020-01-02 15
Freq: D, Name: contribution, dtype: int64
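If all you are doing is grouping by frequency on a date column, the resample function is simpler than using Grouper. I only use the latter when I need to group by something else and resample at the same time.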
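
To illustrate the difference, here is a small sketch on the question's data. The first two results should agree, since both simply bin by day. The third groups by payment-type and by day in one step, which is the case where Grouper is the natural tool. The variable names are chosen for this example only.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ["alice", "bob", "charlene", "alice", "bob",
             "charlene", "alice", "bob", "charlene", "edna"],
    'date': pd.to_datetime(["2020-01-01"] * 6 +
                           ["2020-01-02", "2020-01-01", "2020-01-02", "2020-01-01"]),
    'contribution': [5, 5, 10, 20, 30, 1, 5, 5, 10, 100],
    'payment-type': ["cash", "transfer"] * 5,
})

# Grouping by date frequency only: resample and Grouper give the same totals.
by_resample = df.resample('1D', on='date')['contribution'].sum()
by_grouper = df.groupby(pd.Grouper(key='date', freq='1D'))['contribution'].sum()

# Grouping by another column and a date frequency at once: use Grouper.
per_type_daily = (df
                  .groupby(['payment-type', pd.Grouper(key='date', freq='1D')])
                  ['contribution']
                  .sum())

print(by_resample)
print(per_type_daily)
```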
As you can see (and as mentioned in the comments), value_counts returns a Series with a MultiIndex of date and name, whereas sum returns a Series indexed by date only.
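agg is roughly shorthand for performing several aggregations and concatenating the results into one DataFrame, with one column per result. Because the indexes do not match here, the two results cannot be combined.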
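
If you do want the per-name counts next to the daily totals, one possible approach (an assumption about the desired result, since the question does not say what it should look like) is to compute the two pieces separately and bring them onto the same index before combining. The sketch below groups by the date column directly, which works here because the sample dates are already whole days.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ["alice", "bob", "charlene", "alice", "bob",
             "charlene", "alice", "bob", "charlene", "edna"],
    'date': pd.to_datetime(["2020-01-01"] * 6 +
                           ["2020-01-02", "2020-01-01", "2020-01-02", "2020-01-01"]),
    'contribution': [5, 5, 10, 20, 30, 1, 5, 5, 10, 100],
})

grouped = df.groupby('date')
counts = grouped['name'].value_counts()   # MultiIndex: (date, name)
totals = grouped['contribution'].sum()    # plain index: date

# Move the inner index level (name) into columns so that both pieces
# are indexed by date alone; names absent on a day get a count of 0.
counts_wide = counts.unstack(level=-1, fill_value=0)

# Now the indexes match and the results can be combined.
combined = counts_wide.join(totals.rename('contribution_sum'))

print(combined)
```

The result has one row per date, one column per name, and a contribution_sum column. If you only need the number of rows per day rather than a breakdown by name, use 'count' instead of value_counts inside agg.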
If I try your code on pandas 1.1, I get
NotImplementedError: Can only union MultiIndex with MultiIndex or Index of tuples, try mi.to_flat_index().union(other) instead.
which basically says that there is no way to concatenate Series with different numbers of index levels.
value_counts returns a Series, while sum returns a single number. I suspect pandas has a hard time aligning them. What result do you expect?