Python: Defining a custom GroupBy aggregation function with Pandas causes an AttributeError


python pandas


I have a pandas DataFrame:

import pandas as pd

data = {'year': [1990, 1990, 1990,
                 1990, 1990, 1990,
                 1990, 1990, 1990],
        'zip': ['22204', '22204', '22204',
                '20194', '20194', '20194',
                '24060', '24060', '24060'],
        'education': [0, 0, 1,
                      1, 0, 1,
                      0, 1, 0]}
df = pd.DataFrame(data=data)

I want to use groupby to compute the percentage of each outcome of the education variable within each group:

df = df.groupby(['zip', 'year'])['education'].value_counts(normalize = True, dropna = False).unstack().fillna(0)

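However, I want to call that line of code from inside a custom aggregation function. When I run it that way, I get the error message: AttributeError: 'Float64Index' object has no attribute 'remove_unused_levels'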

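Is it possible to create a custom aggregation function that computes the percentage of each outcome within a groupby group? Ideally, I would like to call several other built-in and custom aggregation functions as well. For example: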

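# A dict literal keeps only the last value for a repeated key,
# so several functions for one column are passed as a list.
df = df.groupby(['zip', 'year']).agg({'education': [percent_by_category,
                                                    sum,
                                                    another_custom_function],
                                      another_variable: another_custom_function})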

No. When you use agg, each aggregation function must return a scalar.
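As a minimal sketch of what a scalar-returning custom aggregation can look like, the example below rebuilds the DataFrame from the question and passes one built-in and one custom function to agg. The helper name `share_of_ones` is hypothetical and not part of the original question; it returns a single number per group.

```python
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({
    'year': [1990] * 9,
    'zip': ['22204'] * 3 + ['20194'] * 3 + ['24060'] * 3,
    'education': [0, 0, 1, 1, 0, 1, 0, 1, 0],
})

def share_of_ones(group):
    # Hypothetical helper: one scalar per group,
    # the fraction of rows whose value equals 1.
    return (group == 1).mean()

# Each function reduces a group to a single value, so agg accepts both.
summary = df.groupby(['zip', 'year'])['education'].agg(['sum', share_of_ones])
print(summary)
```

This only covers one category at a time; a full percentage breakdown per category is not a scalar and needs a different route.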

If you test what .value_counts returns inside the function, you get a Series, so it is not possible to unstack it.
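The difference can be checked by comparing the index of the two results. The sketch below assumes the same data as the question; it shows that value_counts on a single group's Series has a one-level index, while value_counts on the grouped object has a multi-level index that unstack can pivot. This is presumably why unstack fails inside the custom function.

```python
import pandas as pd

# What the custom function receives: one group's values as a plain Series.
one_group = pd.Series([0, 0, 1], name='education')
per_group = one_group.value_counts(normalize=True, dropna=False)
print(per_group.index.nlevels)  # a single index level: nothing to unstack

# The same call on the grouped object keeps the group keys in the index.
df = pd.DataFrame({
    'year': [1990] * 9,
    'zip': ['22204'] * 3 + ['20194'] * 3 + ['24060'] * 3,
    'education': [0, 0, 1, 1, 0, 1, 0, 1, 0],
})
grouped = df.groupby(['zip', 'year'])['education'].value_counts(
    normalize=True, dropna=False)
print(grouped.index.nlevels)  # zip, year and education levels

# With several levels available, unstack moves the last one into columns.
wide = grouped.unstack().fillna(0)
print(wide)
```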

So if the function returns non-scalar output, an error is raised:

def percent_by_category(group):
    return group.value_counts(normalize = True, dropna = False)

df = df.groupby(['zip', 'year']).agg({'education':percent_by_category})
print (df)

ValueError: Function does not reduce
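One possible workaround, sketched here under the assumption that a wide table with one percentage column per category is the desired result: compute the percentages outside agg, run the scalar aggregations through agg, and combine the two on their shared (zip, year) index. The `pct_education_` prefix and the choice of 'sum' and 'count' are illustrative only.

```python
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({
    'year': [1990] * 9,
    'zip': ['22204'] * 3 + ['20194'] * 3 + ['24060'] * 3,
    'education': [0, 0, 1, 1, 0, 1, 0, 1, 0],
})

grouped = df.groupby(['zip', 'year'])['education']

# Non-scalar part: percentages per category, one column per category.
percents = (grouped.value_counts(normalize=True, dropna=False)
                   .unstack()
                   .fillna(0)
                   .add_prefix('pct_education_'))

# Scalar part: ordinary aggregations through agg.
totals = grouped.agg(['sum', 'count'])

# Both results are indexed by (zip, year), so they align column-wise.
result = pd.concat([percents, totals], axis=1)
print(result)
```

Other custom functions can be added to the agg list as long as each one returns a single value per group.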

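The problem with the function is that value_counts is used as a function without self or a reference to any data. In any case, you should try returning group.value_counts... Also, your function accepts a parameter that it never uses. Could you add the expected output?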

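def percent_by_category(group):
    # Debugging version: print what value_counts returns for each group.
    # The function itself returns None, so the aggregated column is None.
    print (group.value_counts(normalize = True, dropna = False))

df = df.groupby(['zip', 'year']).agg({'education':percent_by_category})
print (df)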
1    0.666667
0    0.333333
Name: education, dtype: float64
0    0.666667
1    0.333333
Name: education, dtype: float64
0    0.666667
1    0.333333
Name: education, dtype: float64
           education
zip   year          
20194 1990      None
22204 1990      None
24060 1990      None
