Solution for SpecificationError: nested renamer is not supported when using agg() together with groupby()

python pandas

The following code raises the error:

def stack_plot(data, xtick, col2='project_is_approved', col3='total'):
    ind = np.arange(data.shape[0])

    plt.figure(figsize=(20,5))
    p1 = plt.bar(ind, data[col3].values)
    p2 = plt.bar(ind, data[col2].values)

    plt.ylabel('Projects')
    plt.title('Number of projects approved vs rejected')
    plt.xticks(ind, list(data[xtick].values))
    plt.legend((p1[0], p2[0]), ('total', 'accepted'))
    plt.show()

def univariate_barplots(data, col1, col2='project_is_approved', top=False):
    # Count number of zeros in dataframe python: https://stackoverflow.com/a/51540521/4084039
    temp = pd.DataFrame(project_data.groupby(col1)[col2].agg(lambda x: x.eq(1).sum())).reset_index()

    # Pandas dataframe grouby count: https://stackoverflow.com/a/19385591/4084039
    temp['total'] = pd.DataFrame(project_data.groupby(col1)[col2].agg({'total':'count'})).reset_index()['total']

    temp['Avg'] = pd.DataFrame(project_data.groupby(col1)[col2].agg({'Avg':'mean'})).reset_index()['Avg']

    temp.sort_values(by=['total'],inplace=True, ascending=False)

    if top:
        temp = temp[0:top]

    stack_plot(temp, xtick=col1, col2=col2, col3='total')
    print(temp.head(5))
    print("="*50)
    print(temp.tail(5))

univariate_barplots(project_data, 'school_state', 'project_is_approved', False)

Running it produces the following traceback:

SpecificationError                        Traceback (most recent call last)
<ipython-input-21-2cace8f16608> in <module>()
----> 1 univariate_barplots(project_data, 'school_state', 'project_is_approved', False)

<ipython-input-20-856fcc83737b> in univariate_barplots(data, col1, col2, top)
      4 
      5     # Pandas dataframe grouby count: https://stackoverflow.com/a/19385591/4084039
----> 6     temp['total'] = pd.DataFrame(project_data.groupby(col1)[col2].agg({'total':'count'})).reset_index()['total']
      7     print (temp['total'].head(2))
      8     temp['Avg'] = pd.DataFrame(project_data.groupby(col1)[col2].agg({'Avg':'mean'})).reset_index()['Avg']

~\AppData\Roaming\Python\Python36\site-packages\pandas\core\groupby\generic.py in aggregate(self, func, *args, **kwargs)
    251             # but not the class list / tuple itself.
    252             func = _maybe_mangle_lambdas(func)
--> 253             ret = self._aggregate_multiple_funcs(func)
    254             if relabeling:
    255                 ret.columns = columns

~\AppData\Roaming\Python\Python36\site-packages\pandas\core\groupby\generic.py in _aggregate_multiple_funcs(self, arg)
    292             # GH 15931
    293             if isinstance(self._selected_obj, Series):
--> 294                 raise SpecificationError("nested renamer is not supported")
    295 
    296             columns = list(arg.keys())

SpecificationError: nested renamer is not supported
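The failure can be reproduced on a few rows. This is a minimal sketch, not the asker's data: the small project_data frame below is a made-up stand-in, and it assumes a pandas version in which the dict form on a grouped Series raises instead of only warning. The exception is caught broadly because the import path of SpecificationError has differed between pandas versions.

```python
import pandas as pd

# Hypothetical stand-in for the project_data frame used in the question.
project_data = pd.DataFrame({
    "school_state": ["CA", "CA", "NY", "NY", "NY"],
    "project_is_approved": [1, 0, 1, 1, 0],
})

# Selecting a single column after groupby gives a SeriesGroupBy object.
grouped = project_data.groupby("school_state")["project_is_approved"]

error_name = None
error_message = ""
try:
    # On a grouped Series the dict keys are read as output names (a "renamer").
    grouped.agg({"total": "count"})
except Exception as exc:
    error_name = type(exc).__name__
    error_message = str(exc)

print(error_name, error_message)
```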

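Change the dict-style calls

.agg({'total':'count'})
.agg({'Avg':'mean'})

to keyword-style named aggregation:

.agg(total='count')
.agg(Avg='mean')

Reason: in newer pandas versions, named aggregation is the recommended replacement for the deprecated "dict-of-dicts" approach to naming the output of column-specific aggregations.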
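Named aggregation also lets all the summary columns come from a single agg() call rather than one call per column. The sketch below assumes a made-up project_data frame with the question's column names. The output name approved is an illustrative choice, and using "sum" for it relies on the column holding only 0 and 1.

```python
import pandas as pd

# Hypothetical stand-in for the project_data frame used in the question.
project_data = pd.DataFrame({
    "school_state": ["CA", "CA", "NY", "NY", "NY"],
    "project_is_approved": [1, 0, 1, 1, 0],
})

grouped = project_data.groupby("school_state")["project_is_approved"]

# Each keyword is an output column name; each value is the function to apply.
temp = grouped.agg(approved="sum", total="count", Avg="mean").reset_index()
temp = temp.sort_values(by="total", ascending=False)

print(temp)
```

Because the result is built in one pass, the columns cannot get out of step with each other the way separately computed and reassigned columns can.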

This error also occurs if a column named in the aggregation dict does not exist in the DataFrame:

In [190]: group = pd.DataFrame([[1, 2]], columns=['A', 'B']).groupby('A')

In [195]: group.agg({'B': 'mean'})
Out[195]:
   B
A
1  2

In [196]: group.agg({'B': 'mean', 'nonexistingcolumn': 'mean'})
...
SpecificationError: nested renamer is not supported
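Since the message says nothing about a missing column, it can help to compare the dict keys with the frame's columns before aggregating. The following is a sketch under that assumption. The names agg_spec, missing and valid_spec are illustrative, and newer pandas releases may report a missing column with a different exception (not checked here).

```python
import pandas as pd

df = pd.DataFrame([[1, 2]], columns=["A", "B"])

# Aggregation spec that refers to one column the frame does not have.
agg_spec = {"B": "mean", "nonexistingcolumn": "mean"}

# Collect the keys that are not actual columns, so they can be reported clearly.
missing = [col for col in agg_spec if col not in df.columns]

# Keep only the entries that refer to existing columns.
valid_spec = {col: func for col, func in agg_spec.items() if col in df.columns}

result = df.groupby("A").agg(valid_spec)

print("missing columns:", missing)
print(result)
```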

I had a similar problem to @akshay jindal's, but after checking the documentation as @artikay Khanna suggested, it was solved: some functions have been changed and the old forms deprecated. Below is the warning the code gave the last time it ran:

/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py:1: FutureWarning: using a dict on a Series for aggregation
is deprecated and will be removed in a future version. Use                 named aggregation instead.

    >>> grouper.agg(name_1=func_1, name_2=func_2)

  """Entry point for launching an IPython kernel.

So I suggest you try named aggregation, which for the code in the question looks like this:

temp['total'] = pd.DataFrame(project_data.groupby(col1)[col2].agg(total='count')).reset_index()['total']
temp['Avg'] = pd.DataFrame(project_data.groupby(col1)[col2].agg(Avg='mean')).reset_index()['Avg']

Hope this helps.

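I tried all the solutions, and it turned out to be a naming problem. If your column name contains a built-in keyword such as "in" or "is", the error is thrown. In my case the column was named "points in polygon", and I solved the problem by renaming it to "points".

@Rishi's solution worked for me. The original name of the column in my dataframe was net_budgeted_rate, which is essentially the dollar value of the sale. I changed it to dollars, and it worked.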
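The two reports above do not show the failing call, so the exact cause in those cases is uncertain. One related fact is that keyword-style named aggregation needs output names that can be written as Python keyword arguments, which rules out names containing spaces or reserved words such as in. A possible alternative to renaming, sketched here on made-up data, is to build a dict and unpack it with **:

```python
import pandas as pd

# Hypothetical data with a column name that is not a valid Python identifier.
df = pd.DataFrame({
    "region": ["north", "north", "south"],
    "points in polygon": [3, 5, 4],
})

# agg(points in polygon=...) would be a syntax error, so the output name is
# supplied as a dict key and unpacked into keyword arguments instead.
result = df.groupby("region").agg(
    **{"points in polygon": ("points in polygon", "sum")}
)

print(result)
```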

Instead of using

.agg({'total':'count'})

you can pass the names as a list of tuples, like

.agg([('total','count')])

and the same works for Avg. Hope it works.
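As a sketch of the tuple form on a grouped Series, each (name, function) pair yields one output column with that name. The project_data frame here is a made-up stand-in for the one in the question.

```python
import pandas as pd

# Hypothetical stand-in for the project_data frame used in the question.
project_data = pd.DataFrame({
    "school_state": ["CA", "CA", "NY", "NY", "NY"],
    "project_is_approved": [1, 0, 1, 1, 0],
})

grouped = project_data.groupby("school_state")["project_is_approved"]

# A list of (output_name, function) tuples names the result columns directly.
temp = grouped.agg([("total", "count"), ("Avg", "mean")]).reset_index()

print(temp)
```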
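This is not a very elegant solution, but it works, since the dict-based way of renaming columns is deprecated. There is a workaround, though. Create a temporary variable approved and store col2 in it, because when the agg function is applied, the original column values change along with the column name. You can keep the column name, but the values in that column will change. So, to preserve the original dataframe and get two new columns with the desired names, you can use the following code:

approved = temp[col2]
temp = pd.DataFrame(project_data.groupby(col1)[col2].agg([('Avg','mean'),('total','count')]).reset_index())
temp[col2] = approved

P.S.: This looks like an AAIC assignment; I am working on the same one :)

Sometimes it is convenient to keep an aggregation dict that records how each column should be transformed under aggregation, one that works with different sets of columns and different group-by columns. With the new syntax you can do this quite easily by unpacking the dict with **. Here is a minimal working example on simple data: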

dfx=pd.DataFrame(columns=["A","B","C"],data=np.random.randint(0,5,size=(10,3)))
#dfx
#
#   A  B  C
#0  4  4  1
#1  2  4  4
#2  1  3  3
#3  2  4  3
#4  1  2  1
#5  0  4  2
#6  2  3  4
#7  1  0  2
#8  2  1  4
#9  3  0  3

Maybe, when you aggregate, you want the first "A", the last "B" and the mean of "C", and sometimes your pipeline has a "D" (but not this time) whose mean you also want:

aggdict = {"A":lambda x: x.iloc[0], "B": lambda x: x.iloc[-1], "C" : "mean" , "D": "mean"}

You can build a simple dict as before, then unpack it with **, filtering on the relevant keys:

gb_col="C"
gbc = dfx.groupby(gb_col).agg(**{k:(k,v) for k,v in aggdict.items() if k in dfx.columns and k != gb_col})
#       A  B
#C      
#1  4  2
#2  0  0
#3  1  4
#4  2  3

Then you can slice and dice as you like with the same syntax:

mygb = lambda gb_col: dfx.groupby(gb_col).agg(**{k:(k,v) for k,v in aggdict.items() if k in dfx.columns and k != gb_col})
allgb = [mygb(c) for c in dfx.columns]

I found a way. Instead of writing

g2 = df.groupby(["Description","CustomerID"],as_index=False).agg({'Quantity':{"maxQ":np.max,"minQ":np.min,"meanQ":np.mean}})
g2.columns = ["Description","CustomerID","maxQ","minQ",'meanQ']

do the following:

g2 = df.groupby(["Description","CustomerID"],as_index=False).agg({'Quantity':{np.max,np.min,np.mean}})
g2.columns = ["Description","CustomerID","maxQ","minQ",'meanQ']

I had the same error, and this is how I resolved it.

Comments:

Code-only answers are discouraged on StackOverflow, and so are code-only posts. Please explain something about this procedure.

This error can also occur if you try to aggregate and one or more of the columns do not exist in the data frame.

@ConfusionMatrix I wish I had seen this sooner. It is a very useful pointer for a not-very-intuitive error message. Thanks a lot.

This answer points to the actual source of the error. The other answer, which says there is another way to specify the aggregation, may be true but does not get to the root cause.

The benefit of this solution is that the result columns are named properly.

How do I put multiple functions in the aggregation? For example, adding min and max.

Awesome, you saved my day, thanks!
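One of the comments asks how to apply several functions, such as min and max, in one aggregation. A possible approach is named aggregation with one (column, function) pair per output column, which also avoids renaming by position. That matters because a set such as {np.max, np.min, np.mean} has no defined order, so assigning g2.columns by position may mislabel the results. The data below is made up for illustration; only the column names come from the thread.

```python
import pandas as pd

# Hypothetical order data using the column names from the answer above.
df = pd.DataFrame({
    "Description": ["mug", "mug", "mug", "pen"],
    "CustomerID": [1, 1, 2, 1],
    "Quantity": [2, 6, 3, 5],
})

# Each keyword is an output name mapped to (source column, function name).
g2 = df.groupby(["Description", "CustomerID"], as_index=False).agg(
    maxQ=("Quantity", "max"),
    minQ=("Quantity", "min"),
    meanQ=("Quantity", "mean"),
)

print(g2)
```

The function names are passed as strings here; callables such as np.max are an alternative, though some pandas versions may emit a warning for them.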