Python: calculating the median of groups over columns


python pandas


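I am trying to calculate the median of groups over columns. I found a very clear example, and that question and its answer are exactly what I need. I created an identical example and worked through it with my own details: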

import pandas
import numpy

data_3 = [2,3,4,5,4,2]
data_4 = [0,1,2,3,4,2]

df = pandas.DataFrame({'COL1': ['A','A','A','A','B','B'], 
                       'COL2': ['AA','AA','BB','BB','BB','BB'],
                       'COL3': data_3,
                       'COL4': data_4})

m = df.groupby(['COL1', 'COL2'])[['COL3','COL4']].apply(numpy.median)

When I try to calculate the median of the groups over the columns, I get the error

TypeError: Series.name must be a hashable type

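If I use exactly the same code but replace the median with a different statistic (mean, min, max, std), everything works fine.

I don't understand what causes this error, or why it only happens with the median, which is the statistic I actually need.

Thanks in advance for your help.

Bob

Below is the full error message. I am using Python 3.5.2.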

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-12-af0ef7da3347> in <module>()
----> 1 m = df.groupby(['COL1', 'COL2'])[['COL3','COL4']].apply(numpy.median)

/Applications/anaconda3/lib/python3.5/site-packages/pandas/core/groupby.py in apply(self, func, *args, **kwargs)
    649         # ignore SettingWithCopy here in case the user mutates
    650         with option_context('mode.chained_assignment', None):
--> 651             return self._python_apply_general(f)
    652 
    653     def _python_apply_general(self, f):

/Applications/anaconda3/lib/python3.5/site-packages/pandas/core/groupby.py in _python_apply_general(self, f)
    658             keys,
    659             values,
--> 660             not_indexed_same=mutated or self.mutated)
    661 
    662     def _iterate_slices(self):

/Applications/anaconda3/lib/python3.5/site-packages/pandas/core/groupby.py in _wrap_applied_output(self, keys, values, not_indexed_same)
   3373                 coerce = True if any([isinstance(x, Timestamp)
   3374                                       for x in values]) else False
-> 3375                 return (Series(values, index=key_index, name=self.name)
   3376                         ._convert(datetime=True,
   3377                                   coerce=coerce))

/Applications/anaconda3/lib/python3.5/site-packages/pandas/core/series.py in __init__(self, data, index, dtype, name, copy, fastpath)
    231         generic.NDFrame.__init__(self, data, fastpath=True)
    232 
--> 233         self.name = name
    234         self._set_axis(0, index, fastpath=True)
    235 

/Applications/anaconda3/lib/python3.5/site-packages/pandas/core/generic.py in __setattr__(self, name, value)
   2692             object.__setattr__(self, name, value)
   2693         elif name in self._metadata:
-> 2694             object.__setattr__(self, name, value)
   2695         else:
   2696             try:

/Applications/anaconda3/lib/python3.5/site-packages/pandas/core/series.py in name(self, value)
    307     def name(self, value):
    308         if value is not None and not com.is_hashable(value):
--> 309             raise TypeError('Series.name must be a hashable type')
    310         object.__setattr__(self, '_name', value)
    311 

TypeError: Series.name must be a hashable type


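Somehow, at this stage the Series name is treated as unhashable, even though it is supposed to be a tuple. I think this may be the same bug that has since been fixed and closed:

Basically, a single scalar value per group (as in your example) prevents the Series name from being passed through correctly. It was fixed in 0.19.2.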
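To see why `numpy.median` lands on the scalar-per-group path, here is a minimal sketch (an illustration added here, not part of the original answer) that applies it by hand to the sub-DataFrame for group `('A', 'AA')`. It assumes NumPy converts the DataFrame to a plain 2-D array: `numpy.median` with its default `axis=None` then pools both columns into one scalar, whereas `DataFrame.median` returns one value per column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'COL1': ['A', 'A', 'A', 'A', 'B', 'B'],
                   'COL2': ['AA', 'AA', 'BB', 'BB', 'BB', 'BB'],
                   'COL3': [2, 3, 4, 5, 4, 2],
                   'COL4': [0, 1, 2, 3, 4, 2]})

# The sub-DataFrame that groupby would hand to apply() for group ('A', 'AA')
mask = (df['COL1'] == 'A') & (df['COL2'] == 'AA')
group = df.loc[mask, ['COL3', 'COL4']]

# numpy.median flattens the 2-D values (axis=None): median of [2, 0, 3, 1]
pooled = np.median(group)
print(pooled)  # 1.5

# DataFrame.median reduces each column separately
per_column = group.median()
print(per_column['COL3'], per_column['COL4'])  # 2.5 0.5
```

So, under this reading, even on a pandas version without the bug, `.apply(numpy.median)` would give one pooled median per group rather than separate medians for COL3 and COL4.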

In any case, this shouldn't be a real problem in practice, because you can (and should) call mean, median, etc. directly on the GroupBy object:

>>> df.groupby(['COL1', 'COL2'])[['COL3', 'COL4']].median()
           COL3  COL4
COL1 COL2            
A    AA     2.5   0.5
     BB     4.5   2.5
B    BB     3.0   3.0
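If you need several statistics at once, one option (a sketch, not something the answer itself uses) is `agg` with a list of aggregation names; the result has `(column, statistic)` pairs as its columns. The variable names `grouped`, `medians` and `stats` are just illustrative.

```python
import pandas as pd

df = pd.DataFrame({'COL1': ['A', 'A', 'A', 'A', 'B', 'B'],
                   'COL2': ['AA', 'AA', 'BB', 'BB', 'BB', 'BB'],
                   'COL3': [2, 3, 4, 5, 4, 2],
                   'COL4': [0, 1, 2, 3, 4, 2]})

grouped = df.groupby(['COL1', 'COL2'])[['COL3', 'COL4']]

# Built-in per-column median for each (COL1, COL2) group
medians = grouped.median()
print(medians)

# Several statistics in one call; columns become (column, statistic) pairs
stats = grouped.agg(['mean', 'median', 'min', 'max', 'std'])
print(stats)
```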

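Thanks a lot. Your comment was very helpful and solved my problem.

@Upboard How could that be a problem? np.median([1,2]) (an even number of values) works just as well as np.median([1,2,3]) (an odd number).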
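A quick check of that point: with an even number of values `numpy.median` averages the two middle values, and with an odd number it returns the middle one, so both cases are well defined.

```python
import numpy as np

even = np.median([1, 2])      # average of the two middle values
odd = np.median([1, 2, 3])    # the single middle value
print(even, odd)  # 1.5 2.0
```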