Python: pd.NamedAgg overwrites previous column values

This is the DataFrame I'm using:

      token name   ltp    change
0   12345.0  abc   2.0       NaN
1   12345.0  abc   5.0  1.500000
2   12345.0  abc   3.0 -0.400000
3   12345.0  abc   9.0  2.000000
4   12345.0  abc   5.0 -0.444444
5   12345.0  abc  16.0  2.200000
6    6789.0  xyz   1.0       NaN
7    6789.0  xyz   5.0  4.000000
8    6789.0  xyz   3.0 -0.400000
9    6789.0  xyz  13.0  3.333333
10   6789.0  xyz   9.0 -0.307692
11   6789.0  xyz  20.0  1.222222
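
To make the question reproducible, the frame can be rebuilt as in the minimal sketch below. The `token`, `name`, and `ltp` values are copied from the table; treating `change` as the per-group percentage change of `ltp` is an assumption inferred from the numbers, not something stated in the question.

```python
import pandas as pd

# Values copied from the table in the question.
df = pd.DataFrame({
    "token": [12345.0] * 6 + [6789.0] * 6,
    "name": ["abc"] * 6 + ["xyz"] * 6,
    "ltp": [2.0, 5.0, 3.0, 9.0, 5.0, 16.0,
            1.0, 5.0, 3.0, 13.0, 9.0, 20.0],
})

# Assumption: `change` is the percentage change of `ltp` within each `name`
# group, which is why the first row of every group is NaN.
df["change"] = df.groupby("name")["ltp"].pct_change()

print(df)
```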

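While trying to solve a problem, I found that pd.NamedAgg overwrites previous column values when applied to a specific column: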

df.groupby('name')['change'].agg(pos = pd.NamedAgg(column='change',aggfunc=lambda x:x.gt(0).sum()),\
                                 neg = pd.NamedAgg(column='change',aggfunc=lambda x:x.lt(0).sum()))
#Output
      pos  neg
name
abc   2.0  2.0
xyz   2.0  2.0

An even stranger result:

df.groupby('name')['change'].agg(pos = pd.NamedAgg(column='change',aggfunc=lambda x:x.gt(0).sum()),\
                                 neg = pd.NamedAgg(column='change',aggfunc=lambda x:x.sum()),\
                                 max = pd.NamedAgg(column='ltp',aggfunc='max'))

# I'm aggregating the Series `'change'` but passed `column='ltp'`, which should
# raise a `KeyError: "Column 'ltp' does not exist!"`, yet it produces the following:

           pos       neg  max
name
abc   4.855556  4.855556  2.2
xyz   7.847863  7.847863  4.0
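
For comparison, here is a sketch of the documented way to use the `column` argument: call `agg` on the DataFrame groupby, without selecting `['change']` first, so that each `pd.NamedAgg` can pick its own column. The helper names `count_positive` and `count_negative` and the output label `max_ltp` are made up for this illustration, and the `change` column is rebuilt under the assumption that it is the per-group percentage change of `ltp`.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["abc"] * 6 + ["xyz"] * 6,
    "ltp": [2.0, 5.0, 3.0, 9.0, 5.0, 16.0,
            1.0, 5.0, 3.0, 13.0, 9.0, 20.0],
})
# Assumption: `change` is the per-group percentage change of `ltp`.
df["change"] = df.groupby("name")["ltp"].pct_change()


def count_positive(values):
    """Count entries strictly greater than zero (NaN is not counted)."""
    return int(values.gt(0).sum())


def count_negative(values):
    """Count entries strictly less than zero (NaN is not counted)."""
    return int(values.lt(0).sum())


# No column is selected before .agg, so `column=` is resolved per output.
out = df.groupby("name").agg(
    pos=pd.NamedAgg(column="change", aggfunc=count_positive),
    neg=pd.NamedAgg(column="change", aggfunc=count_negative),
    max_ltp=pd.NamedAgg(column="ltp", aggfunc="max"),
)
print(out)
```

Each output column is computed from the column it names, so `pos` and `neg` stay distinct and `max_ltp` really is the maximum of `ltp`.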

The problem also shows up when it is used with a pd.Series:

s = pd.Series([1,1,2,2,3,3,4,5])
s.groupby(s.values).agg(one = pd.NamedAgg(column='new',aggfunc='sum'))

   one
1    2
2    4
3    6
4    4
5    5

Shouldn't it raise a KeyError?

Some more odd results: when we use different column names, the values of column `one` are not overwritten:
s.groupby(s.values).agg(one=pd.NamedAgg(column='anything',aggfunc='sum'),\
                        second=pd.NamedAgg(column='something',aggfunc='max'))

   one  second       
1    2       1     
2    4       2
3    6       3
4    4       4
5    5       5

When the same column name is used in both pd.NamedAgg calls:

s.groupby(s.values).agg(one=pd.NamedAgg(column='weird',aggfunc='sum'),\
                        second=pd.NamedAgg(column='weird',aggfunc='max'))

  one  second  # Values of column `one` are over-written
1  1       1
2  2       2
3  3       3
4  4       4
5  5       5
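
For reference, pandas documents named aggregation on a Series groupby as plain `keyword=function` pairs, with no `pd.NamedAgg` and no column name. A minimal sketch on the same Series, showing the values one would expect for `one` and `second`:

```python
import pandas as pd

s = pd.Series([1, 1, 2, 2, 3, 3, 4, 5])

# On a SeriesGroupBy there is no column to select, so each keyword maps
# directly to an aggregation function.
result = s.groupby(s.values).agg(one="sum", second="max")
print(result)
```

Here `one` holds the per-group sums (2, 4, 6, 4, 5) and `second` the per-group maxima (1, 2, 3, 4, 5), so neither column overwrites the other.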


My pandas version:
pd.__version__
# '1.0.3'

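From the pandas documentation:
Named aggregation is also valid for Series groupby aggregations. In this case there's no column selection, so the values are just the functions.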
In [82]: animals.groupby("kind").height.agg(
   ....:     min_height='min',
   ....:     max_height='max',
   ....: )
   ....: 
Out[82]: 
      min_height  max_height
kind                        
cat          9.1         9.5
dog          6.0        34.0

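But I couldn't find out why using it with `column` produces these strange results.

Update:
A bug report has been filed.

Edit: This is a bug confirmed by the pandas developers, and it has been resolved in a PR. If a column is already selected after the groupby, use the approach described in this paragraph of the documentation:
Named aggregation is also valid for Series groupby aggregations. In this case there's no column selection, so the values are just the functions.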
In [82]: animals.groupby("kind").height.agg(
   ....:     min_height='min',
   ....:     max_height='max',
   ....: )
   ....: 
Out[82]: 
      min_height  max_height
kind                        
cat          9.1         9.5
dog          6.0        34.0

For the DataFrame in the question:

df = df.groupby('name')['change'].agg(pos = lambda x:x.gt(0).sum(),\
                                      neg = lambda x:x.lt(0).sum())
print (df)
      pos  neg
name          
abc   3.0  2.0
xyz   3.0  2.0

Why does using it with `column` produce strange results?
I think it is a bug: instead of producing wrong output, it should raise an error.

Comments:
Thank you for the answer. But I don't understand why using it with pd.NamedAgg causes this strange behavior.
@Ch3steR - I think this usage is wrong; the correct behavior would be to raise an error.
Yes, if it isn't meant to work this way, it should raise an error. Reporting it on GitHub might help.
@Ch3steR - yep, give me some time.
Yes, I'll wait for you and others to answer. Otherwise, we can open an issue on GitHub together.