Python 2.7: pandas groupby object's apply method adds an index


I have a question that follows on from reading this post.

I got the answer there and then ran a few experiments of my own, for example:

import pandas as pd
from cStringIO import StringIO
s = '''c1 c2 c3
1 2 3
4 5 6'''
df = pd.read_csv(StringIO(s), sep=' ')
print df
def f2(df):
    print df.iloc[:]
    print "--------"
    return df.iloc[:]
df2 = df.groupby(['c1']).apply(f2)
print "======"
print df2

The output is as expected:

   c1  c2  c3
0   1   2   3
1   4   5   6
   c1  c2  c3
0   1   2   3
--------
   c1  c2  c3
0   1   2   3
--------
   c1  c2  c3
1   4   5   6
--------
======
   c1  c2  c3
0   1   2   3
1   4   5   6

However, when I return df.iloc[0:] instead:

def f3(df):
    print df.iloc[0:]
    print "--------"
    return df.iloc[0:]
df3 = df.groupby(['c1']).apply(f3)
print "======"
print df3

I get an additional index level:

   c1  c2  c3
0   1   2   3
--------
   c1  c2  c3
0   1   2   3
--------
   c1  c2  c3
1   4   5   6
--------
======
      c1  c2  c3
c1              
1  0   1   2   3
4  1   4   5   6

I did some searching and suspect this means a different code path is being taken?

The difference is that iloc[:] returns the object itself, whereas iloc[0:] returns a view of the object. Take a look:

>>> df.iloc[:] is df
True

>>> df.iloc[0:] is df
False

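This matters because, within a groupby, each group has a name attribute that reflects the grouping. When the function returns an object that has this name attribute, no index is added to the result, whereas if it returns an object without this name attribute, an index is added to keep track of which group each object came from.

Interestingly, you can force the iloc[:] behavior for iloc[0:] by explicitly setting the group's name attribute before returning: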
def f(x):
    out = x.iloc[0:]
    out.name = x.name
    return out

df.groupby('c1').apply(f)
#    c1  c2  c3
# 0   1   2   3
# 1   4   5   6

My guess is that the no-index behavior for named outputs is basically a special case meant to make df.groupby(col).apply(lambda x: x) a no-op.
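
To make the rule described in this answer concrete, here is a toy model in plain Python. It is a sketch of the described behaviour only and makes no claim about how pandas implements apply; Rows, toy_groupby_apply and keep_name are hypothetical names introduced for illustration.

```python
from collections import OrderedDict


class Rows(list):
    """Hypothetical list subclass, so that a `name` attribute can be attached."""


def toy_groupby_apply(records, key, func):
    # Split (index, record) pairs into groups by the value stored under `key`.
    groups = OrderedDict()
    for idx, rec in enumerate(records):
        groups.setdefault(rec[key], Rows()).append((idx, rec))

    result = []
    for group_key, rows in groups.items():
        rows.name = group_key  # each group carries its key as `name`
        out = func(rows)
        if getattr(out, "name", None) == group_key:
            # Output still carries the group's name: keep the original index.
            result.extend((idx, rec) for idx, rec in out)
        else:
            # Unnamed output: prepend the group key to record where it came from.
            result.extend(((group_key, idx), rec) for idx, rec in out)
    return result


def keep_name(group):
    # Slice (which drops the attribute), then restore the name by hand.
    out = Rows(group[0:])
    out.name = group.name
    return out


records = [{"c1": 1, "c2": 2, "c3": 3}, {"c1": 4, "c2": 5, "c3": 6}]

print([i for i, _ in toy_groupby_apply(records, "c1", lambda g: g)])      # [0, 1]
print([i for i, _ in toy_groupby_apply(records, "c1", lambda g: g[0:])])  # [(1, 0), (4, 1)]
print([i for i, _ in toy_groupby_apply(records, "c1", keep_name)])        # [0, 1]
```

In this model the identity function is a no-op, a plain slice picks up the extra key level, and restoring the name by hand removes it again.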
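
That seems exactly right (I also tried out = x.iloc[0:1]; out.name = x.name, and got the extra index). Also, cool videos on Scikit Learn, you rock :)

I also tried out = x.iloc[0:1]; out.name = x.name and got the extra index, but only when there are duplicate c1 values, in which case the returned result would be different.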