Python pandas: does a groupby object store the index?

python pandas

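As far as I understand, groupby has to compute an index over the grouping variables. However, I am not sure whether that index is stored in the groupby object.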

My code looks like this:

df.groupby(["col1","col2"]).agg( something )
( ... some code ... )
df.groupby(["col1","col2"]).agg( something else )

Am I right in thinking that the following would avoid building the index twice?

my_group = df.groupby(["col1","col2"])
my_group.agg( something )
( ... some code ... )
my_group.agg( something else )

This matters to me because what I am writing has to pass over the groups twice. If the index is not stored, I may have to implement my own groupby.

Yes, groupby computes an index in order to compute the aggregation. If you keep the groupby object around, you avoid building that index again.

import pandas as pd

df3 = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo",
                         "bar", "bar", "bar", "bar"],
                    "B": ["one", "one", "one", "two", "two",
                          "one", "one", "two", "two"],
                    "C": ["small", "large", "large", "small",
                          "small", "large", "small", "small",
                         "large"],
                    "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
                    "E": [2, 4, 5, 5, 6, 6, 8, 9, 9]})
df4 = df3.sort_values(['A','B'])
res1 = df3.groupby(['A', 'B'])['D'].mean()
res2 = df4.groupby(['A', 'B'])['D'].median()

print(res1.index)
MultiIndex(levels=[[u'bar', u'foo'], [u'one', u'two']],
           labels=[[0, 0, 1, 1], [0, 1, 0, 1]],
           names=[u'A', u'B'])

print(res2.index)
MultiIndex(levels=[[u'bar', u'foo'], [u'one', u'two']],
           labels=[[0, 0, 1, 1], [0, 1, 0, 1]],
           names=[u'A', u'B'])

You can certainly do this:

my_group = df3.groupby(['A', 'B'])
print(type(my_group))
pandas.core.groupby.groupby.DataFrameGroupBy

You can then run different aggregations on the same groupby object, which ensures the index is not computed again.
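As a minimal sketch of that reuse pattern (the frame below is a trimmed copy of df3 with only columns A, B and D, and the names mean_d and median_d are just illustrative), one groupby object can feed both aggregations and gives the same results as two separate groupby calls:

```python
import pandas as pd

# Trimmed copy of df3 from above: only the grouping keys and one value column.
df = pd.DataFrame({
    "A": ["foo", "foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar"],
    "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two"],
    "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
})

# Build the groupby object once and reuse it for both aggregations.
my_group = df.groupby(["A", "B"])
mean_d = my_group["D"].mean()
median_d = my_group["D"].median()

# Reusing the object gives the same results as two fresh groupby calls.
fresh_mean = df.groupby(["A", "B"])["D"].mean()
fresh_median = df.groupby(["A", "B"])["D"].median()
print(mean_d.equals(fresh_mean), median_d.equals(fresh_median))
```

This only shows that the results agree. It does not by itself show whether the grouping work was shared between the two calls.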

Let me know if this helps.

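Your example provides no evidence that the index is saved rather than recomputed. My question is not "is the index computed?".

Can you give a bit of context? Are you worried that building the groups takes a long time, so you only want to do it once? Or do you need to use the result of the first aggregation in the second one?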
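One way to gather that kind of evidence is to time both variants on your own data. The sketch below is only an assumption-laden illustration: the random frame, the column name val, and the helpers two_fresh_groupbys, one_reused_groupby and best_time are all made up for the example, and the timings will depend on the pandas version, the data and the machine. If the grouping work is kept on the groupby object, the reused variant should tend to be faster.

```python
import time

import numpy as np
import pandas as pd

# Hypothetical data: two integer key columns and one float value column.
rng = np.random.default_rng(0)
n = 200_000
df = pd.DataFrame({
    "col1": rng.integers(0, 100, size=n),
    "col2": rng.integers(0, 100, size=n),
    "val": rng.random(n),
})


def two_fresh_groupbys(frame):
    # Each aggregation starts from a brand-new groupby object.
    a = frame.groupby(["col1", "col2"])["val"].mean()
    b = frame.groupby(["col1", "col2"])["val"].max()
    return a, b


def one_reused_groupby(frame):
    # Both aggregations go through the same groupby object.
    g = frame.groupby(["col1", "col2"])
    return g["val"].mean(), g["val"].max()


def best_time(func, frame, repeats=5):
    # Best wall-clock time over several runs, to reduce noise.
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(frame)
        best = min(best, time.perf_counter() - start)
    return best


print("two fresh groupbys:", best_time(two_fresh_groupbys, df))
print("one reused groupby:", best_time(one_reused_groupby, df))
```

A timing difference is indirect evidence only, so it is worth repeating the measurement with the real key columns and aggregations before deciding whether a custom groupby is needed.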