Python: get pandas groupby.shift() results with the groupby variables as columns / index?

Consider this trivial dataset:

import pandas as pd

df = pd.DataFrame({'one':   ['a', 'a', 'a', 'b', 'b', 'b'],
                   'two':   ['c', 'c', 'c', 'c', 'd', 'd'],
                   'three': [1,   2,    3,   4,   5,   6]})

Grouping by one / two and applying .max() returns a Series indexed on the groupby variables:
df.groupby(['one', 'two'])['three'].max()

Output:
one  two
a    c      3
b    c      4
     d      6
Name: three, dtype: int64

...in my case, though, I want to shift() my records by group. But for some reason, when I apply .shift() to the groupby object, my results don't include the groupby variables:

df.groupby(['one', 'two'])['three'].shift()
0    NaN
1    1.0
2    2.0
3    NaN
4    NaN
5    5.0
Name: three, dtype: float64 

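Is there a way to keep those groupby variables in the results, either as columns or as a multi-indexed Series (as with .max())? Thanks!

There is a difference between max and shift (or diff): max aggregates the values, so it returns a reduced Series with one entry per group, while shift and diff do not aggregate and return a Series of the same size as the original.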
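To make that difference concrete, here is a minimal sketch using the question's dataset. The variable names reduced and same_size are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'one':   ['a', 'a', 'a', 'b', 'b', 'b'],
                   'two':   ['c', 'c', 'c', 'c', 'd', 'd'],
                   'three': [1, 2, 3, 4, 5, 6]})

g = df.groupby(['one', 'two'])['three']

# Aggregation: one value per group, indexed by the group keys.
reduced = g.max()

# Not an aggregation: one value per input row, indexed like df.
same_size = g.shift()

print(len(reduced), list(reduced.index.names))
print(len(same_size), same_size.index.equals(df.index))
```

Because the shifted Series keeps the original row index, there are no group keys for pandas to place in its index.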

So you can append the output as a new column:
df['shifted'] = df.groupby(['one', 'two'])['three'].shift()
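Since the shifted values line up with the original rows, both shapes asked for in the question can be built from them. The following is a sketch rather than the answerer's code, and the names as_columns and as_multiindex are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'one':   ['a', 'a', 'a', 'b', 'b', 'b'],
                   'two':   ['c', 'c', 'c', 'c', 'd', 'd'],
                   'three': [1, 2, 3, 4, 5, 6]})

shifted = df.groupby(['one', 'two'])['three'].shift()

# Option 1: group variables as ordinary columns beside the shifted values.
as_columns = df[['one', 'two']].assign(three=shifted)

# Option 2: group variables as a MultiIndex on the shifted Series.
as_multiindex = as_columns.set_index(['one', 'two'])['three']

print(as_columns)
print(as_multiindex)
```

Unlike the .max() result, the MultiIndex here has one entry per original row, so the index values repeat within each group.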

In theory it is possible to use agg, but it raises an error in pandas 0.20.3:
df1 = df.groupby(['one', 'two'])['three'].agg(['max', lambda x: x.shift()])
print (df1)

ValueError: Function does not reduce

One possible solution is to use transform if you need max alongside the shifted values:
g = df.groupby(['one', 'two'])['three']
df['max'] = g.transform('max')
df['shifted'] = g.shift()
print (df)
  one  three two  max  shifted
0   a      1   c    3      NaN
1   a      2   c    3      1.0
2   a      3   c    3      2.0
3   b      4   c    4      NaN
4   b      5   d    6      NaN
5   b      6   d    6      5.0

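As Jez explained, shift returns a Series with the same length as the DataFrame, so if you try to aggregate it the way you would with max(), you get the error:
Function does not reduce

Using max as the key, slice the shifted values at each group's max row: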
df.groupby(['one', 'two'])['three'].apply(lambda x : x.shift()[x==x.max()])
Out[58]: 
one  two   
a    c    2    2.0
b    c    3    NaN
     d    5    5.0
Name: three, dtype: float64
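The same lookup can also be sketched without apply by asking the groupby for the row label of each group's maximum and indexing the shifted Series with it. This is an alternative technique, not the answerer's code, and the names max_rows and result are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'one':   ['a', 'a', 'a', 'b', 'b', 'b'],
                   'two':   ['c', 'c', 'c', 'c', 'd', 'd'],
                   'three': [1, 2, 3, 4, 5, 6]})

g = df.groupby(['one', 'two'])['three']
shifted = g.shift()

# Row label of each group's maximum, indexed by (one, two).
max_rows = g.idxmax()

# Pick the shifted value at those rows and re-label it with the group keys.
result = pd.Series(shifted.loc[max_rows.to_numpy()].to_numpy(),
                   index=max_rows.index, name='three')
print(result)
```

Two differences from the apply version: the original row label is not kept as an extra index level, and when a group has tied maxima idxmax keeps only the first row, whereas the x == x.max() mask keeps every tied row.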