Python: get the Pandas groupby.shift() result with the groupby variables as columns/index?

Consider this trivial dataset:

import pandas as pd

df = pd.DataFrame({'one':   ['a', 'a', 'a', 'b', 'b', 'b'],
                   'two':   ['c', 'c', 'c', 'c', 'd', 'd'],
                   'three': [1,   2,    3,   4,   5,   6]})
Group by one/two and apply .max():

df.groupby(['one', 'two'])['three'].max()
Output:

one  two
a    c      3
b    c      4
     d      6
Name: three, dtype: int64
…In my case, I want to shift() my records by group. But for some reason, when I apply .shift() to the groupby object, my results don't include the groupby variables:

df.groupby(['one', 'two'])['three'].shift()
0    NaN
1    1.0
2    2.0
3    NaN
4    NaN
5    5.0
Name: three, dtype: float64 

Is there a way to preserve those groupby variables in the results, either as columns or as a multi-indexed Series (as with .max())? Thanks.

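There is a difference between max and shift: max aggregates values (it returns an aggregated Series with one row per group, indexed by the group keys), whereas shift, like diff, does not aggregate and returns a Series of the same size as the original, on the original row index.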
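
To make the distinction concrete, here is a minimal sketch that rebuilds the question's DataFrame and compares the shape and index of the two results. The variable names agg_result and shift_result are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'one':   ['a', 'a', 'a', 'b', 'b', 'b'],
                   'two':   ['c', 'c', 'c', 'c', 'd', 'd'],
                   'three': [1, 2, 3, 4, 5, 6]})

g = df.groupby(['one', 'two'])['three']

# Aggregation: one row per group, indexed by the group keys (one, two).
agg_result = g.max()

# Non-aggregating operation: one row per original row, on the original index.
shift_result = g.shift()

print(len(agg_result), list(agg_result.index.names))
print(len(shift_result), shift_result.index.equals(df.index))
```

Because the shifted Series shares the DataFrame's row index, it lines up with the original rows rather than with the group keys.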

So the output can be appended as a new column:

df['shifted'] = df.groupby(['one', 'two'])['three'].shift()
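
If the goal is a result that carries the group keys itself, the following sketch shows two possible ways to get one, under the assumption that the keys are ordinary columns as in the question. The names out_cols and out_mi are hypothetical; set_index and groupby(level=...) are standard pandas calls, but this is only one way to arrange it.

```python
import pandas as pd

df = pd.DataFrame({'one':   ['a', 'a', 'a', 'b', 'b', 'b'],
                   'two':   ['c', 'c', 'c', 'c', 'd', 'd'],
                   'three': [1, 2, 3, 4, 5, 6]})

# Option 1: keep the keys as columns next to the shifted values.
out_cols = df[['one', 'two']].assign(
    shifted=df.groupby(['one', 'two'])['three'].shift())

# Option 2: move the keys into the index first, then group by those index
# levels. shift() preserves the index, so the result is a multi-indexed Series.
out_mi = (df.set_index(['one', 'two'])
            .groupby(level=['one', 'two'])['three']
            .shift())

print(out_cols)
print(out_mi)
```

Note that the MultiIndex in the second option is not unique (each group key repeats once per row), unlike the index returned by .max().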
In theory agg could be used, but in pandas 0.20.3 it raises an error:

df1 = df.groupby(['one', 'two'])['three'].agg(['max', lambda x: x.shift()])
print (df1)
ValueError: Function does not reduce

One possible solution, if you need max alongside the shifted values, is to use transform:

g = df.groupby(['one', 'two'])['three']
df['max'] = g.transform('max')
df['shifted'] = g.shift()
print (df)
  one  three two  max  shifted
0   a      1   c    3      NaN
1   a      2   c    3      1.0
2   a      3   c    3      2.0
3   b      4   c    4      NaN
4   b      5   d    6      NaN
5   b      6   d    6      5.0

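As Jez explained, shift returns a Series with the same length as the DataFrame, so if you try to use it as an aggregation the way you use max(), you get the error:

Function does not reduce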

Use max as the key, and slice the shifted values at the row that holds each group's max:

df.groupby(['one', 'two'])['three'].apply(lambda x : x.shift()[x==x.max()])
Out[58]: 
one  two   
a    c    2    2.0
b    c    3    NaN
     d    5    5.0
Name: three, dtype: float64
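
As an alternative sketch of the same idea without apply, the row of each group's maximum can be located with idxmax and used to look up the shifted values. This is not the answer author's method. It differs in two ways: the result is indexed only by (one, two), without the original row label, and when a group has tied maxima idxmax keeps only the first. The names max_rows and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'one':   ['a', 'a', 'a', 'b', 'b', 'b'],
                   'two':   ['c', 'c', 'c', 'c', 'd', 'd'],
                   'three': [1, 2, 3, 4, 5, 6]})

g = df.groupby(['one', 'two'])['three']

shifted = g.shift()     # per-group shift, on the original row index
max_rows = g.idxmax()   # row label of each group's first maximum

# Look up the shifted value at each group's max row and re-index by group key.
result = pd.Series(shifted.loc[max_rows.to_numpy()].to_numpy(),
                   index=max_rows.index, name='shifted_at_max')
print(result)
```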