Python Pandas: get the max and second-place values from a sub-column of a DataFrame

python pandas

I have the following DataFrame:

    usersidid   clienthostid    LoginDaysSum    
0       12            1             240     
1       11            1             60  
3       5             1             5       
4       6             3             2702    
2       10            3             423     
5       8             3             18

Each clienthostid has usersidid values, each with a LoginDaysSum. The df has been sorted with:

df.sort_values(['clienthostid', 'LoginDaysSum'], ascending=[True, False], inplace=True)

Now, for each clienthostid, I need to get its largest LoginDaysSum (first_place) and its second-largest LoginDaysSum (second_place), and compute first_place / second_place.

For example, for clienthostid=1:

first_place = 240
second_place = 60
(first_place/second_place) = 4

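How can I do this? I tried several approaches, but I could not find anything that lets me access different members of the same column, something like this pseudocode (one_index_lower_from_max is not a real method):

df['clienthostid'].apply(lambda x: x.max() / x.one_index_lower_from_max())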

Any suggestions would be appreciated.

Thanks,

I think you can group by clienthostid and, within each group, divide the first value by the second, selecting both by position.
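The code for this first suggestion did not survive in the text above. A minimal sketch of what it might look like, assuming the DataFrame from the question is rebuilt by hand and that positional selection is done with `iloc`; the helper name `first_over_second` is made up for illustration:

```python
import pandas as pd

# Data copied from the question, including its original index labels.
df = pd.DataFrame(
    {
        "usersidid": [12, 11, 5, 6, 10, 8],
        "clienthostid": [1, 1, 1, 3, 3, 3],
        "LoginDaysSum": [240, 60, 5, 2702, 423, 18],
    },
    index=[0, 1, 3, 4, 2, 5],
)

# Same sort as in the question: largest LoginDaysSum first within each group.
df = df.sort_values(["clienthostid", "LoginDaysSum"], ascending=[True, False])


def first_over_second(s):
    """Hypothetical helper: ratio of the first to the second value of a group."""
    if len(s) < 2:
        return float("nan")  # no second place to divide by
    return s.iloc[0] / s.iloc[1]


ratio = df.groupby("clienthostid")["LoginDaysSum"].apply(first_over_second)
print(ratio)
# clienthostid
# 1    4.000000
# 3    6.387707
# Name: LoginDaysSum, dtype: float64
```

This relies on the rows already being sorted in descending order within each group, as they are in the question.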

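Another option is to use nlargest to compute the top 2 values in each group. The values are then divided elementwise, after shifting the second-largest value up one position so that it lines up with the largest.

This is done by broadcasting across level=1 and then picking the first item of each group, grouped on level=0: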

grp = df.groupby('clienthostid').LoginDaysSum
grp.nlargest(2).div(grp.shift(-1), level=1).groupby(level=0).first()

clienthostid
1    4.000000
3    6.387707
Name: LoginDaysSum, dtype: float64

Another equivalent variant:

grp = df.groupby('clienthostid').LoginDaysSum.nlargest(2)
grp.div(grp.shift(-1)).groupby(level=0).nth(0)

clienthostid
1    4.000000
3    6.387707
Name: LoginDaysSum, dtype: float64
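One caveat with this variant: the shift(-1) here is applied to the already-combined Series, so it is not group-aware. For a group with only one row, the divisor would be the top value of the next group. A sketch of a group-aware version, assuming a hypothetical extra clienthostid 7 with a single user (that row is not in the question's data) to show the effect:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "usersidid": [12, 11, 5, 6, 10, 8, 99],       # 99 is a made-up user
        "clienthostid": [1, 1, 1, 3, 3, 3, 7],        # 7 is a made-up host
        "LoginDaysSum": [240, 60, 5, 2702, 423, 18, 50],
    }
)

# Top 2 values per group; the result is indexed by (clienthostid, original index).
top2 = df.groupby("clienthostid")["LoginDaysSum"].nlargest(2)

# Shift inside each group, so the last row of a group gets NaN
# instead of picking up the first value of the next group.
second = top2.groupby(level=0).shift(-1)

# first() keeps the first non-null ratio per group (NaN if there is none).
ratio = (top2 / second).groupby(level=0).first()
print(ratio)
```

With this data, hosts 1 and 3 give the same ratios as above, and the single-row host 7 comes out as NaN rather than a misleading number.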

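Since LoginDaysSum was already sorted in descending order beforehand, calling nlargest here is a fairly redundant operation. Instead, .head(2) is sufficient and should also be faster.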

We then divide each value at an even row position by the value at the odd row position that follows it:

grp = df.groupby('clienthostid').LoginDaysSum.head(2)
pd.Series(grp.iloc[::2].values/(grp.iloc[1::2].values), df.clienthostid.unique())

1    4.000000
3    6.387707
dtype: float64
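The even/odd slicing assumes that every group contributes exactly two rows. A sketch of the same idea that labels each row with its place explicitly, assuming the question's data and using `cumcount` plus `pivot` (the column name `place` is made up for illustration):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "usersidid": [12, 11, 5, 6, 10, 8],
        "clienthostid": [1, 1, 1, 3, 3, 3],
        "LoginDaysSum": [240, 60, 5, 2702, 423, 18],
    },
    index=[0, 1, 3, 4, 2, 5],
)
df = df.sort_values(["clienthostid", "LoginDaysSum"], ascending=[True, False])

# 0 for the largest value in each group, 1 for the second-largest, and so on.
place = df.groupby("clienthostid").cumcount()

# Keep the top two rows per group and spread them into columns 0 and 1.
wide = (
    df.assign(place=place)
    .loc[place < 2]
    .pivot(index="clienthostid", columns="place", values="LoginDaysSum")
)

ratio = wide[0] / wide[1]
print(ratio)
```

For a group with a single row, the pivot would leave column 1 empty for that host, so the ratio would be NaN there.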
