Python: cumulative ranking of values in pandas with ties


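I'm trying to find a way to do a cumulative total that accounts for ties in pandas.

Let's take hypothetical data from a track meet, where I have a person, race, heat, and time.

Each person's place is determined as follows:

For a given race/heat combination:

  • The person with the lowest time places first
  • The person with the second-lowest time places second
and so on.

This would be fairly simple code, but for one thing.

If two people have the same time, they both get the same place, and the next time greater than theirs gets that value + 1 as its place.

In the table below, for the 100 Yard Dash, heat 1, RUNNER1 places first, RUNNER2 and RUNNER3 tie for second, and RUNNER4 places third (the next time after RUNNER2/RUNNER3).

So basically, the logic is as follows:

If race != race.shift() or heat != heat.shift(), then place = 1

If race == race.shift() and heat == heat.shift() and time > time.shift(), then place = place.shift() + 1

If race == race.shift() and heat == heat.shift() and time == time.shift(), then place = place.shift()

What confuses me is how to handle the ties. Otherwise I could do something like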

df['Place']=np.where(
              (df['race']==df['race'].shift())
              &
              (df['heat']==df['heat'].shift()),
              df['Place'].shift()+1,
              1)
Thanks, everyone!

The sample data is as follows:

Person,Race,Heat,Time
RUNNER1,100 Yard Dash,1,9.87
RUNNER2,100 Yard Dash,1,9.92
RUNNER3,100 Yard Dash,1,9.92
RUNNER4,100 Yard Dash,1,9.96
RUNNER5,100 Yard Dash,1,9.97
RUNNER6,100 Yard Dash,1,10.01
RUNNER7,100 Yard Dash,2,9.88
RUNNER8,100 Yard Dash,2,9.93
RUNNER9,100 Yard Dash,2,9.93
RUNNER10,100 Yard Dash,2,10.03
RUNNER11,100 Yard Dash,2,10.26
RUNNER7,200 Yard Dash,1,19.63
RUNNER8,200 Yard Dash,1,19.67
RUNNER9,200 Yard Dash,1,19.72
RUNNER10,200 Yard Dash,1,19.72
RUNNER11,200 Yard Dash,1,19.86
RUNNER12,200 Yard Dash,1,19.92
What I want in the end is:

Person,Race,Heat,Time,Place
RUNNER1,100 Yard Dash,1,9.87,1
RUNNER2,100 Yard Dash,1,9.92,2
RUNNER3,100 Yard Dash,1,9.92,2
RUNNER4,100 Yard Dash,1,9.96,3
RUNNER5,100 Yard Dash,1,9.97,4
RUNNER6,100 Yard Dash,1,10.01,5
RUNNER7,100 Yard Dash,2,9.88,1
RUNNER8,100 Yard Dash,2,9.93,2
RUNNER9,100 Yard Dash,2,9.93,2
RUNNER10,100 Yard Dash,2,10.03,3
RUNNER11,100 Yard Dash,2,10.26,4
RUNNER7,200 Yard Dash,1,19.63,1
RUNNER8,200 Yard Dash,1,19.67,2
RUNNER9,200 Yard Dash,1,19.72,3
RUNNER10,200 Yard Dash,1,19.72,3
RUNNER11,200 Yard Dash,1,19.86,4
RUNNER12,200 Yard Dash,1,19.92,5
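[edit] Now, taking this one step further.

Suppose that once I leave a set of unique values, the next time that set comes up the places should reset to 1.

For example, note that the data below goes "Heat 1", then "Heat 2", then back to "Heat 1". I don't want the ranking to continue from the earlier "Heat 1"; I want it to reset.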

Person,Race,Heat,Time,Place
RUNNER1,100 Yard Dash,1,9.87,1
RUNNER2,100 Yard Dash,1,9.92,2
RUNNER3,100 Yard Dash,1,9.92,2
RUNNER4,100 Yard Dash,2,9.96,1
RUNNER5,100 Yard Dash,2,9.97,2
RUNNER6,100 Yard Dash,2,10.01,3
RUNNER7,100 Yard Dash,1,9.88,1
RUNNER8,100 Yard Dash,1,9.93,2
RUNNER9,100 Yard Dash,1,9.93,2
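With the sample data loaded into a DataFrame df, you could use:

import pandas as pd

grouped = df.groupby(['Race', 'Heat'])
df['Place'] = grouped['Time'].transform(lambda x: pd.factorize(x, sort=True)[0]+1)
df['Rank'] = grouped['Time'].rank(method='min')  # shown for comparison with Place

yields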

    Heat    Person           Race   Time  Place  Rank
0      1   RUNNER1  100 Yard Dash   9.87    1.0   1.0
1      1   RUNNER2  100 Yard Dash   9.92    2.0   2.0
2      1   RUNNER3  100 Yard Dash   9.92    2.0   2.0
3      1   RUNNER4  100 Yard Dash   9.96    3.0   4.0
4      1   RUNNER5  100 Yard Dash   9.97    4.0   5.0
5      1   RUNNER6  100 Yard Dash  10.01    5.0   6.0
6      2   RUNNER7  100 Yard Dash   9.88    1.0   1.0
7      2   RUNNER8  100 Yard Dash   9.93    2.0   2.0
8      2   RUNNER9  100 Yard Dash   9.93    2.0   2.0
9      2  RUNNER10  100 Yard Dash  10.03    3.0   4.0
10     2  RUNNER11  100 Yard Dash  10.26    4.0   5.0
11     1   RUNNER7  200 Yard Dash  19.63    1.0   1.0
12     1   RUNNER8  200 Yard Dash  19.67    2.0   2.0
13     1   RUNNER9  200 Yard Dash  19.72    3.0   3.0
14     1  RUNNER10  200 Yard Dash  19.72    3.0   3.0
15     1  RUNNER11  200 Yard Dash  19.86    4.0   5.0
16     1  RUNNER12  200 Yard Dash  19.92    5.0   6.0
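To see why `pd.factorize(x, sort=True)[0] + 1` gives the desired placing, it may help to look at a single heat in isolation. The following is a minimal sketch using the times of the first heat from the sample data; the variable names are only illustrative.

```python
import pandas as pd

# Times of 100 Yard Dash, heat 1, from the sample data
times = pd.Series([9.87, 9.92, 9.92, 9.96, 9.97, 10.01])

# factorize returns (codes, uniques). With sort=True the unique values are
# sorted, so equal times share one code and each new distinct time gets the
# next integer. Codes start at 0, hence the + 1.
codes, uniques = pd.factorize(times, sort=True)
places = codes + 1

print(places.tolist())  # [1, 2, 2, 3, 4, 5]
print(list(uniques))    # the five distinct times in ascending order
```

Because the codes index into the sorted unique values, a tie never leaves a gap in the numbering.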

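Note that pandas has a GroupBy.rank method that computes many common forms of rank. With method='min', as used for the Rank column here, it does not give the placing you describe: on row 3, for example, after the tie between the second and third runners, Rank is 4 while Place is 3.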
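As a side note that is not part of the answer above: `rank` also accepts `method='dense'`, which, as far as I can tell, numbers ties without leaving gaps and so should produce the same values as the `factorize` approach. A minimal sketch on the first heat of the sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'Race': ['100 Yard Dash'] * 6,
    'Heat': [1] * 6,
    'Time': [9.87, 9.92, 9.92, 9.96, 9.97, 10.01],
})

grouped = df.groupby(['Race', 'Heat'])['Time']

# method='min': tied rows share the lowest rank, and the next rank skips ahead
df['Rank'] = grouped.rank(method='min')

# method='dense': tied rows share a rank, and the next rank is always + 1
# rank returns floats, so cast to int for display
df['Place'] = grouped.rank(method='dense').astype(int)

print(df)
```

Here `Rank` comes out as 1, 2, 2, 4, 5, 6 and `Place` as 1, 2, 2, 3, 4, 5.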


Regarding the edit: use

(df['Heat'] != df['Heat'].shift()).cumsum()

to disambiguate the heats:

import pandas as pd

# Same 17 rows as the sample data, but all in one race, so Heat 1 appears twice
df = pd.DataFrame({
    'Heat': [1] * 6 + [2] * 5 + [1] * 6,
    'Person': ['RUNNER1', 'RUNNER2', 'RUNNER3', 'RUNNER4', 'RUNNER5', 'RUNNER6', 'RUNNER7', 'RUNNER8', 'RUNNER9', 'RUNNER10', 'RUNNER11', 'RUNNER7', 'RUNNER8', 'RUNNER9', 'RUNNER10', 'RUNNER11', 'RUNNER12'],
    'Race': ['100 Yard Dash'] * 17,
    'Time': [9.87, 9.92, 9.92, 9.96, 9.97, 10.01, 9.88, 9.93, 9.93, 10.03, 10.26, 19.63, 19.67, 19.72, 19.72, 19.86, 19.92]})

# Give each consecutive run of identical Heat values its own label
df['HeatGroup'] = (df['Heat'] != df['Heat'].shift()).cumsum()
grouped = df.groupby(['Race', 'HeatGroup'])
df['Place'] = grouped['Time'].transform(lambda x: pd.factorize(x, sort=True)[0]+1)
df['Rank'] = grouped['Time'].rank(method='min')
print(df)

yields

    Heat    Person           Race   Time  HeatGroup  Place  Rank
0      1   RUNNER1  100 Yard Dash   9.87          1    1.0   1.0
1      1   RUNNER2  100 Yard Dash   9.92          1    2.0   2.0
2      1   RUNNER3  100 Yard Dash   9.92          1    2.0   2.0
3      1   RUNNER4  100 Yard Dash   9.96          1    3.0   4.0
4      1   RUNNER5  100 Yard Dash   9.97          1    4.0   5.0
5      1   RUNNER6  100 Yard Dash  10.01          1    5.0   6.0
6      2   RUNNER7  100 Yard Dash   9.88          2    1.0   1.0
7      2   RUNNER8  100 Yard Dash   9.93          2    2.0   2.0
8      2   RUNNER9  100 Yard Dash   9.93          2    2.0   2.0
9      2  RUNNER10  100 Yard Dash  10.03          2    3.0   4.0
10     2  RUNNER11  100 Yard Dash  10.26          2    4.0   5.0
11     1   RUNNER7  100 Yard Dash  19.63          3    1.0   1.0
12     1   RUNNER8  100 Yard Dash  19.67          3    2.0   2.0
13     1   RUNNER9  100 Yard Dash  19.72          3    3.0   3.0
14     1  RUNNER10  100 Yard Dash  19.72          3    3.0   3.0
15     1  RUNNER11  100 Yard Dash  19.86          3    4.0   5.0
16     1  RUNNER12  100 Yard Dash  19.92          3    5.0   6.0
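The run-labelling step can also be checked against the nine-row example from the question's edit, where Heat 1 appears, then Heat 2, then Heat 1 again. This is a small sketch under the assumption that rows arrive in the order shown in the question, since `shift()` only compares each row with the one directly before it.

```python
import pandas as pd

df = pd.DataFrame({
    'Person': ['RUNNER%d' % i for i in range(1, 10)],
    'Race': ['100 Yard Dash'] * 9,
    'Heat': [1, 1, 1, 2, 2, 2, 1, 1, 1],
    'Time': [9.87, 9.92, 9.92, 9.96, 9.97, 10.01, 9.88, 9.93, 9.93],
})

# True wherever Heat differs from the previous row (the first row compares
# against NaN, so it is True as well). The running sum of these flags gives
# every consecutive run of rows its own label: 1, 1, 1, 2, 2, 2, 3, 3, 3.
df['HeatGroup'] = (df['Heat'] != df['Heat'].shift()).cumsum()

# Place within each run, with ties sharing a place and no gaps afterwards
df['Place'] = (df.groupby(['Race', 'HeatGroup'])['Time']
                 .transform(lambda x: pd.factorize(x, sort=True)[0] + 1))

print(df)
```

The resulting `Place` column is 1, 2, 2, 1, 2, 3, 1, 2, 2, which matches the output requested in the edit: the second block of Heat 1 restarts at 1 instead of continuing from the first.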

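Just store the previous runner's time in memory (like a buffer) and check the current runner's time against it (the way you would compare any two floats using some precision e). If the difference is below e, don't increment the place.

How would I go about doing that?

Thanks for your answer. I guess controlling the accuracy used for the comparison requires a more explicit approach, like modifying the numbers yourself via math.ceil() or something similar.

@Ev.Kounis: Yes, you could use something like df['Time'] = df['Time'].round(2) to round all times to 2 decimal places before using groupby/transform or groupby/rank.

Thanks! I added a little twist at the end. It's a bit confusing to explain why I need this, but any ideas on how I could accomplish it?

Very elegant solution!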
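The rounding suggestion from the comments could look roughly like this. The data is hypothetical: the third time carries a small amount of float noise so that it would not compare equal to the second without rounding.

```python
import pandas as pd

# Hypothetical times; the last one differs from 9.92 only by float noise
df = pd.DataFrame({
    'Race': ['100 Yard Dash'] * 3,
    'Heat': [1, 1, 1],
    'Time': [9.87, 9.92, 9.9200000001],
})

# Round to the timing precision first, so near-equal times become exact ties
df['Time'] = df['Time'].round(2)

df['Place'] = (df.groupby(['Race', 'Heat'])['Time']
                 .transform(lambda x: pd.factorize(x, sort=True)[0] + 1))

print(df)
```

After rounding, the last two runners tie for second place. Without the `round(2)` step they would be treated as distinct times and placed second and third.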