How can I compute a cumulative sum across two different columns in Python (pandas)?

I have a DataFrame like this:

import pandas as pd

t1_ids = [4991, 6899, 6665, 4991, 7010, 6899]
t2_ids = [6899, 6908, 4869, 6899, 6899, 4991]
values = [1, 1, 1, 1, 1, 0]

data = {
    'team1_id': t1_ids,
    'team2_id': t2_ids,
    'is_1st_team_won': values
}

df = pd.DataFrame(data)
print(df)


   team1_id  team2_id  is_1st_team_won
0      4991      6899                1
1      6899      6908                1
2      6665      4869                1
3      4991      6899                1
4      7010      6899                1
5      6899      4991                0
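What I can't work out is how to find each team's winning ratio going into each match, that is, based only on the matches it has already played. As far as I know, this can be solved with the cumsum() and shift() functions, but I can't find the exact solution. This is the expected output: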

   team1_id  team2_id  is_1st_team_won  t1_winning_ratio  t2_winning_ratio
0      4991      6899                1              0.00              0.00
1      6899      6908                1              0.00              0.00
2      6665      4869                1              0.00              0.00
3      4991      6899                1              1.00              0.50
4      7010      6899                1              0.00              0.33
5      6899      4991                0              0.25              1.00
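Follow the team with ID 6899: it loses its first match (1st row), then wins its second match (2nd row). So when it plays its third match, its winning ratio is 0.5 (4th row). It loses that third match too, so going into the next match its ratio is 0.33 (5th row). Finally, after it also loses its fourth match, the ratio is 1/4 = 0.25.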
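
The arithmetic for team 6899 can be traced by hand. The sketch below is only an illustration: the `results` list is read off the sample data (1 = win, 0 = loss, in match order), and the variable names are made up for this example. It records the ratio before each match and treats a team with no earlier matches as 0.

```python
# Results of team 6899 in match order, read off the sample data (1 = win, 0 = loss)
results = [0, 1, 0, 0, 0]

wins = 0
games = 0
ratios = []
for result in results:
    # Ratio going into this match; 0.0 if the team has not played yet
    ratios.append(wins / games if games else 0.0)
    wins += result
    games += 1

print([round(r, 2) for r in ratios])  # [0.0, 0.0, 0.5, 0.33, 0.25]
```

These five values are the entries for 6899 in the expected output, whichever of the two columns the team appears in.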


How can I do this? Thanks in advance.

Here is one way:

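# running totals per team: team_id -> {'won': ..., 'game_count': ...}
my_dict = {}
t1_winning_ratio_list = []
t2_winning_ratio_list = []

for pair in df[['team1_id','team2_id','is_1st_team_won']].values:
    # ratio of team1 before this match (0 if it has not played yet)
    try:
        t1_winning_ratio_list.append(my_dict[pair[0]]['won']/my_dict[pair[0]]['game_count'])
    except KeyError:
        t1_winning_ratio_list.append(0)
        
    # ratio of team2 before this match (0 if it has not played yet)
    try:
        t2_winning_ratio_list.append(my_dict[pair[1]]['won']/my_dict[pair[1]]['game_count'])
    except KeyError:
        t2_winning_ratio_list.append(0)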
    
    if pair[0] in my_dict:
        my_dict[pair[0]]['game_count'] += 1
        if pair[2] == 1:
            my_dict[pair[0]]['won'] += 1
    elif pair[0] not in my_dict:
        my_dict[pair[0]] = {}
        my_dict[pair[0]]['game_count'] = 1
        if pair[2] == 1:
            my_dict[pair[0]]['won'] = 1
        else:
            my_dict[pair[0]]['won'] = 0
            
    if pair[1] in my_dict:
        my_dict[pair[1]]['game_count'] += 1
        if pair[2] == 0:
            my_dict[pair[1]]['won'] += 1
    elif pair[1] not in my_dict:
        my_dict[pair[1]] = {}
        my_dict[pair[1]]['game_count'] = 1
        if pair[2] == 0:
            my_dict[pair[1]]['won'] = 1
        else:
            my_dict[pair[1]]['won'] = 0
    
df['t1_winning_ratio'] = t1_winning_ratio_list
df['t2_winning_ratio'] = t2_winning_ratio_list
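
The same bookkeeping can be written more compactly. This is a minimal sketch rather than a drop-in replacement: it assumes the sample data from the question, and the names `wins`, `games`, `t1_ratios` and `t2_ratios` are introduced here for illustration. Using `collections.defaultdict` removes the need for the try/except blocks and the "first time seen" branches.

```python
from collections import defaultdict

import pandas as pd

df = pd.DataFrame({
    'team1_id': [4991, 6899, 6665, 4991, 7010, 6899],
    'team2_id': [6899, 6908, 4869, 6899, 6899, 4991],
    'is_1st_team_won': [1, 1, 1, 1, 1, 0],
})

wins = defaultdict(int)   # team id -> matches won so far
games = defaultdict(int)  # team id -> matches played so far
t1_ratios, t2_ratios = [], []

for t1, t2, first_won in zip(df['team1_id'], df['team2_id'], df['is_1st_team_won']):
    # Ratio before this match; 0.0 for a team with no earlier matches
    t1_ratios.append(wins[t1] / games[t1] if games[t1] else 0.0)
    t2_ratios.append(wins[t2] / games[t2] if games[t2] else 0.0)
    # Update the totals with this match's result
    games[t1] += 1
    games[t2] += 1
    wins[t1 if first_won == 1 else t2] += 1

df['t1_winning_ratio'] = t1_ratios
df['t2_winning_ratio'] = t2_ratios
print(df)
```

The ratios are appended before the totals are updated, which is what keeps the current match out of its own ratio.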
        
Let's try:

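import numpy as np

# get the winners
df['winner'] = np.where(df['is_1st_team_won']==1, df['team1_id'], df['team2_id'])

# wins per team before each game: one-hot encode the winner, then shift and accumulate
# (dtype=int keeps the dummies numeric; newer pandas versions default to bool)
winning_matches = (pd.get_dummies(df['winner'], dtype=int)
                   .shift(fill_value=0)
                   .cumsum()
                   )

# games played per team before each game
game_plays = (pd.get_dummies(df['team1_id'], dtype=int)
                .add(pd.get_dummies(df['team2_id'], dtype=int), fill_value=0)
                .cumsum().shift()
             )

# the winning ratio (NaN from 0/0, or from teams that never won, becomes 0)
winning_ratio = winning_matches.div(game_plays).fillna(0)

# now look up each row's team in the ratio table
# DataFrame.lookup was deprecated in pandas 1.2.0 and removed in 2.0,
# so index the underlying NumPy array by (row position, column position) instead
ratios = winning_ratio.to_numpy()
rows = np.arange(len(df))
df['t1_winning_ratio'] = ratios[rows, winning_ratio.columns.get_indexer(df['team1_id'])]
df['t2_winning_ratio'] = ratios[rows, winning_ratio.columns.get_indexer(df['team2_id'])]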
Output:

      team1_id    team2_id    is_1st_team_won    winner    t1_winning_ratio    t2_winning_ratio
--  ----------  ----------  -----------------  --------  ------------------  ------------------
 0        4991        6899                  1      4991                0               0
 1        6899        6908                  1      6899                0               0
 2        6665        4869                  1      6665                0               0
 3        4991        6899                  1      4991                1               0.5
 4        7010        6899                  1      7010                0               0.333333
 5        6899        4991                  0      4991                0.25            1
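
The wide one-hot tables above grow with the number of teams. One alternative is to reshape the data to one row per (match, team) and use groupby with cumsum() and cumcount(). This is a sketch under assumptions, not the method from the answer above: it uses the sample data from the question, and the names `long`, `wide`, `side`, `match`, `wins_before` and `games_before` are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'team1_id': [4991, 6899, 6665, 4991, 7010, 6899],
    'team2_id': [6899, 6908, 4869, 6899, 6899, 4991],
    'is_1st_team_won': [1, 1, 1, 1, 1, 0],
})

# One row per (match, team): 'won' is 1 if that team won the match
t1 = (df[['team1_id']].rename(columns={'team1_id': 'team'})
      .assign(side='t1', won=df['is_1st_team_won']))
t2 = (df[['team2_id']].rename(columns={'team2_id': 'team'})
      .assign(side='t2', won=1 - df['is_1st_team_won']))
long = (pd.concat([t1, t2])
        .rename_axis('match')
        .reset_index()
        .sort_values('match', kind='stable'))

grouped = long.groupby('team')
# Wins strictly before the current match: running total minus the current result
wins_before = grouped['won'].cumsum() - long['won']
# Matches played before the current one: 0 for a team's first match
games_before = grouped.cumcount()
# 0/0 gives NaN for a team's first match; treat that as a ratio of 0
long['ratio'] = (wins_before / games_before).fillna(0.0)

# Back to one row per match, with one ratio column per side
wide = long.pivot(index='match', columns='side', values='ratio')
df['t1_winning_ratio'] = wide['t1']
df['t2_winning_ratio'] = wide['t2']
print(df.round(2))
```

Subtracting the current result from the running total plays the same role as shift(): it keeps the current match out of its own ratio.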

Since I don't know exactly what you are trying to achieve, here is a simple way to get the ratio for each team:

import pandas as pd


t1_ids = [4991, 6899, 6665, 4991, 7010, 6899]
t2_ids = [6899, 6908, 4869, 6899, 6899, 4991]
values = [1, 1, 1, 1, 1, 0]

data = {
    'team1_id': t1_ids,
    'team2_id': t2_ids,
    'is_1st_team_won': values
}

df = pd.DataFrame(data)
# flip 0/1: the second team wins exactly when the first team does not
df['is_2nd_team_won'] = 1 - df.is_1st_team_won


def get_ratio(_id):
    _w=0
    _m =0
    for i in range(len(df)):
        if df.team1_id[i] == _id :
            _m+=1
            if df.is_1st_team_won[i] == 1:
                _w+=1
        elif df.team2_id[i] == _id :
            _m += 1
            if df.is_2nd_team_won[i] == 1:
                _w+=1
        if _m > 0:
            df.loc[i,'team_'+str(_id)+'_winning_ratio'] = _w/_m
        else:
            df.loc[i,'team_'+str(_id)+'_winning_ratio'] = 0
    return df
# all distinct team ids (Series.append was removed in pandas 2.0, so use pd.concat)
ID = pd.concat([df.team1_id, df.team2_id]).unique()
for _id in ID:
    df = get_ratio(_id)
df

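Thanks for your answer, but this is not the result I expected. What I actually want is two separate columns, one with team1's winning ratio and one with team2's winning ratio.

When you say team1 vs team2, do you mean that you don't care which team it is, i.e. it doesn't matter whether it is 6899 or 7010 because it is team1, or do you want one column per team (one for 6899, one for 7010, ...)?

No, this is not a head-to-head question. I just want to know the winning ratio. For example, a team has played 4 matches and won one of them, so its winning ratio is 1/4 = 0.25.

Then I don't see what I'm misunderstanding. For example, the winning ratio of team 6899 should be 0, 0.5, 0.5, 0.33, 0.25 and finally 0.2, shouldn't it?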