Python: how do I compute a cumulative sum across two different columns?
I have a data frame as shown below:
import pandas as pd
t1_ids = [4991, 6899, 6665, 4991, 7010, 6899]
t2_ids = [6899, 6908, 4869, 6899, 6899, 4991]
values = [1, 1, 1, 1, 1, 0]
data = {
    'team1_id': t1_ids,
    'team2_id': t2_ids,
    'is_1st_team_won': values
}
df = pd.DataFrame(data)
print(df)
   team1_id  team2_id  is_1st_team_won
0      4991      6899                1
1      6899      6908                1
2      6665      4869                1
3      4991      6899                1
4      7010      6899                1
5      6899      4991                0
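What I can't work out is how to find each team's winning ratio going into each match, that is, based only on the matches it has already played. As far as I can tell, this can be solved with cumsum() and shift(), but I couldn't find the exact solution. This is the expected output: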
   team1_id  team2_id  is_1st_team_won  t1_winning_ratio  t2_winning_ratio
0      4991      6899                1              0.00              0.00
1      6899      6908                1              0.00              0.00
2      6665      4869                1              0.00              0.00
3      4991      6899                1              1.00              0.50
4      7010      6899                1              0.00              0.33
5      6899      4991                0              0.25              1.00
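If you follow the team with ID 6899: it loses its first match (1st row) and wins its second match (2nd row). So when it plays its third match, its winning ratio is 0.5 (4th row). It loses the third match as well, so going into the next match its winning ratio is 0.33 (5th row). Finally, having also lost its fourth match, its winning ratio going into the last row is 1/4 = 0.25.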
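The walk-through for team 6899 can be checked with a small sketch. The helper name `ratios_before_each_match` is hypothetical (it is not part of the question), and it assumes a team with no earlier matches gets a ratio of 0, as in the expected output.

```python
import pandas as pd

df = pd.DataFrame({
    'team1_id': [4991, 6899, 6665, 4991, 7010, 6899],
    'team2_id': [6899, 6908, 4869, 6899, 6899, 4991],
    'is_1st_team_won': [1, 1, 1, 1, 1, 0],
})

def ratios_before_each_match(df, team_id):
    """Hypothetical helper: (row index, ratio before that match) for one team."""
    wins = games = 0
    trace = []
    cols = ['team1_id', 'team2_id', 'is_1st_team_won']
    for idx, t1, t2, first_won in df[cols].itertuples():
        if team_id not in (t1, t2):
            continue  # this team did not play in this row
        # Ratio from earlier matches only; assumed to be 0 before the first match.
        trace.append((idx, wins / games if games else 0.0))
        # The team wins if it is team 1 and team 1 won, or team 2 and team 1 lost.
        won = first_won == 1 if team_id == t1 else first_won == 0
        wins += int(won)
        games += 1
    return trace

trace_6899 = ratios_before_each_match(df, 6899)
for idx, ratio in trace_6899:
    print(idx, round(ratio, 2))
```

The printed rows are the ones where 6899 appears (indices 0, 1, 3, 4 and 5), each with the ratio built from the matches above it.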
How can I do this? Thanks in advance.
Here is one way:
my_dict = {}
t1_winning_ratio_list = []
t2_winning_ratio_list = []
for pair in df[['team1_id', 'team2_id', 'is_1st_team_won']].values:
    # ratio from earlier matches only; 0 when the team has no record yet
    try:
        t1_winning_ratio_list.append(my_dict[pair[0]]['won'] / my_dict[pair[0]]['game_count'])
    except KeyError:
        t1_winning_ratio_list.append(0)
    try:
        t2_winning_ratio_list.append(my_dict[pair[1]]['won'] / my_dict[pair[1]]['game_count'])
    except KeyError:
        t2_winning_ratio_list.append(0)
    # update team 1's record with this match
    if pair[0] in my_dict:
        my_dict[pair[0]]['game_count'] += 1
        if pair[2] == 1:
            my_dict[pair[0]]['won'] += 1
    else:
        my_dict[pair[0]] = {}
        my_dict[pair[0]]['game_count'] = 1
        if pair[2] == 1:
            my_dict[pair[0]]['won'] = 1
        else:
            my_dict[pair[0]]['won'] = 0
    # update team 2's record with this match (team 2 wins when team 1 does not)
    if pair[1] in my_dict:
        my_dict[pair[1]]['game_count'] += 1
        if pair[2] == 0:
            my_dict[pair[1]]['won'] += 1
    else:
        my_dict[pair[1]] = {}
        my_dict[pair[1]]['game_count'] = 1
        if pair[2] == 0:
            my_dict[pair[1]]['won'] = 1
        else:
            my_dict[pair[1]]['won'] = 0
df['t1_winning_ratio'] = t1_winning_ratio_list
df['t2_winning_ratio'] = t2_winning_ratio_list
Let's try:
import numpy as np

# get the winners
df['winner'] = np.where(df['is_1st_team_won'] == 1, df['team1_id'], df['team2_id'])
# wins per team before each match (`get_dummies` gives one column per team)
winning_matches = (pd.get_dummies(df['winner'], dtype=int)
                   .shift(fill_value=0)
                   .cumsum()
                   )
# games played per team before each match
game_plays = (pd.get_dummies(df['team1_id'], dtype=int)
              .add(pd.get_dummies(df['team2_id'], dtype=int), fill_value=0)
              .cumsum().shift()
              )
# the winning ratio
winning_ratio = winning_matches.div(game_plays).fillna(0)
# now look up each row's team in the ratio table:
# note that DataFrame.lookup was deprecated in pandas 1.2.0 and removed in 2.0,
# so index the underlying array by (row position, column position) instead
rows = np.arange(len(df))
ratio_values = winning_ratio.to_numpy()
df['t1_winning_ratio'] = ratio_values[rows, winning_ratio.columns.get_indexer(df['team1_id'])]
df['t2_winning_ratio'] = ratio_values[rows, winning_ratio.columns.get_indexer(df['team2_id'])]
Output:
      team1_id    team2_id    is_1st_team_won    winner    t1_winning_ratio    t2_winning_ratio
--  ----------  ----------  -----------------  --------  ------------------  ------------------
 0        4991        6899                  1      4991                   0                   0
 1        6899        6908                  1      6899                   0                   0
 2        6665        4869                  1      6665                   0                   0
 3        4991        6899                  1      4991                   1                 0.5
 4        7010        6899                  1      7010                   0            0.333333
 5        6899        4991                  0      4991                0.25                   1
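Since the question mentions cumsum() and shift(), here is a sketch of a third route that none of the answers use: reshape to one row per (match, team) and let groupby do the running totals. The names `long_df`, `side` and `ratios` are illustrative assumptions, and subtracting the current row from the cumulative sum plays the role of a per-team shift.

```python
import pandas as pd

df = pd.DataFrame({
    'team1_id': [4991, 6899, 6665, 4991, 7010, 6899],
    'team2_id': [6899, 6908, 4869, 6899, 6899, 4991],
    'is_1st_team_won': [1, 1, 1, 1, 1, 0],
})

# One row per (match, team): stack both sides of every match into a long table.
side1 = pd.DataFrame({'match': df.index, 'side': 't1',
                      'team': df['team1_id'], 'won': df['is_1st_team_won']})
side2 = pd.DataFrame({'match': df.index, 'side': 't2',
                      'team': df['team2_id'], 'won': 1 - df['is_1st_team_won']})
long_df = (pd.concat([side1, side2], ignore_index=True)
           .sort_values('match', kind='stable'))

grouped = long_df.groupby('team')
# Wins before the current match: running total minus the current row.
wins_before = grouped['won'].cumsum() - long_df['won']
# Games before the current match: 0 for a team's first appearance.
games_before = grouped.cumcount()
# 0/0 gives NaN for a first appearance; treat that as a ratio of 0.
long_df['ratio'] = (wins_before / games_before).fillna(0)

# Back to one row per match, with one ratio column per side.
ratios = long_df.pivot(index='match', columns='side', values='ratio')
result = df.assign(t1_winning_ratio=ratios['t1'],
                   t2_winning_ratio=ratios['t2'])
print(result.round(2))
```

This avoids building one dummy column per team, which may matter when there are many distinct team IDs, at the cost of a reshape and a pivot back.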
Since I'm not sure exactly what you want to achieve, here is a simple way to get the ratio for each team:
import pandas as pd

t1_ids = [4991, 6899, 6665, 4991, 7010, 6899]
t2_ids = [6899, 6908, 4869, 6899, 6899, 4991]
values = [1, 1, 1, 1, 1, 0]
data = {
    'team1_id': t1_ids,
    'team2_id': t2_ids,
    'is_1st_team_won': values
}
df = pd.DataFrame(data)
# 1 where the second team won, i.e. the complement of is_1st_team_won
df['is_2nd_team_won'] = 1 - df.is_1st_team_won

def get_ratio(_id):
    _w = 0  # wins so far
    _m = 0  # matches so far
    for i in range(len(df)):
        if df.team1_id[i] == _id:
            _m += 1
            if df.is_1st_team_won[i] == 1:
                _w += 1
        elif df.team2_id[i] == _id:
            _m += 1
            if df.is_2nd_team_won[i] == 1:
                _w += 1
        # ratio after row i, including the match on that row
        if _m > 0:
            df.loc[i, 'team_' + str(_id) + '_winning_ratio'] = _w / _m
        else:
            df.loc[i, 'team_' + str(_id) + '_winning_ratio'] = 0
    return df

# Series.append was removed in pandas 2.0; pd.concat does the same job
ID = pd.concat([df.team1_id, df.team2_id]).unique()
for _id in ID:
    df = get_ratio(_id)
df
Output
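Thanks for your answer, but this is not the result I expected. What I actually want is two separate columns, one holding team1's winning ratio and one holding team2's.
When you say team1 vs. team2, do you mean you don't care which team it is (6899 or 7010, it doesn't matter as long as it is team1), or do you want one column per team (one for 6899, one for 7010, ...)?
No, this is not a head-to-head question. I just want the winning ratio. For example, if a team has played 4 matches and won one of them, its winning ratio is 1/4 = 0.25.
Then I don't see what I'm misunderstanding. For example, team 6899's winning ratio should be 0, 0.5, 0.5, 0.33, 0.25 and finally 0.2, shouldn't it?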