Python: how to compare integer values in a DataFrame
I am writing a program that needs to read a csv/text file containing football scores, like this:
Lions 3, Snakes 3
Tarantulas 1, FC Awesome 0
Lions 1, FC Awesome 1
Tarantulas 3, Snakes 1
Lions 4, Grouches 0
If two teams draw, each team gets 1 point; if a team wins, it gets 3 points.
Ideally, the output should look like this:
1. Tarantulas, 6 pts
2. Lions, 5 pts
3. FC Awesome, 1 pt
3. Snakes, 1 pt
4. Grouches, 0 pts
This is the code I have so far:
import pandas as pd
data = pd.read_csv("sample_input.csv", header=None, names=['left_team', 'right_team'])
data_dict = data.to_dict(orient='list')
def splitter(row):
    left_team, right_team = row.split(',')
    return {
        'left_team': left_team[:-2].strip(),
        'left_score': int(left_team[-2:].strip()),
        'right_team': right_team[:-2].strip(),
        'right_score': int(right_team[-2:].strip())
    }
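My question is: how do I get the values out of the DataFrame so I can compare them? I also tried writing a solution without pandas, but I am struggling with it. Any help would be greatly appreciated! Thanks.
Here is the other solution I tried: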
from collections import defaultdict
import csv
reader = csv.DictReader(open('sample_input.csv', 'r'))
dict_list = []
for line in reader:
    dict_list.append(line)
data_list = [splitter(row) for row in reader]
def splitter(row):
    left_team, right_team = row.split(',')
    return {
        'left_team': left_team[:-2].strip(),
        'left_score': int(left_team[-2:].strip()),
        'right_team': right_team[:-2].strip(),
        'right_score': int(right_team[-2:].strip())
    }
data_dicts = [splitter(row) for row in reader]
team_scores = defaultdict(int)
for game in data_dicts:
    if game['left_score'] == game['right_score']:
        team_scores[game['left']] += 1
        team_scores[game['right']] += 1
    elif game['left_score'] > game['right_score']:
        team_scores[game['left']] += 3
    else:
        team_scores[game['right']] += 3
teams_sorted = sorted(team_scores.items(), key=lambda team: team[1], reverse=True)
for line in teams_sorted:
    print(line)
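For reference, the attempt above has a few problems that keep it from running: `splitter` is called before it is defined, `csv.DictReader` yields dicts rather than strings (and is already exhausted after the first loop), and the keys `game['left']` / `game['right']` do not exist in the dict that `splitter` returns. A minimal pandas-free sketch that avoids these issues might look like the following. It reads from an inline sample string instead of `sample_input.csv`, and the `tally` helper and the use of `rsplit` are my own assumptions, not part of the original code.

```python
from collections import defaultdict

# Inline copy of the sample input; in practice these lines would come from the file.
SAMPLE = """Lions 3, Snakes 3
Tarantulas 1, FC Awesome 0
Lions 1, FC Awesome 1
Tarantulas 3, Snakes 1
Lions 4, Grouches 0"""


def splitter(row):
    """Split a line such as 'Lions 3, Snakes 3' into names and integer scores."""
    left, right = row.split(',')
    # Split on the last space so names containing spaces and
    # multi-digit scores are both handled.
    left_team, left_score = left.strip().rsplit(' ', 1)
    right_team, right_score = right.strip().rsplit(' ', 1)
    return {
        'left_team': left_team,
        'left_score': int(left_score),
        'right_team': right_team,
        'right_score': int(right_score),
    }


def tally(lines):
    """Hypothetical helper: 3 points for a win, 1 for a draw, 0 for a loss."""
    team_scores = defaultdict(int)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        game = splitter(line)
        left, right = game['left_team'], game['right_team']
        if game['left_score'] == game['right_score']:
            team_scores[left] += 1
            team_scores[right] += 1
        elif game['left_score'] > game['right_score']:
            team_scores[left] += 3
            team_scores[right] += 0  # make sure the losing team is listed too
        else:
            team_scores[right] += 3
            team_scores[left] += 0
    return dict(team_scores)


team_scores = tally(SAMPLE.splitlines())
# Sort by points (descending), then by name to break ties.
teams_sorted = sorted(team_scores.items(), key=lambda team: (-team[1], team[0]))
for name, pts in teams_sorted:
    print(name, pts)
```

To read from a file instead, passing the open file object to `tally` should work, since iterating over a file yields its lines.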
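Here is a simple solution. First clean the data, then assign points to each team. Finally, sum all the points for each team, regardless of whether the team appears on the left or the right.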
import pandas as pd
import numpy as np
# Create a DataFrame from your input (copied to the clipboard)
df = pd.read_clipboard(sep=', ', header=None)
df.columns = ['l_team', 'r_team']
# Clean the data, separating each team name from its score.
# Raw strings keep the regex backslashes from being read as string escapes.
df[['l_team', 'l_score']] = df.l_team.str.extract(r'(.*)\s(\d+)')
df[['r_team', 'r_score']] = df.r_team.str.extract(r'(.*)\s(\d+)')
df[['l_score', 'r_score']] = df[['l_score', 'r_score']].astype('int')
Now df looks like this:
       l_team      r_team  l_score  r_score
0       Lions      Snakes        3        3
1  Tarantulas  FC Awesome        1        0
2       Lions  FC Awesome        1        1
3  Tarantulas      Snakes        3        1
4       Lions    Grouches        4        0
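Determine how many points the left and right teams earn in each game, then sum by team. We use Series.add so that the two sums align on the index, which after the groupby is the team name.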
# 3 points for a win, 1 for a draw, 0 otherwise
df['l_pts'] = np.select([df.l_score > df.r_score, df.l_score == df.r_score], [3, 1], 0)
df['r_pts'] = np.select([df.r_score > df.l_score, df.r_score == df.l_score], [3, 1], 0)
scores = (df.groupby('l_team').l_pts.sum()
            .add(df.groupby('r_team').r_pts.sum(), fill_value=0)
            .astype('int').sort_values(ascending=False))
Output of scores:
Tarantulas 6
Lions 5
Snakes 1
FC Awesome 1
Grouches 0
dtype: int32
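To illustrate why `Series.add` with `fill_value=0` is used rather than a plain `+`, here is a small sketch. The two Series and their numbers are made up for illustration (in the sample data no team appears on both sides); they stand in for the per-side sums produced by the groupby.

```python
import pandas as pd

# Hypothetical per-side point totals, indexed by team name.
left = pd.Series({'Lions': 5, 'Tarantulas': 6})
right = pd.Series({'Snakes': 1, 'FC Awesome': 1, 'Grouches': 0, 'Lions': 3})

# Plain addition aligns on the index, but yields NaN for any team
# that is missing from one of the two Series.
plain = left + right

# fill_value=0 treats a missing team as having 0 points on that side.
filled = left.add(right, fill_value=0).astype(int)

print(plain)
print(filled.sort_values(ascending=False))
```

With the plain `+`, only teams present on both sides keep a numeric total, which is why the fill value matters for teams that only ever appear on one side of the file.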
To match the output exactly, you can do the following:
pd.Series(scores.index+', '+scores.values.astype('str')+' pts', index=np.arange(1,len(scores)+1,1))
#1 Tarantulas, 6 pts
#2 Lions, 5 pts
#3 Snakes, 1 pts
#4 FC Awesome, 1 pts
#5 Grouches, 0 pts
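The formatting above numbers the rows 1 to 5 and always prints "pts", whereas the desired output in the question gives tied teams the same rank, lists them alphabetically, and uses "pt" for a single point. A possible sketch that handles those details follows; it assumes a `scores` Series like the one computed earlier (rebuilt here by hand so the snippet is self-contained), and the `table` and `lines` names are my own.

```python
import pandas as pd

# Stand-in for the `scores` Series computed earlier.
scores = pd.Series({'Tarantulas': 6, 'Lions': 5, 'Snakes': 1,
                    'FC Awesome': 1, 'Grouches': 0})

table = scores.rename_axis('team').reset_index(name='pts')
# Highest points first; ties are ordered alphabetically by team name.
table = table.sort_values(['pts', 'team'], ascending=[False, True])
# Dense ranking: tied teams share a rank and the next rank is not skipped.
table['rank'] = table['pts'].rank(method='dense', ascending=False).astype(int)

lines = [
    f"{rank}. {team}, {pts} {'pt' if pts == 1 else 'pts'}"
    for rank, team, pts in zip(table['rank'], table['team'], table['pts'])
]
print("\n".join(lines))
```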
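There is no magic here. Just define a function that converts scores to points, apply it, drop the left/right distinction, then group by team and sum the points. There may be a more elegant solution.
Prepare the data using your function: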
import pandas as pd
data = '''Lions 3, Snakes 3
Tarantulas 1, FC Awesome 0
Lions 1, FC Awesome 1
Tarantulas 3, Snakes 1
Lions 4, Grouches 0'''
def splitter(row):
    left_team, right_team = row.split(',')
    return {
        'left_team': left_team[:-2].strip(),
        'left_score': int(left_team[-2:].strip()),
        'right_team': right_team[:-2].strip(),
        'right_score': int(right_team[-2:].strip())
    }
data = pd.DataFrame(splitter(row) for row in data.split("\n"))
print(data)
Out:
   left_score   left_team  right_score  right_team
0           3       Lions            3      Snakes
1           1  Tarantulas            0  FC Awesome
2           1       Lions            1  FC Awesome
3           3  Tarantulas            1      Snakes
4           4       Lions            0    Grouches
Add columns with each team's points, computed from the scores:
def points(left_score, right_score):
    win_points = 3
    draw_points = 1
    lose_points = 0
    if left_score < right_score:
        return pd.Series({'left_points': lose_points, 'right_points': win_points})
    elif left_score > right_score:
        return pd.Series({'left_points': win_points, 'right_points': lose_points})
    else:
        return pd.Series({'left_points': draw_points, 'right_points': draw_points})
data = data.merge(
    data[['left_score', 'right_score']].apply(lambda row: points(*row), axis=1),
    left_index=True, right_index=True
)
print(data)
Out:
   left_score   left_team  right_score  right_team  left_points  right_points
0           3       Lions            3      Snakes            1             1
1           1  Tarantulas            0  FC Awesome            3             0
2           1       Lions            1  FC Awesome            1             1
3           3  Tarantulas            1      Snakes            3             0
4           4       Lions            0    Grouches            3             0
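The grouping step expects a long-format frame with `team` and `points` columns, but the step that drops the left/right distinction is not shown here. One way it could be done, as a sketch rather than the answer's actual code, is to rename the left and right column pairs to common names and stack them with `pd.concat`. The frame is rebuilt by hand from the table above so the snippet runs on its own.

```python
import pandas as pd

# Rebuilt from the printed table above (score columns omitted for brevity).
data = pd.DataFrame({
    'left_team': ['Lions', 'Tarantulas', 'Lions', 'Tarantulas', 'Lions'],
    'right_team': ['Snakes', 'FC Awesome', 'FC Awesome', 'Snakes', 'Grouches'],
    'left_points': [1, 3, 1, 3, 3],
    'right_points': [1, 0, 1, 0, 0],
})

# Give both sides the same column names...
left = data[['left_team', 'left_points']].rename(
    columns={'left_team': 'team', 'left_points': 'points'})
right = data[['right_team', 'right_points']].rename(
    columns={'right_team': 'team', 'right_points': 'points'})

# ...and stack them so each row is one team's points from one game.
data = pd.concat([left, right], ignore_index=True)
print(data)
```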
Group by team to get the final result:
result = data.groupby("team")["points"].sum()
print(result)
Out:
team
FC Awesome 1
Grouches 0
Lions 5
Snakes 1
Tarantulas 6
Name: points, dtype: int64
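For some reason my output is different. My values are 6, 4, 1, 0, 0. The only thing I changed was reading from a csv instead of the clipboard. Any idea why this happens? The DataFrame has the correct values, so maybe the sum is wrong?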
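A possible explanation, offered as a guess since the exact `read_csv` call is not shown: if `header=None` was left out, `read_csv` treats the first line ("Lions 3, Snakes 3") as column names, so the first game is silently dropped. That would give exactly 6, 4, 1, 0, 0 for this sample. The sketch below reproduces both cases from an in-memory string; the `total_points` helper is hypothetical and only condenses the steps of the first answer.

```python
import io
import pandas as pd

RAW = """Lions 3, Snakes 3
Tarantulas 1, FC Awesome 0
Lions 1, FC Awesome 1
Tarantulas 3, Snakes 1
Lions 4, Grouches 0"""


def total_points(df):
    """Hypothetical helper: total points per team from a two-column frame."""
    df = df.copy()
    df.columns = ['l_team', 'r_team']
    left = df['l_team'].str.extract(r'(.*)\s(\d+)')
    right = df['r_team'].str.extract(r'(.*)\s(\d+)')
    l_score, r_score = left[1].astype(int), right[1].astype(int)
    # 3 points for a win, 1 for a draw, 0 for a loss.
    l_pts = (l_score > r_score) * 3 + (l_score == r_score) * 1
    r_pts = (r_score > l_score) * 3 + (r_score == l_score) * 1
    pts = pd.concat([pd.Series(l_pts.values, index=left[0]),
                     pd.Series(r_pts.values, index=right[0])])
    return pts.groupby(level=0).sum().sort_values(ascending=False)


# With header=None every line is data.
correct = total_points(
    pd.read_csv(io.StringIO(RAW), sep=', ', header=None, engine='python'))

# Without it, the first game becomes the header and is lost.
dropped = total_points(
    pd.read_csv(io.StringIO(RAW), sep=', ', engine='python'))

print(correct.tolist())
print(dropped.tolist())
```

If that is the cause, passing `header=None` to `read_csv`, as the `read_clipboard` call in the answer does, should restore the expected totals.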