Python: create a dataframe containing a grouped count and a total
I have the following dataframe:
Race Course Horse Year Month Day Amount Won/Lost
0 Aintree Red Rum 2017 5 12 11.58 won
1 Punchestown Camelot 2016 12 22 122.52 won
2 Sandown Beef of Salmon 2016 11 17 20.00 lost
3 Ayr Corbiere 2016 11 3 25.00 lost
4 Fairyhouse Red Rum 2016 12 2 65.75 won
5 Ayr Camelot 2017 3 11 12.05 won
6 Aintree Hurricane Fly 2017 5 12 11.58 won
7 Punchestown Beef or Salmon 2016 12 22 112.52 won
8 Sandown Aldaniti 2016 11 17 10.00 lost
9 Ayr Henry the Navigator 2016 11 1 15.00 lost
10 Fairyhouse Jumanji 2016 10 2 65.75 won
11 Ayr Came Second 2017 3 11 12.05 won
12 Aintree Murder 2017 5 12 5.00 lost
13 Punchestown King Arthur 2016 6 22 52.52 won
14 Sandown Filet of Fish 2016 11 17 20.00 lost
15 Ayr Denial 2016 11 3 25.00 lost
16 Fairyhouse Don't Gamble 2016 12 12 165.75 won
17 Ayr Ireland 2017 1 11 22.05 won
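I am trying to create another dataframe that contains only the total number of races (rows) and the total number of races won. Ideally it would look like this: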
total races 18
total won 11
However, all I have managed is a group-by count, which gives the total won and the total lost. This is what I tried:
df = df.groupby(['Won/Lost']).size().add_prefix('total')
This is what it returns:
Won/Lost
total lost 7
total won 11
dtype: int64
I'm at a dead end and can't find a simple solution.
Assuming the contents of races.csv are:
Race Course,Horse,Year,Month,Day,Amount,Won/Lost
Aintree,Red Rum,2017,5,12,11.58,won
Punchestown,Camelot,2016,12,22,122.52,won
Sandown,Beef of Salmon,2016,11,17,20.00,lost
Ayr,Corbiere,2016,11,3,25.00,lost
Fairyhouse,Red Rum,2016,12,2,65.75,won
Ayr,Camelot,2017,3,11,12.05,won
Aintree,Hurricane Fly,2017,5,12,11.58,won
Punchestown,Beef or Salmon,2016,12,22,112.52,won
Sandown,Aldaniti,2016,11,17,10.00,lost
Ayr,Henry the Navigator,2016,11,1,15.00,lost
Fairyhouse,Jumanji,2016,10,2,65.75,won
Ayr,Came Second,2017,3,11,12.05,won
Aintree,Murder,2017,5,12,5.00,lost
Punchestown,King Arthur,2016,6,22,52.52,won
Sandown,Filet of Fish,2016,11,17,20.00,lost
Ayr,Denial,2016,11,3,25.00,lost
Fairyhouse,Don't Gamble,2016,12,12,165.75,won
Ayr,Ireland,2017,1,11,22.05,won
Steps to get the new dataframe:
>>> import pandas as pd
>>> races_df = pd.read_csv('races.csv')
>>> races_df
Race Course Horse Year Month Day Amount Won/Lost
0 Aintree Red Rum 2017 5 12 11.58 won
1 Punchestown Camelot 2016 12 22 122.52 won
2 Sandown Beef of Salmon 2016 11 17 20.00 lost
3 Ayr Corbiere 2016 11 3 25.00 lost
4 Fairyhouse Red Rum 2016 12 2 65.75 won
5 Ayr Camelot 2017 3 11 12.05 won
6 Aintree Hurricane Fly 2017 5 12 11.58 won
7 Punchestown Beef or Salmon 2016 12 22 112.52 won
8 Sandown Aldaniti 2016 11 17 10.00 lost
9 Ayr Henry the Navigator 2016 11 1 15.00 lost
10 Fairyhouse Jumanji 2016 10 2 65.75 won
11 Ayr Came Second 2017 3 11 12.05 won
12 Aintree Murder 2017 5 12 5.00 lost
13 Punchestown King Arthur 2016 6 22 52.52 won
14 Sandown Filet of Fish 2016 11 17 20.00 lost
15 Ayr Denial 2016 11 3 25.00 lost
16 Fairyhouse Don't Gamble 2016 12 12 165.75 won
17 Ayr Ireland 2017 1 11 22.05 won
>>>
>>> total_races = len(races_df)
>>>
>>> total_win = races_df[races_df['Won/Lost'] == 'won']['Won/Lost'].count()
>>>
>>> new_df = pd.DataFrame({'total_races': total_races, 'total_win': total_win}, index=pd.RangeIndex(1))
>>>
>>> new_df
total_races total_win
0 18 11
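The same totals can also be obtained by summing a boolean mask, which avoids building the intermediate filtered frame. The following is a minimal sketch rather than part of the original answer: it assumes a four-row inline sample in place of races.csv, and the labels 'total races' and 'total won' are borrowed from the desired output in the question.

```python
import pandas as pd

# Four rows copied from the question's data, standing in for races.csv
# (an assumption made to keep the example self-contained).
races_df = pd.DataFrame({
    'Horse': ['Red Rum', 'Camelot', 'Beef of Salmon', 'Corbiere'],
    'Won/Lost': ['won', 'won', 'lost', 'lost'],
})

# Comparing a column to a string yields a boolean Series;
# True counts as 1 when summed, so the sum is the number of wins.
total_races = len(races_df)
total_won = int((races_df['Won/Lost'] == 'won').sum())

# A Series keyed by label gives the two-row layout asked for in the question.
summary = pd.Series({'total races': total_races, 'total won': total_won})
print(summary)
```

If a dataframe is preferred over a Series, summary.to_frame() converts it while keeping the two labels as the index.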
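So the total number of races would be len(df), and the total won would be (df['Won/Lost'] == 'won').sum()?
That doesn't work: it returns a dataframe with 6 rows of the same values, and the total won comes back as 0.
Hi @TomWalsh, I got the 'Won/Lost' column name wrong and have now edited it. I also just tested this by storing the data in a csv file, creating a dataframe races_df from it, and running the code above, and it gave me the expected result. Could you check whether the 'Won/Lost' values contain extra spaces? Also, what format is your original data in?
Hi, the original data is a csv file that I read and stored in a dataframe. You are right that the 'Won/Lost' values contain extra spaces. That will probably resolve a lot of the problems I have had with this dataset. As for new_df, it is still created with 6 identical rows. Any idea why that might be?
So I have figured it out. As was correctly pointed out in the comment on my original question, the solution for total_win is (df['Won/Lost'] == 'won').sum(). Your solution, races_df[races_df['Won/Lost'] == 'won'].count().values, returned an array. Also, your dataframe declaration had no index and used scalar values, so it threw an error.
Great that your problem is solved! Based on your feedback I have modified the code slightly and posted it again, along with the csv file I used for testing. Hope it helps.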
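The comments above trace the zero count to stray whitespace in the 'Won/Lost' values. A minimal sketch of one way to handle that, assuming the padding consists of ordinary leading or trailing spaces: strip the column with Series.str.strip() before comparing. The inline CSV text below is made up for illustration and is not the asker's file.

```python
import io

import pandas as pd

# Hypothetical CSV text with padded values, standing in for the real file.
csv_text = "Horse,Won/Lost\nRed Rum, won\nCamelot,won \nCorbiere, lost\n"
races_df = pd.read_csv(io.StringIO(csv_text))

# Without cleaning, ' won' and 'won ' are not equal to 'won'.
raw_count = int((races_df['Won/Lost'] == 'won').sum())

# Strip leading and trailing whitespace, then compare again.
races_df['Won/Lost'] = races_df['Won/Lost'].str.strip()
clean_count = int((races_df['Won/Lost'] == 'won').sum())

# A dict of scalars needs an explicit index when building a dataframe.
new_df = pd.DataFrame(
    {'total_races': len(races_df), 'total_win': clean_count},
    index=[0],
)
print(raw_count, clean_count)
```

When only leading spaces after the delimiter are involved, passing skipinitialspace=True to pd.read_csv may be enough; trailing spaces still need the strip step.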