How can I group rows in Python and add per-category counts as new columns in pandas?
Tags: python, pandas

Given the data below, I want to:
1. Drop duplicates on ['Match_Date','Games'], i.e. the number of football (F) games on 2019-07-10 + the number of cricket (C) games on 2019-07-10 = 5 (F or C), the number of football games on 2019-07-11 + the number of cricket games on 2019-07-11 = 6 (F or C), and so on.
2. Add columns F and C.

Game_ID Games Match_Date Total_Games_Each_Day F_or_C
1 Football 2019-07-10 5 2
2 Cricket 2019-07-10 5 3
3 Cricket 2019-07-10 5 3
4 Football 2019-07-10 5 2
5 Cricket 2019-07-10 5 3
6 Football 2019-07-11 6 4
7 Cricket 2019-07-11 6 2
8 Cricket 2019-07-11 6 2
9 Football 2019-07-11 6 4
10 Football 2019-07-11 6 4
11 Football 2019-07-11 6 4
12 Football 2019-07-16 6 6
13 Football 2019-07-16 6 6
14 Football 2019-07-16 6 6
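
The final data should look like this:
Game_ID Games Match_Date Total_Games_Each_Day F_or_C F C
1 Football 2019-07-10 5 2 2 Null
2 Cricket 2019-07-10 5 3 Null 3
3 Football 2019-07-11 6 4 4 Null
4 Cricket 2019-07-11 6 2 Null 2
5 Football 2019-07-16 6 6 6 Null
Null in column F means no football was played that day, and Null in column C means no cricket was played that day.

What have you tried so far?
game.groupby(['Match_Date','Games'])['game ID'].count() does not work. Using drop_duplicates also causes problems.

IIUC, you need to use crosstab: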
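import pandas as pd
from io import StringIO

# read your dataframe (your_data stands for the table text from the question)
df = pd.read_csv(StringIO(your_data), sep=r'\s+', parse_dates=['Match_Date'])
# note the datetime column.

# keep the first row of each (Match_Date, Games) pair
s = df.drop_duplicates(subset=['Match_Date','Games'])
# spread F_or_C into one column per first letter of Games (C, F), then join back on the index
new_df = s.join(pd.crosstab(s.index, s.Games.str[0], s["F_or_C"], aggfunc="first"))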
Output (new_df):
   Game_ID Games Match_Date Total_Games_Each_Day F_or_C C F
0 1 Football 2019-07-10 5 2 NaN 2.0
1 2 Cricket 2019-07-10 5 3 3.0 NaN
5 6 Football 2019-07-11 6 4 NaN 4.0
6 7 Cricket 2019-07-11 6 2 2.0 NaN
11 12 Football 2019-07-16 6 6 NaN 6.0
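
For anyone starting from the raw rows only, here is a minimal end-to-end sketch. It makes a few assumptions: it rebuilds only the 2019-07-10 and 2019-07-11 rows from the question (the 2019-07-16 rows shown there list 3 games against a count of 6, so they look incomplete), it derives the two count columns with groupby(...).transform("count") rather than the .count() call mentioned in the comment (transform keeps one value per original row), and it fills F and C with Series.where as a simpler stand-in for the crosstab step above, not a replacement for it.

```python
import pandas as pd

# Raw rows rebuilt from the question's table (first two dates only).
games = ["Football", "Cricket", "Cricket", "Football", "Cricket",
         "Football", "Cricket", "Cricket", "Football", "Football", "Football"]
dates = ["2019-07-10"] * 5 + ["2019-07-11"] * 6
df = pd.DataFrame({
    "Game_ID": range(1, 12),
    "Games": games,
    "Match_Date": pd.to_datetime(dates),
})

# Number of games per day, broadcast back onto every row of that day.
df["Total_Games_Each_Day"] = df.groupby("Match_Date")["Game_ID"].transform("count")

# Number of games per (day, game type), broadcast back onto every row.
df["F_or_C"] = df.groupby(["Match_Date", "Games"])["Game_ID"].transform("count")

# Keep the first row of each (day, game type) pair.
s = df.drop_duplicates(subset=["Match_Date", "Games"]).copy()

# One column per game initial: the count where the row matches, NaN otherwise.
for initial in ("F", "C"):
    s[initial] = s["F_or_C"].where(s["Games"].str[0] == initial)

print(s)
```

The deduplicated frame keeps the original index labels (0, 1, 5, 6), as in the answer's output; call s.reset_index(drop=True) if a fresh 0..n index is preferred.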