Python - pandas: transposing game log data
I have a dataset (nba_data) that I am having trouble transforming. What I want is to change the following:
TEAM_ABBREVIATION  GAME_DATE   WinLoss  HomeAway
ATL                2016-10-27  W        H
ATL                2016-10-29  W        A
ATL                2016-10-31  W        H
ATL                2016-11-02  L        H
BKN                2016-10-26  L        A
BKN                2016-10-28  W        H
BKN                2016-10-29  L        A
BKN                2016-10-31  L        H
to the following:
TEAM_ABBREVIATION  GAME_DATE   HomeWin  HomeLoss  AwayWin  AwayLoss
ATL                2016-10-27  1        0         0        0
ATL                2016-10-29  1        0         1        0
ATL                2016-10-31  2        0         1        0
ATL                2016-11-02  2        1         1        0
BKN                2016-10-26  0        0         0        1
BKN                2016-10-28  1        0         0        1
BKN                2016-10-29  1        0         0        2
BKN                2016-10-31  1        1         0        2
It would be great if you could help.
Thanks,
Tom
The steps explained below yield:
   TEAM_ABBREVIATION   GAME_DATE  HomeWin  HomeLoss  AwayWin  AwayLoss
0                ATL  2016-10-27        1         0        0         0
1                ATL  2016-10-29        1         0        1         0
2                ATL  2016-10-31        2         0        1         0
3                ATL  2016-11-02        2         1        1         0
4                BKN  2016-10-26        0         0        0         1
5                BKN  2016-10-28        1         0        0         1
6                BKN  2016-10-29        1         0        0         2
7                BKN  2016-10-31        1         1        0         2
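The first idea is that there are 4 kinds of "events", corresponding to the 4 possible combinations of values in the WinLoss and HomeAway columns: (W, H), (W, A), (L, H) and (L, A).
It is therefore natural to combine the WinLoss and HomeAway columns into a single column: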
In [111]: df['HomeAway'] + df['WinLoss']
Out[111]:
0 HW
1 AW
2 HW
3 HL
4 AL
5 HW
6 AL
7 HL
dtype: object
Then use get_dummies to convert this Series into a table of 1s and 0s:
In [112]: pd.get_dummies(df['HomeAway'] + df['WinLoss']).astype('int')
Out[112]:
   AL  AW  HL  HW
0   0   0   0   1
1   0   1   0   0
2   0   0   0   1
3   0   0   1   0
4   1   0   0   0
5   0   0   0   1
6   1   0   0   0
7   0   0   1   0
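A note on the `.astype('int')` call: in pandas 2.0 and later, `get_dummies` returns boolean columns by default, so the cast (or the `dtype` argument) is what produces 1s and 0s. Here is a minimal sketch, using a small hypothetical Series `events` as a stand-in for `df['HomeAway'] + df['WinLoss']`:

```python
import pandas as pd

# Hypothetical stand-in for df['HomeAway'] + df['WinLoss'].
events = pd.Series(['HW', 'AW', 'HW', 'HL'])

# Cast after the fact, as in the session above.
cast_after = pd.get_dummies(events).astype('int')

# Or request integer indicators directly through the dtype argument.
cast_direct = pd.get_dummies(events, dtype=int)

print(cast_direct)
```

Either form should give integer indicator columns that can be summed in the next step.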
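Now, comparing with your desired result, we can see that we also want a cumulative sum, grouped by TEAM_ABBREVIATION (here result holds the get_dummies table from the previous step):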
In [114]: result.groupby(df['TEAM_ABBREVIATION']).transform('cumsum')
Out[114]:
   AL  AW  HL  HW
0   0   0   0   1
1   0   1   0   1
2   0   1   0   2
3   0   1   1   2
4   1   0   0   0
5   1   0   0   1
6   2   0   0   1
7   2   0   1   1
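The same running totals can also be sketched with the `cumsum` method of the grouped object, which should be equivalent to `transform('cumsum')` here. The names `dummies` and `teams` below are small hypothetical stand-ins for `result` and `df['TEAM_ABBREVIATION']`:

```python
import pandas as pd

# Hypothetical stand-ins for df['TEAM_ABBREVIATION'] and the indicator table.
teams = pd.Series(['ATL', 'ATL', 'BKN', 'BKN'])
dummies = pd.DataFrame({'AL': [0, 0, 1, 1], 'HW': [1, 1, 0, 0]})

# Cumulative sum within each team, written two ways.
via_transform = dummies.groupby(teams).transform('cumsum')
via_cumsum = dummies.groupby(teams).cumsum()

print(via_cumsum)
```

Both forms keep the original row index, and the totals restart at each new team. They assume the rows are already in date order within each team.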
The next two lines reorder and rename the columns:
result = result.sort_index(axis='columns', ascending=False)
result = result.rename(columns={'AL':'AwayLoss', 'AW':'AwayWin',
                                'HL':'HomeLoss', 'HW':'HomeWin'})
Finally, we can use pd.concat to join df with result and build the desired DataFrame:
result = pd.concat([df[['TEAM_ABBREVIATION', 'GAME_DATE']], result], axis='columns')
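Putting the steps above together, a self-contained sketch might look like the following. The DataFrame is built by hand from the sample rows in the question, which is an assumption about how `nba_data` is loaded; the rest follows the walkthrough.

```python
import pandas as pd

# Sample rows copied from the question (assumed to be sorted by date within each team).
df = pd.DataFrame({
    'TEAM_ABBREVIATION': ['ATL'] * 4 + ['BKN'] * 4,
    'GAME_DATE': ['2016-10-27', '2016-10-29', '2016-10-31', '2016-11-02',
                  '2016-10-26', '2016-10-28', '2016-10-29', '2016-10-31'],
    'WinLoss': ['W', 'W', 'W', 'L', 'L', 'W', 'L', 'L'],
    'HomeAway': ['H', 'A', 'H', 'H', 'A', 'H', 'A', 'H'],
})

# One indicator column per (HomeAway, WinLoss) combination: AL, AW, HL, HW.
result = pd.get_dummies(df['HomeAway'] + df['WinLoss']).astype('int')

# Running totals within each team.
result = result.groupby(df['TEAM_ABBREVIATION']).transform('cumsum')

# Reverse-alphabetical order gives HW, HL, AW, AL; then use descriptive names.
result = result.sort_index(axis='columns', ascending=False)
result = result.rename(columns={'AL': 'AwayLoss', 'AW': 'AwayWin',
                                'HL': 'HomeLoss', 'HW': 'HomeWin'})

# Attach the identifying columns back onto the running totals.
result = pd.concat([df[['TEAM_ABBREVIATION', 'GAME_DATE']], result], axis='columns')
print(result)
```

One caveat: `get_dummies` only creates columns for combinations that actually occur in the data, so a dataset with no home losses, for example, would lack a HomeLoss column unless it is added separately.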
This get_dummies approach is great!