Python: reshape a DataFrame so that row values become column headers
I have a pandas DataFrame like this:
COMMIT_ID | FILE_NAME    | COMMITTER | CHANGE TYPE
--------------------------------------------------
1         | package.json | A         | MODIFY
2         | main.js      | B         | ADD
2         | class.java   | B         | DELETE
I want the row values of FILE_NAME to become column headers, with CHANGE TYPE as the values:
COMMIT_ID | package.json | main.js | class.java | COMMITTER
-----------------------------------------------------------
1         | MODIFY       | NONE    | NONE       | A
2         | NONE         | ADD     | DELETE     | B
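I tried pandas.pivot_table, but without much success. Is there an easy way to do this?

I think you need pivot_table followed by reset_index. If the data contains duplicates, a pivot_table solution needs an aggregation function such as sum (concatenates the strings without a separator) or '_'.join (concatenates them with a separator):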
print(df)
   COMMIT_ID     FILE_NAME COMMITTER CHANGE TYPE
0          1  package.json         A      MODIFY
1          2       main.js         B         ADD
2          2    class.java         B      DELETE
3          2    class.java         B         ADD
df = df.pivot_table(index=['COMMIT_ID', 'COMMITTER'],
                    columns='FILE_NAME',
                    values='CHANGE TYPE',
                    aggfunc='sum').reset_index()
print(df)
FILE_NAME  COMMIT_ID COMMITTER class.java main.js package.json
0                  1         A       None    None       MODIFY
1                  2         B  DELETEADD     ADD         None
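The snippet above assumes df already exists. As a minimal, self-contained sketch, the sample frame can be rebuilt from the printed output and pivoted in one go. The DataFrame constructor call and the name wide are my own assumptions, not part of the original answer, and ''.join is used here as an explicit stand-in for concatenating the strings without a separator:

```python
import pandas as pd

# Sample data rebuilt from the printed frame (assumed construction).
df = pd.DataFrame({
    'COMMIT_ID': [1, 2, 2, 2],
    'FILE_NAME': ['package.json', 'main.js', 'class.java', 'class.java'],
    'COMMITTER': ['A', 'B', 'B', 'B'],
    'CHANGE TYPE': ['MODIFY', 'ADD', 'DELETE', 'ADD'],
})

# Duplicated (COMMIT_ID, COMMITTER, FILE_NAME) keys are collapsed by
# joining their CHANGE TYPE strings without a separator.
wide = df.pivot_table(index=['COMMIT_ID', 'COMMITTER'],
                      columns='FILE_NAME',
                      values='CHANGE TYPE',
                      aggfunc=''.join).reset_index()
print(wide)
```

Keeping the result in a separate variable (wide, a hypothetical name) leaves the original df intact, so the other aggregation functions can be tried on the same input.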
Aggregating with first also works, but it may drop duplicate values:
df = df.pivot_table(index=['COMMIT_ID', 'COMMITTER'],
                    columns='FILE_NAME',
                    values='CHANGE TYPE',
                    aggfunc='first').reset_index()
print(df)
FILE_NAME  COMMIT_ID COMMITTER class.java main.js package.json
0                  1         A       None    None       MODIFY
1                  2         B     DELETE     ADD         None
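Since first silently keeps only one value per cell, it can help to check for duplicated keys before choosing an aggregation function. The following is only a sketch: the sample frame is rebuilt from the printed output, and the names keys and dupes are hypothetical.

```python
import pandas as pd

# Sample data rebuilt from the printed frame (assumed construction).
df = pd.DataFrame({
    'COMMIT_ID': [1, 2, 2, 2],
    'FILE_NAME': ['package.json', 'main.js', 'class.java', 'class.java'],
    'COMMITTER': ['A', 'B', 'B', 'B'],
    'CHANGE TYPE': ['MODIFY', 'ADD', 'DELETE', 'ADD'],
})

# A cell of the pivoted table is identified by these three columns.
keys = ['COMMIT_ID', 'COMMITTER', 'FILE_NAME']

# keep=False marks every row of a duplicated key, not just the repeats.
dupes = df[df.duplicated(subset=keys, keep=False)]
print(dupes)
```

If dupes is empty, first loses nothing; otherwise the listed rows are the ones that would be merged or dropped.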
To keep every duplicate instead, use '_'.join to concatenate them with a separator:
df = df.pivot_table(index=['COMMIT_ID', 'COMMITTER'],
                    columns='FILE_NAME',
                    values='CHANGE TYPE',
                    aggfunc='_'.join).reset_index()
print(df)
FILE_NAME  COMMIT_ID COMMITTER  class.java main.js package.json
0                  1         A        None    None       MODIFY
1                  2         B  DELETE_ADD     ADD         None
Finally, add rename_axis to remove the columns name (the output shown is for the sum version):
df = df.rename_axis(None, axis=1)
print(df)
   COMMIT_ID COMMITTER class.java main.js package.json
0          1         A       None    None       MODIFY
1          2         B  DELETEADD     ADD         None
I seriously suspect you are a pandas bot, @jezrael.
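To get closer to the layout asked for in the question (NONE for missing cells, COMMITTER as the last column), the steps can be chained. This is a sketch under my own assumptions rather than part of the original answer: the sample frame is rebuilt from the printed output, first is chosen as the aggregation, and the names wide and file_cols are hypothetical.

```python
import pandas as pd

# Sample data rebuilt from the printed frame (assumed construction).
df = pd.DataFrame({
    'COMMIT_ID': [1, 2, 2, 2],
    'FILE_NAME': ['package.json', 'main.js', 'class.java', 'class.java'],
    'COMMITTER': ['A', 'B', 'B', 'B'],
    'CHANGE TYPE': ['MODIFY', 'ADD', 'DELETE', 'ADD'],
})

wide = (df.pivot_table(index=['COMMIT_ID', 'COMMITTER'],
                       columns='FILE_NAME',
                       values='CHANGE TYPE',
                       aggfunc='first')     # keeps only the first duplicate
          .reset_index()
          .rename_axis(None, axis=1)        # drop the 'FILE_NAME' columns name
          .fillna('NONE'))                  # placeholder used in the question

# Order the file columns by first appearance and move COMMITTER to the end.
file_cols = list(df['FILE_NAME'].unique())
wide = wide[['COMMIT_ID'] + file_cols + ['COMMITTER']]
print(wide)
```

Swapping aggfunc='first' for '_'.join would keep both DELETE and ADD for class.java instead of only the first one.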