How to use pd.melt in Python
This DataFrame comes from a CSV file:
id name A B C gpa
0 1111 Phineas NaN B NaN 3.0
1 1113 Tilly NaN NaN C 2.5
2 1110 Andres A NaN NaN 3.8
3 1112 Jax NaN B NaN 3.2
4 1114 Ray NaN B NaN 3.1
5 1115 Koda NaN NaN C 2.4
6 1120 Bruno A NaN NaN 3.7
7 1134 Davis NaN NaN C 2.6
8 1102 Cassie A NaN NaN 4.0
I want this output:
id name grade gpa
0 1111 Phineas B 3.0
1 1113 Tilly C 2.5
2 1110 Andres A 3.8
3 1112 Jax B 3.2
4 1114 Ray B 3.1
5 1115 Koda C 2.4
6 1120 Bruno A 3.7
7 1134 Davis C 2.6
8 1102 Cassie A 4.0
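What code produces this?
If you don't want to use melt, this solution may work for you: because each student has a value in exactly one of the A, B, or C columns, you can first convert all NaN values in those columns to empty strings, then use the + operator to concatenate the A, B, and C columns.
Import statements and starting DataFrame: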
import pandas as pd
import numpy as np
df = pd.DataFrame({'id':[1111,1113],
                   'name':['Phineas','Tilly'],
                   'A':[np.nan,np.nan],
                   'B':['B',np.nan],
                   'C':[np.nan,'C'],
                   'gpa':[3.0,2.5]
                   })
# id name A B C gpa
# 0 1111 Phineas NaN B NaN 3.0
# 1 1113 Tilly NaN NaN C 2.5
Concatenate the columns and print the result:
df.fillna('',inplace=True) #replaces all NaN's with ""-empty strings
df['letter_grades'] = df['A'] + df['B'] + df['C'] #concatenate
df = df[['id','name','letter_grades','gpa']] #reassign dataframe identifier
print(df)
# id name letter_grades gpa
#0 1111 Phineas B 3.0
#1 1113 Tilly C 2.5
Use combine_first; in this case you don't need melt:
df['grade'] = df['A'].combine_first(df['B']).combine_first(df['C'])
df.drop(['A','B','C'], axis=1, inplace=True)
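As a rough end-to-end sketch of the combine_first approach above: the original CSV file is not shown, so the `csv_text` below is a hypothetical stand-in built from the first three rows of the question's table. It also illustrates that `pd.read_csv` treats the literal text NaN as a missing value by default, so no extra conversion should be needed when the data is loaded this way.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for the original file, which is not shown.
csv_text = """id,name,A,B,C,gpa
1111,Phineas,NaN,B,NaN,3.0
1113,Tilly,NaN,NaN,C,2.5
1110,Andres,A,NaN,NaN,3.8
"""

# read_csv parses the literal text "NaN" as a missing value by default.
df = pd.read_csv(io.StringIO(csv_text))

# Take A where it is present, otherwise B, otherwise C.
df['grade'] = df['A'].combine_first(df['B']).combine_first(df['C'])

# Drop the source columns and put the columns in the requested order.
df = df.drop(columns=['A', 'B', 'C'])[['id', 'name', 'grade', 'gpa']]
print(df)
```

Unlike the string-concatenation approach, this leaves the gpa column and any genuinely missing grades untouched.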
Comment: @Ray, use df[['A','B','C']] = df[['A','B','C']].replace('NaN', np.nan), where np comes from import numpy as np.
Or:
abc = df[['A','B','C']]
# Boolean-mask the underlying array; this assumes exactly one non-null value per row
df['grade'] = abc.values[abc.notnull().values]
df.drop(['A','B','C'], axis=1, inplace=True)
print(df)
id name gpa grade
0 1111 Phineas 3.0 B
1 1113 Tilly 2.5 C
2 1110 Andres 3.8 A
3 1112 Jax 3.2 B
4 1114 Ray 3.1 B
5 1115 Koda 2.4 C
6 1120 Bruno 3.7 A
7 1134 Davis 2.6 C
8 1102 Cassie 4.0 A
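Since the question asks about pd.melt specifically, here is a minimal sketch of how melt could produce the same result. It is not taken from the answers above. It assumes pandas 1.1 or later (for the `ignore_index` parameter) and exactly one non-null grade per student; the `column` name and the three-row sample frame are illustrative choices.

```python
import numpy as np
import pandas as pd

# Small sample frame built from the first three rows of the question's table.
df = pd.DataFrame({
    'id': [1111, 1113, 1110],
    'name': ['Phineas', 'Tilly', 'Andres'],
    'A': [np.nan, np.nan, 'A'],
    'B': ['B', np.nan, np.nan],
    'C': [np.nan, 'C', np.nan],
    'gpa': [3.0, 2.5, 3.8],
})

# Unpivot A/B/C into a single 'grade' column. 'column' (an illustrative name)
# records which source column each value came from. ignore_index=False keeps
# the original row labels so the original order can be restored afterwards.
long = df.melt(id_vars=['id', 'name', 'gpa'], value_vars=['A', 'B', 'C'],
               var_name='column', value_name='grade', ignore_index=False)

# Keep the non-null grade for each student, restore the original row order,
# and select the columns in the requested order.
result = long.dropna(subset=['grade']).sort_index()[['id', 'name', 'grade', 'gpa']]
print(result)
```

melt produces one row per student per grade column, so most of the intermediate rows are NaN and get dropped. For data shaped like this, combine_first is the more direct route, and melt is mainly useful when a student can have several grades.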