Creating dummy variables in Python
I am trying to convert the nl column into 6 columns, i.e. to go from this:
id nl
A 3
B 1
B 5
C 2
C 3
to this:
id nl_1 nl_2 nl_3 nl_4 nl_5 nl_6
A 0 0 1 0 0 0
B 1 0 0 0 1 0
C 0 1 1 0 0 0
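With this,
import pandas as pd
# Encode the nl column (not id) and keep the result
dummies = pd.get_dummies(df['nl'], prefix = 'nl')
# A Series has no join method, so select id as a one-column DataFrame
df[['id']].join(dummies)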
I have managed to get the following:
id nl_1 nl_2 nl_3 nl_4 nl_5 nl_6
A 0 0 1 0 0 0
B 1 0 0 0 0 0
B 0 0 0 0 1 0
C 0 1 0 0 0 0
C 0 0 1 0 0 0
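How do I take the last step to get what I want?
Thanks.
I think you need to aggregate with max after grouping by id:
Putting it all together (I added code for the missing columns, which may not be necessary with real data):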
print (df)
id nl
0 A 3
1 B 1
2 B 5
3 C 2
4 C 3
dummies = pd.get_dummies(df['nl'], prefix = 'nl')
cols =['nl_' + str(x) for x in range(1, 7)]
print (cols)
['nl_1', 'nl_2', 'nl_3', 'nl_4', 'nl_5', 'nl_6']
dummies = dummies.reindex(columns = cols, fill_value=0)
df = pd.concat([df.id, dummies], axis=1)
print (df)
id nl_1 nl_2 nl_3 nl_4 nl_5 nl_6
0 A 0 0 1 0 0 0
1 B 1 0 0 0 0 0
2 B 0 0 0 0 1 0
3 C 0 1 0 0 0 0
4 C 0 0 1 0 0 0
df1 = df.groupby('id', as_index=False).max()
print (df1)
id nl_1 nl_2 nl_3 nl_4 nl_5 nl_6
0 A 0 0 1 0 0 0
1 B 1 0 0 0 1 0
2 C 0 1 1 0 0 0
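For readers who want to run the steps above end to end, here is a minimal self-contained sketch. It rebuilds the sample frame from the question and assumes nl only takes values 1 to 6. The dtype=int argument is an addition not present in the original answer: recent pandas versions return boolean dummies by default, so it is used here to get the 0/1 output shown above.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({"id": ["A", "B", "B", "C", "C"],
                   "nl": [3, 1, 5, 2, 3]})

# Column names for every expected nl value (the range 1..6 is an assumption
# taken from the desired output in the question)
cols = ["nl_" + str(x) for x in range(1, 7)]

# One-hot encode nl as integers, then add the columns that never occur
dummies = pd.get_dummies(df["nl"], prefix="nl", dtype=int)
dummies = dummies.reindex(columns=cols, fill_value=0)

# Collapse to one row per id; max keeps a 1 if any row of that id had it
result = pd.concat([df["id"], dummies], axis=1).groupby("id", as_index=False).max()
print(result)
```

The reindex step only matters when some values (here 4 and 6) never appear in the data; without it those columns would simply be absent.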
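You are a machine. :-)
No, I was thinking about this solution a few weeks ago: I tried a logical and, and then found that the best approach is max
If my answer was helpful, don't forget to accept it. Thanks.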
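One way to see why max works here: on 0/1 indicator columns, taking the maximum within a group behaves like a logical OR, while a logical AND would only keep values shared by every row of the group. The sketch below illustrates this on the question's sample data; the variable names are hypothetical, and grouping the dummies by the id Series is just one possible way to write it.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({"id": ["A", "B", "B", "C", "C"],
                   "nl": [3, 1, 5, 2, 3]})

# Boolean indicators, one column per nl value present in the data
dummies = pd.get_dummies(df["nl"], prefix="nl", dtype=bool)

# Logical OR within each id: True if any row of that id has the value
flags = dummies.groupby(df["id"]).any()

# The same aggregation expressed as max over 0/1 integers
ints = dummies.astype(int).groupby(df["id"]).max()
same = flags.astype(int).equals(ints)

# Logical AND within each id: True only if every row of that id has the value
all_flags = dummies.groupby(df["id"]).all()

print(same)
print(all_flags)
```

With a logical AND, an id such as B that has two different nl values ends up with no flag set at all, which is why an OR-style aggregation such as max or any fits this problem.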