Dummy variables in Python


I am trying to convert the nl column into 6 columns, i.e. from this:

id  nl
A   3
B   1
B   5
C   2
C   3
into this:

id   nl_1 nl_2 nl_3 nl_4 nl_5 nl_6
A    0    0    1    0    0    0
B    1    0    0    0    1    0
C    0    1    1    0    0    0
With this code:

import pandas as pd
dummies = pd.get_dummies(df['nl'], prefix = 'nl')
df[['id']].join(dummies)
I have managed to get the following:

id   nl_1 nl_2 nl_3 nl_4 nl_5 nl_6
A    0    0    1    0    0    0
B    1    0    0    0    0    0
B    0    0    0    0    1    0
C    0    1    0    0    0    0
C    0    0    1    0    0    0
How do I get from here to the final result I want?

Thanks

I think you need an aggregation, groupby with max:

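All together. I added code for the missing columns (nl_4 and nl_6 never occur in the sample); it may not be necessary with your real data: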

import pandas as pd

df = pd.DataFrame({'id': ['A', 'B', 'B', 'C', 'C'], 'nl': [3, 1, 5, 2, 3]})
print (df)
  id  nl
0  A   3
1  B   1
2  B   5
3  C   2
4  C   3

# dtype=int keeps 0/1 values; newer pandas versions return True/False by default
dummies = pd.get_dummies(df['nl'], prefix = 'nl', dtype=int)

cols =['nl_' + str(x) for x in range(1, 7)]
print (cols)
['nl_1', 'nl_2', 'nl_3', 'nl_4', 'nl_5', 'nl_6']

dummies = dummies.reindex(columns = cols, fill_value=0)
df = pd.concat([df.id, dummies], axis=1)
print (df)
  id  nl_1  nl_2  nl_3  nl_4  nl_5  nl_6
0  A     0     0     1     0     0     0
1  B     1     0     0     0     0     0
2  B     0     0     0     0     1     0
3  C     0     1     0     0     0     0
4  C     0     0     1     0     0     0

df1 = df.groupby('id', as_index=False).max()
print (df1)
  id  nl_1  nl_2  nl_3  nl_4  nl_5  nl_6
0  A     0     0     1     0     0     0
1  B     1     0     0     0     1     0
2  C     0     1     1     0     0     0
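A more compact variant, sketched here as an alternative rather than as the answer's method: pd.crosstab counts each (id, nl) pair directly, so the separate dummies/concat step is not needed. It assumes, as the question's target table suggests, that the valid nl values are exactly 1 to 6.

```python
import pandas as pd

df = pd.DataFrame({'id': ['A', 'B', 'B', 'C', 'C'], 'nl': [3, 1, 5, 2, 3]})

# Count how often each (id, nl) pair occurs: one row per id, one column per nl value.
counts = pd.crosstab(df['id'], df['nl'])

# Assumption: valid nl values are 1..6; add the unseen ones (4 and 6 here) as 0 columns.
counts = counts.reindex(columns=range(1, 7), fill_value=0)

# Cap counts at 1 so a repeated value still yields a 0/1 indicator, then rename columns.
result = (counts.clip(upper=1)
                .add_prefix('nl_')
                .rename_axis(columns=None)
                .reset_index())
print(result)
```

The clip(upper=1) step plays the same role as max in the answer: it turns "appears at least once" into 1 even if an id has the same nl value on several rows.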

You're a machine. :-)

No, I thought about a solution like this for a few weeks. I tried a logical and, and then found that the best way is max.
If my answer was helpful, don't forget to accept it. Thanks.
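Why max works where a logical and does not: for 0/1 indicator columns, the per-group maximum is 1 whenever any row in the group is 1, which is a logical OR. The check below is a small sketch on the sample data, assuming a pandas version whose get_dummies accepts the dtype parameter (0.23 or later).

```python
import pandas as pd

df = pd.DataFrame({'id': ['A', 'B', 'B', 'C', 'C'], 'nl': [3, 1, 5, 2, 3]})

# One 0/1 column per observed nl value (nl_1, nl_2, nl_3, nl_5), one row per input row.
dummies = pd.get_dummies(df['nl'], prefix='nl', dtype=int)
groups = dummies.groupby(df['id'])

by_max = groups.max()                # the answer's aggregation
by_any = groups.any().astype(int)    # logical OR within each id
by_all = groups.all().astype(int)    # logical AND within each id

# max and OR agree on every cell.
print((by_max == by_any).to_numpy().all())

# AND loses information: B's two rows each have a 1 in a different column,
# so no column is 1 in both rows and the whole B row becomes 0.
print(by_all.loc['B'].tolist())
```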