Python pandas: splitting a column with repeating groups into multiple columns

I have the following DataFrame:

set_id
A,B
A,C,E
A
Expected result:

set_id  set_id_1 set_id_2 set_id_3
A,B      A          B       null
A,C,E    A          C        E
A        A          null     null 
set_id can hold n values. If the largest number of values in any set_id is 100, I should end up with 100 new columns.

I tried using MultiLabelBinarizer:

from sklearn.preprocessing import MultiLabelBinarizer

df1 = pd.DataFrame()
df1['set_id'] = df['set_id'].str.split(',')
mlb = MultiLabelBinarizer()
df1 = df.join(pd.DataFrame(mlb.fit_transform(df['set_id']),
                           columns=mlb.classes_,
                           index=df.head(100).index))
This creates more than 100K columns, because I have more than 100K unique records.
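To see why the column count explodes, here is a rough sketch of what a one-hot encoding produces on the sample data. It uses pandas' `str.get_dummies` as a stand-in for `MultiLabelBinarizer` (an assumption made to keep the example pandas-only; both give one indicator column per distinct label). The names `one_hot`, `n_unique` and `max_len` are illustrative only.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"set_id": ["A,B", "A,C,E", "A"]})

# One indicator column per distinct label, regardless of its position in the row
one_hot = df["set_id"].str.get_dummies(sep=",")
print(one_hot)

# The one-hot width follows the number of distinct labels ...
n_unique = len({v for row in df["set_id"] for v in row.split(",")})

# ... while the desired layout only needs as many columns as the longest row
max_len = df["set_id"].str.count(",").max() + 1

print(n_unique, max_len)
```

With 100K distinct labels, the first number grows to 100K while the second stays at the longest row length, which is the layout the question asks for.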

Use str.split with expand=True to get a DataFrame:

df1 = df['set_id'].str.split(',', expand=True)
A faster alternative using a list comprehension:

df1 = pd.DataFrame([x.split(',') for x in df['set_id']])
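As a quick sanity check, the sketch below runs both variants on the sample data from the question and compares the cell values. The variable names `by_split`, `by_listcomp` and `same` are illustrative; passing `index=df.index` is an addition so that a later `df.join` would still align if `df` does not have a default index.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"set_id": ["A,B", "A,C,E", "A"]})

# Variant 1: vectorised string split, one column per position
by_split = df["set_id"].str.split(",", expand=True)

# Variant 2: plain Python split inside a list comprehension
by_listcomp = pd.DataFrame([x.split(",") for x in df["set_id"]], index=df.index)

# Compare cell values only, with missing entries normalised to ""
same = (by_split.fillna("").to_numpy().tolist()
        == by_listcomp.fillna("").to_numpy().tolist())
print(same)
```

One caveat: the list comprehension calls `x.split` on every value, so it fails if the column contains missing values (a float NaN has no `split` method), whereas `str.split` passes them through.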


df1.columns = [f'set_id_{x+1}' for x in df1.columns]
df1 = df.join(df1)

print(df1)
  set_id set_id_1 set_id_2 set_id_3
0    A,B        A        B     None
1  A,C,E        A        C        E
2      A        A     None     None

Thanks for the quick answer. Is it also possible to fill None with -1? I get ValueError: fill value must be in categories when I try df1.fillna(-1) and df1.fillna('-1'); both throw the same error. However, this worked: for col in list(df1.columns): df1[col] = df1[col].replace(np.nan, -1). Thanks for your answer.
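On the fillna error mentioned in the comment: the message suggests the columns have a categorical dtype, although the thread does not show the dtypes, so that is an assumption. If so, the fill value has to be registered as a category first. A minimal sketch, using a small hypothetical frame and the string '-1' as the placeholder:

```python
import pandas as pd

# Hypothetical split result, cast to categorical to reproduce the situation
df1 = pd.DataFrame({
    "set_id_1": ["A", "A", "A"],
    "set_id_2": ["B", "C", None],
}).astype("category")

for col in df1.columns:
    if isinstance(df1[col].dtype, pd.CategoricalDtype):
        # Register the placeholder as a valid category before filling
        if "-1" not in df1[col].cat.categories:
            df1[col] = df1[col].cat.add_categories("-1")
        df1[col] = df1[col].fillna("-1")

print(df1)
```

If the categorical dtype is not needed, converting the columns with `astype(object)` before calling `fillna` is another option.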