Python: remove duplicate values across different columns
Tags: python, pandas
I have the following DataFrame:
>>>Feature name error1 error2 error3 error4
0 1 A overlaps overlaps overlaps overlaps
1 2 B No error
2 3 C overlaps invalid invalid
3 4 D invalid overlaps overlaps
I want each row to contain only unique errors, for example:
   Feature Name    error1    error2 error3 error4
0        1    A  overlaps
1        2    B  No error
2        3    C  overlaps   invalid
3        4    D   invalid  overlaps
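Is there a simple way to do this? I thought I could perhaps count how many times each value occurs in each row, but I'm not sure how to remove them.
The idea is to remove the duplicates from the error columns, use reindex to add back any columns that were dropped, and then assign the result back: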
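import numpy as np
import pandas as pd

# select the error columns by name
cols = df.filter(like='error').columns
# keep each row's unique values, then reindex to restore the original column count
df[cols] = (df[cols].apply(lambda x: pd.Series(x.unique()), axis=1)
                    .reindex(np.arange(len(cols)), axis=1))
print(df)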
Feature name error1 error2 error3 error4
0 1 A overlaps NaN NaN NaN
1 2 B No error NaN NaN
2 3 C overlaps invalid NaN NaN
3 4 D invalid overlaps NaN NaN
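For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The sample data is reconstructed from the question and is an assumption: in particular it assumes the blank cells are NaN and that the values are packed to the left. The helper name unique_errors is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data modelled on the question; blank cells are assumed to be NaN.
df = pd.DataFrame({
    "Feature": [1, 2, 3, 4],
    "name": ["A", "B", "C", "D"],
    "error1": ["overlaps", "No error", "overlaps", "invalid"],
    "error2": ["overlaps", np.nan, "invalid", "overlaps"],
    "error3": ["overlaps", np.nan, "invalid", "overlaps"],
    "error4": ["overlaps", np.nan, np.nan, np.nan],
})

cols = df.filter(like="error").columns


def unique_errors(row):
    # Drop missing values, keep the first occurrence of each error,
    # then pad with NaN so the row has len(cols) entries again.
    values = row.dropna().drop_duplicates().tolist()
    values += [np.nan] * (len(row) - len(values))
    return pd.Series(values, index=row.index)


# Only the error columns are rewritten; Feature and name are untouched.
df[cols] = df[cols].apply(unique_errors, axis=1)
print(df)
```

Because the helper returns a Series with the original column labels, no reindex step is needed here. A row-wise apply is fine for small frames but may be slow on very large ones.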
Try:
out = pd.DataFrame(list(map(pd.unique, df.loc[:,'error1':].values)),index=df.Feature)
Out[333]:
0 1 2
Feature
1 overlaps None None
2 No error None
3 overlaps invalid None
4 invalid overlaps None
Both of these produce the same table I started with. I've edited my question: there are more columns besides the errors and Feature.
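In case it helps with the extra columns, here is a rough sketch that only touches the columns whose names start with "error" and leaves everything else in place. The data is hypothetical: "comment" stands in for the additional columns, and the blank cells are assumed to be empty strings rather than NaN, which is only a guess since the real data isn't shown. If the blanks were empty strings, they would count as a value of their own, so they are converted to NaN first.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "comment" stands in for the extra columns mentioned above,
# and blank cells are assumed to be empty strings.
df = pd.DataFrame({
    "Feature": [1, 2, 3, 4],
    "name": ["A", "B", "C", "D"],
    "error1": ["overlaps", "No error", "overlaps", "invalid"],
    "error2": ["overlaps", "", "invalid", "overlaps"],
    "error3": ["overlaps", "", "invalid", "overlaps"],
    "error4": ["overlaps", "", "", ""],
    "comment": ["w", "x", "y", "z"],
})

# Select only the error columns by name so the other columns are left alone.
cols = [c for c in df.columns if c.startswith("error")]

# Treat empty or whitespace-only strings as missing before deduplicating.
errors = df[cols].replace(r"^\s*$", np.nan, regex=True)

# One list of unique, non-missing errors per row (first occurrence kept).
uniques = [pd.unique(row[pd.notna(row)]).tolist() for row in errors.to_numpy()]

# Rebuild a frame with the original error column names; short rows are padded.
deduped = pd.DataFrame(uniques, index=df.index).reindex(columns=range(len(cols)))
deduped.columns = cols

df[cols] = deduped
print(df)
```

Selecting the columns by name avoids the 'error1': slice, which would also pick up any columns that come after the error columns.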