Python: unique string combinations across two columns
I want to find the unique combinations of the person1 and person2 columns, even when the same values appear in reversed order in the DataFrame. Below is a sample of the initial DataFrame in which I want to find the unique people:
import numpy as np
import pandas as pd

df = pd.DataFrame({"person1": ["AL", "IN", "AN", "DL", "IN", "AL", "AL", "IN", "AN"],
                   "person2": ["AL", "AN", np.nan, "AL", "AN", "AL", "DL", "IN", "IN"]})
person1 person2
0 AL AL
1 IN AN
2 AN NaN
3 DL AL
4 IN AN
5 AL AL
6 AL DL
7 IN IN
8 AN IN
My desired output looks like this:
person1 person2 person
0 AL AL AL
1 IN AN IN/AN
2 AN NaN AN
3 DL AL DL/AL
4 IN AN IN/AN
5 AL AL AL
6 AL DL DL/AL # first seen as DL/AL (index 3), so not AL/DL
7 IN IN IN
8 AN IN IN/AN # first seen as IN/AN (index 1), so not AN/IN
I used the following code:
df['person'] = np.where(df.person1 != df.person2,
df.person1 + "/" + df.person2, df.person1)
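But in my example above, it returns AL/DL and AN/IN at index 6 and index 8. As usual, I cannot see the proper way to get DL/AL and IN/AN instead. Pandas masters, please show me the way :)

If it is acceptable, sort the two columns first: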
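# Sort each row's two values alphabetically; NaN becomes '' so it sorts first
df1 = pd.DataFrame(np.sort(df[['person1', 'person2']].fillna('')),
                   index=df.index,
                   columns=['person1', 'person2'])
# Join the sorted pair with "/" unless both values are the same;
# str.strip('/') removes the leading slash left by an empty (NaN) value
df['person'] = np.where(df1.person1 != df1.person2,
                        df1.person1.str.cat(df1.person2, sep="/").str.strip('/'),
                        df1.person1)
print(df)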
person1 person2 person
0 AL AL AL
1 IN AN AN/IN
2 AN NaN AN
3 DL AL AL/DL
4 IN AN AN/IN
5 AL AL AL
6 AL DL AL/DL
7 IN IN IN
8 AN IN AN/IN
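Note that sorting gives AL/DL and AN/IN, while the question asks for the orientation in which each pair first appeared (DL/AL and IN/AN). A minimal sketch of one way to keep the first-seen orientation follows; the names first_seen and labels are hypothetical, and the plain Python loop is an assumption chosen for readability rather than speed:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"person1": ["AL", "IN", "AN", "DL", "IN", "AL", "AL", "IN", "AN"],
                   "person2": ["AL", "AN", np.nan, "AL", "AN", "AL", "DL", "IN", "IN"]})

first_seen = {}  # unordered pair -> label used at its first occurrence
labels = []
for p1, p2 in zip(df["person1"], df["person2"]):
    # Ignore missing values so ("AN", NaN) is treated as just "AN"
    people = [p for p in (p1, p2) if pd.notna(p)]
    key = frozenset(people)  # order-insensitive key: {AL, DL} == {DL, AL}
    if key not in first_seen:
        # dict.fromkeys drops duplicates while keeping the original order
        first_seen[key] = "/".join(dict.fromkeys(people))
    labels.append(first_seen[key])

df["person"] = labels
print(df)
```

With the sample data, rows 6 and 8 get DL/AL and IN/AN, because those pairs were first recorded at index 3 and index 1.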
You can use the apply() method:
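As a minimal sketch of what such an apply() call could look like (the helper name combine is hypothetical, and this is not necessarily the answerer's exact code), each row can be reduced to its sorted, de-duplicated, non-missing values:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"person1": ["AL", "IN", "AN", "DL", "IN", "AL", "AL", "IN", "AN"],
                   "person2": ["AL", "AN", np.nan, "AL", "AN", "AL", "DL", "IN", "IN"]})

def combine(row):
    # Drop NaN, remove duplicates, sort alphabetically, then join with "/"
    return "/".join(sorted(set(row.dropna())))

# axis=1 passes each row to combine() as a Series
df["person"] = df[["person1", "person2"]].apply(combine, axis=1)
print(df)
```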
Output:
person1 person2 person
0 AL AL AL
1 IN AN AN/IN
2 AN NaN AN
3 DL AL AL/DL
4 IN AN AN/IN
5 AL AL AL
6 AL DL AL/DL
7 IN IN IN
8 AN IN AN/IN
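What should happen if the same two people occur more than twice? What should the column contain then? Second, do you really need this as a DataFrame column? If not, generating the combinations is easy.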
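If only the set of distinct combinations is needed, and not a new column, a sketch along these lines might be enough. The names seen and unique_pairs are hypothetical, and treating a NaN partner as "no second person" is an assumption:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"person1": ["AL", "IN", "AN", "DL", "IN", "AL", "AL", "IN", "AN"],
                   "person2": ["AL", "AN", np.nan, "AL", "AN", "AL", "DL", "IN", "IN"]})

seen = set()
unique_pairs = []
for p1, p2 in zip(df["person1"], df["person2"]):
    # frozenset ignores order and duplicates; NaN values are skipped
    key = frozenset(p for p in (p1, p2) if pd.notna(p))
    if key not in seen:
        seen.add(key)
        unique_pairs.append(sorted(key))

print(unique_pairs)
# [['AL'], ['AN', 'IN'], ['AN'], ['AL', 'DL'], ['IN']]
```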