Python: unique strings from two columns


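I want to find the unique combinations of the person1 and person2 columns, even when the same two values appear in reversed order in the DataFrame. Below is a sample of the initial DataFrame in which I want to find the unique persons: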

import numpy as np
import pandas as pd
df = pd.DataFrame({"person1":["AL","IN","AN","DL","IN","AL","AL","IN","AN"],
                   "person2":["AL","AN", np.nan,"AL","AN","AL","DL","IN","IN"]})

  person1  person2
0     AL      AL
1     IN      AN
2     AN      NaN
3     DL      AL
4     IN      AN
5     AL      AL
6     AL      DL
7     IN      IN
8     AN      IN

My desired output looks like this:

  person1  person2  person
0     AL      AL     AL
1     IN      AN    IN/AN
2     AN      NaN    AN
3     DL      AL    DL/AL
4     IN      AN    IN/AN
5     AL      AL     AL
6     AL      DL    DL/AL  # Since it has been added as DL/AL NOT AL/DL
7     IN      IN     IN
8     AN      IN    IN/AN  # Since it has been added as IN/AN NOT AN/IN

I used the following code:

df['person'] = np.where(df.person1 != df.person2,
                        df.person1 + "/" + df.person2, df.person1)

But in my example above, it returns AL/DL and AN/IN at index 6 and index 8. As usual, I can't see a suitable way to get DL/AL and IN/AN there instead.


Pandas gurus, please show me the way :)

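If sorting is acceptable, sort the two columns row-wise, so that each pair always gets the same label in alphabetical order: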

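# Sort each row's two values alphabetically; NaN becomes '' so it sorts first
df1 = pd.DataFrame(np.sort(df[['person1', 'person2']].fillna(''), axis=1),
                   index=df.index,
                   columns=['person1', 'person2'])
# Join the sorted pair with "/" unless both values are equal;
# strip('/') removes the leading slash left by an empty (NaN) value
df['person'] = np.where(df1.person1 != df1.person2,
                        df1.person1.str.cat(df1.person2, sep="/").str.strip('/'),
                        df1.person1)
print(df)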
  person1 person2 person
0      AL      AL     AL
1      IN      AN  AN/IN
2      AN     NaN     AN
3      DL      AL  AL/DL
4      IN      AN  AN/IN
5      AL      AL     AL
6      AL      DL  AL/DL
7      IN      IN     IN
8      AN      IN  AN/IN
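
Note that sorting yields alphabetical labels (AL/DL, AN/IN), while the question asks for the order in which each pair first appears (DL/AL, IN/AN). Here is a minimal sketch that keeps the first-seen order, assuming a plain Python loop is fast enough for your data. The names first_seen and labels are illustrative only and do not come from the answer above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"person1": ["AL", "IN", "AN", "DL", "IN", "AL", "AL", "IN", "AN"],
                   "person2": ["AL", "AN", np.nan, "AL", "AN", "AL", "DL", "IN", "IN"]})

first_seen = {}  # unordered pair -> label in the order it first appeared
labels = []
for p1, p2 in zip(df["person1"], df["person2"]):
    # Ignore missing values so ("AN", NaN) is treated as just "AN"
    names = [v for v in (p1, p2) if pd.notna(v)]
    key = frozenset(names)  # the same key for (AL, DL) and (DL, AL)
    if key not in first_seen:
        # dict.fromkeys drops duplicates ("AL", "AL") but keeps order
        first_seen[key] = "/".join(dict.fromkeys(names))
    labels.append(first_seen[key])

df["person"] = labels
print(df)
```

With the sample data, rows 6 and 8 should come out as DL/AL and IN/AN, because those pairs were first seen in that order at rows 3 and 1.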

You can use the apply() method.
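
One way this could look, as a sketch only: the helper name combine is hypothetical, and it assumes that missing values should be dropped and each pair sorted alphabetically.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"person1": ["AL", "IN", "AN", "DL", "IN", "AL", "AL", "IN", "AN"],
                   "person2": ["AL", "AN", np.nan, "AL", "AN", "AL", "DL", "IN", "IN"]})

def combine(row):
    # Drop NaN, collapse identical names, then sort so that
    # (IN, AN) and (AN, IN) produce the same label
    names = sorted(set(row.dropna()))
    return "/".join(names)

# axis=1 passes each row to combine as a Series
df["person"] = df[["person1", "person2"]].apply(combine, axis=1)
print(df)
```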

Output:

  person1 person2 person
0      AL      AL     AL
1      IN      AN  AN/IN
2      AN     NaN     AN
3      DL      AL  AL/DL
4      IN      AN  AN/IN
5      AL      AL     AL
6      AL      DL  AL/DL
7      IN      IN     IN
8      AN      IN  AN/IN

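What should happen if the same two people occur more than twice? What should this column contain then? Secondly, do you really need it as a DataFrame column? Otherwise, generating the combinations is easy.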
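
If only the set of distinct combinations is needed rather than a new column, something like the following could work. This is a sketch: pairs and unique_labels are hypothetical names, and frozenset is used so that the order within a pair is ignored.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"person1": ["AL", "IN", "AN", "DL", "IN", "AL", "AL", "IN", "AN"],
                   "person2": ["AL", "AN", np.nan, "AL", "AN", "AL", "DL", "IN", "IN"]})

# Each row becomes an unordered, NaN-free set of names;
# the outer set keeps one copy of every distinct combination
pairs = {
    frozenset(v for v in row if pd.notna(v))
    for row in df[["person1", "person2"]].itertuples(index=False)
}

unique_labels = sorted("/".join(sorted(p)) for p in pairs)
print(unique_labels)  # ['AL', 'AL/DL', 'AN', 'AN/IN', 'IN']
```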