Python: I want to split a Pandas DataFrame into 2 DataFrames based on a condition

I have a basic DataFrame with 4 columns:

 column_A column_B  column_C   id  
0       1       1      anna    123
1       2       1      anna      7
2      30       2      bob      42
2      20       2      bob      12
3      10       3      charlie   1
4     100       3      david     2
I want to split it into 2 different DataFrames with the following properties.

DataFrame 1:

 column_A column_B  column_C   id  
0       1       1      anna    123
1       2       1      anna      7
2      30       2      bob      42
2      20       2      bob      12
where the values in both column_B and column_C match those of another row.

DataFrame 2:

  column_A column_B  column_C   id
3      10       3      charlie   1
4     100       3      david     2

If only the values in those columns need to match, you can check for duplicates:

In [200]: dfs = {i: n for i, n in df.groupby(
                    df.duplicated(subset=['column_B', 'column_C'], keep=False))}

In [201]: dfs[True]
Out[201]:
   column_A  column_B column_C   id
0         1         1     anna  123
1         2         1     anna    7
2        30         2      bob   42
2        20         2      bob   12

In [202]: dfs[False]
Out[202]:
   column_A  column_B column_C  id
3        10         3  charlie   1
4       100         3    david   2
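As a self-contained variant of the transcript above, here is a minimal sketch that rebuilds the sample data from the question and wraps the split in a function. The plain 0..5 index and the helper name `split_by_duplicates` are assumptions made for illustration; the helper is not part of pandas. It falls back to an empty frame when one of the two groups is absent, since a plain `dfs[True]` lookup would raise a `KeyError` if no rows were duplicated.

```python
import pandas as pd

# Sample data from the question; a default 0..5 index is assumed here.
df = pd.DataFrame({
    "column_A": [1, 2, 30, 20, 10, 100],
    "column_B": [1, 1, 2, 2, 3, 3],
    "column_C": ["anna", "anna", "bob", "bob", "charlie", "david"],
    "id": [123, 7, 42, 12, 1, 2],
})


def split_by_duplicates(frame, subset):
    """Hypothetical helper: return (duplicated_rows, unique_rows)."""
    # keep=False flags every member of a duplicated group, not only the repeats.
    mask = frame.duplicated(subset=subset, keep=False)
    groups = {key: part for key, part in frame.groupby(mask)}
    empty = frame.iloc[0:0]  # same columns, zero rows
    # .get avoids a KeyError when one of the two groups does not exist.
    return groups.get(True, empty), groups.get(False, empty)


df_dup, df_unique = split_by_duplicates(df, ["column_B", "column_C"])
print(df_dup)
print(df_unique)
```

The dictionary comprehension is kept from the answer; the only change is the guarded lookup.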

To keep the non-duplicated records (the 5th and 6th rows, index 4 and 5), use the drop_duplicates function:

dfA = df.drop_duplicates(subset = ['column_B', 'column_C'], keep = False)
Output:

   column_A  column_B column_C  column_D
4        10         3  charlie         1
5       100         3    davis         2

To keep the duplicated records (the 1st through 4th rows, index 0 to 3), use the duplicated function:

dfB = df[df.duplicated(subset = ['column_B', 'column_C'], keep = False)]
Output:

   column_A  column_B column_C  column_D
0         1         1     anna       123
1         2         1     anna         7
2        30         2      bob        42
3        20         2      bob        12
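The two calls above can also be expressed with a single boolean mask and its negation, which makes it explicit that the two results partition the original frame. This is only a sketch: the data is reconstructed from the output shown in this answer (including the `column_D` name), and the variable names `key_cols` and `is_dup` are illustrative.

```python
import pandas as pd

# Data reconstructed from the output shown in this answer (assumed, not given as code).
df = pd.DataFrame({
    "column_A": [1, 2, 30, 20, 10, 100],
    "column_B": [1, 1, 2, 2, 3, 3],
    "column_C": ["anna", "anna", "bob", "bob", "charlie", "davis"],
    "column_D": [123, 7, 42, 12, 1, 2],
})

key_cols = ["column_B", "column_C"]

# True for every row whose (column_B, column_C) pair appears more than once.
is_dup = df.duplicated(subset=key_cols, keep=False)

dfB = df[is_dup]   # duplicated records
dfA = df[~is_dup]  # non-duplicated records

# Every row lands in exactly one of the two frames.
assert len(dfA) + len(dfB) == len(df)

print(dfA)
print(dfB)
```

Computing the mask once avoids scanning the key columns twice, and `df[~is_dup]` selects the same rows as `drop_duplicates(..., keep=False)`.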