Python: I want to split a Pandas DataFrame into 2 DataFrames based on a condition
I have a basic DataFrame with 4 columns:
   column_A  column_B column_C   id
0         1         1     anna  123
1         2         1     anna    7
2        30         2      bob   42
2        20         2      bob   12
3        10         3  charlie    1
4       100         3    david    2
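For anyone who wants to reproduce the problem, the frame above could be rebuilt roughly as follows. This is a minimal sketch: the default integer index (0 to 5) is an assumption, since the listing in the question repeats the index label 2.

```python
import pandas as pd

# Rebuild the example frame from the question.
# Assumption: a default RangeIndex (0-5) is used instead of the repeated
# index label 2 that appears in the listing above.
df = pd.DataFrame(
    {
        "column_A": [1, 2, 30, 20, 10, 100],
        "column_B": [1, 1, 2, 2, 3, 3],
        "column_C": ["anna", "anna", "bob", "bob", "charlie", "david"],
        "id": [123, 7, 42, 12, 1, 2],
    }
)

print(df)
```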
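I want to split it into 2 different DataFrames with the following properties.
DataFrame 1:
   column_A  column_B column_C   id
0         1         1     anna  123
1         2         1     anna    7
2        30         2      bob   42
2        20         2      bob   12
Here, the values in both column_B and column_C match those of another row.
DataFrame 2:
   column_A  column_B column_C   id
3        10         3  charlie    1
4       100         3    david    2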
If rows only need to match on the values in those columns, you can check for duplicates:
In [200]: dfs = {i: n for i, n in df.groupby(
     ...:     df.duplicated(subset=['column_B', 'column_C'], keep=False))}

In [201]: dfs[True]
Out[201]:
   column_A  column_B column_C   id
0         1         1     anna  123
1         2         1     anna    7
2        30         2      bob   42
2        20         2      bob   12

In [202]: dfs[False]
Out[202]:
   column_A  column_B column_C   id
3        10         3  charlie    1
4       100         3    david    2
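The session above can be written as a standalone script. This is a sketch under a few assumptions: the frame is rebuilt with a default 0 to 5 index, and the names `is_dup`, `matched`, and `unmatched` are illustrative, not from the original answer.

```python
import pandas as pd

# Example frame from the question (default 0-5 index is an assumption).
df = pd.DataFrame(
    {
        "column_A": [1, 2, 30, 20, 10, 100],
        "column_B": [1, 1, 2, 2, 3, 3],
        "column_C": ["anna", "anna", "bob", "bob", "charlie", "david"],
        "id": [123, 7, 42, 12, 1, 2],
    }
)

# True for every row whose (column_B, column_C) pair occurs more than once.
# keep=False marks all members of a duplicate group, not just the later ones.
is_dup = df.duplicated(subset=["column_B", "column_C"], keep=False)

# Grouping by a boolean Series yields at most two groups, keyed False and True.
dfs = {key: group for key, group in df.groupby(is_dup)}

# .get() with an empty slice avoids a KeyError when one group has no rows.
matched = dfs.get(True, df.iloc[0:0])
unmatched = dfs.get(False, df.iloc[0:0])

print(matched)
print(unmatched)
```

The `.get()` fallback matters because `dfs[True]` or `dfs[False]` raises `KeyError` when every row (or no row) is a duplicate.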
To keep the non-duplicated records (rows 5 and 6), use the drop_duplicates function:
dfA = df.drop_duplicates(subset=['column_B', 'column_C'], keep=False)
Output:
   column_A  column_B column_C  column_D
4        10         3  charlie         1
5       100         3    davis         2
To keep the duplicated records (rows 1 through 4), use the duplicated function:
dfB = df[df.duplicated(subset=['column_B', 'column_C'], keep=False)]
Output:
   column_A  column_B column_C  column_D
0         1         1     anna       123
1         2         1     anna         7
2        30         2      bob        42
3        20         2      bob        12
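Since both halves come from the same duplicate test, one option is to compute the boolean mask once and negate it for the other half. The sketch below uses the question's column names and a default 0 to 5 index (both assumptions about the answerer's frame), and `mask` is an illustrative name.

```python
import pandas as pd

# Example frame using the question's column names (default index assumed).
df = pd.DataFrame(
    {
        "column_A": [1, 2, 30, 20, 10, 100],
        "column_B": [1, 1, 2, 2, 3, 3],
        "column_C": ["anna", "anna", "bob", "bob", "charlie", "david"],
        "id": [123, 7, 42, 12, 1, 2],
    }
)

# Compute the duplicate mask once and reuse it for both halves.
mask = df.duplicated(subset=["column_B", "column_C"], keep=False)

dfB = df[mask]   # rows whose (column_B, column_C) pair is repeated
dfA = df[~mask]  # rows whose pair is unique

# Selecting with ~mask gives the same rows as drop_duplicates(keep=False).
same = dfA.equals(df.drop_duplicates(subset=["column_B", "column_C"], keep=False))

print(dfA)
print(dfB)
print(same)
```

Reusing one mask guarantees the two results are complementary: every row of `df` ends up in exactly one of `dfA` and `dfB`.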