Python: drop a row from a DataFrame if its column values are found in another DataFrame

python pandas dataframe

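df1 = {
    'vouchers': [100, 200, 300, 400],
    'units': [11, 12, 12, 13],
    'some_other_data': ['a', 'b', 'c', 'd'],
    }
df2 = {
    'vouchers': [500, 200, 600, 300],
    'units': [11, 12, 12, 13],
    'some_other_data': ['b', 'd', 'c', 'a'],
    }

Given the two dataframes above, I want to do the following: if a voucher from df1 can be found in df2, and their corresponding units are the same, then delete the entire voucher row from df1.

So, in this case, the desired output would be:

df1 = {
    'vouchers': [100, 300, 400],
    'units': [11, 12, 13],
    'some_other_data': ['a', 'c', 'd'],
    }

What is the best way to achieve this?

Use drop once we have the index labels that need to be removed:
# Left-merge on the key columns and flag the rows that also exist in df2.
# Note: merge returns a fresh 0..n-1 index, so these labels match df1 only
# when df1 has a default RangeIndex and df2 has no duplicate key pairs.
idx = (df1.merge(df2, on=['vouchers', 'units'], indicator=True, how='left')
          .loc[lambda x: x['_merge'] == 'both'].index)
df1 = df1.drop(idx, axis=0)
df1
Out[374]: 
   vouchers  units some_other_data
0       100     11               a
2       300     12               c
3       400     13               d

My solution, using pd.Index.isin, which lets you do this efficiently with index operations: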
u = df1.set_index(['vouchers', 'units'])
df1[~u.index.isin(pd.MultiIndex.from_arrays([df2.vouchers, df2.units]))]

   vouchers  units some_other_data
0       100     11               a
2       300     12               c
3       400     13               d
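
For anyone who wants to run the index-based approach end to end, here is a minimal sketch. It assumes the dicts from the question are wrapped in `pd.DataFrame`, and `drop_matching_rows` is a hypothetical helper name introduced only for this illustration, not a pandas function.

```python
import pandas as pd

# Data from the question, wrapped in DataFrames (an assumption: the question shows plain dicts).
df1 = pd.DataFrame({
    'vouchers': [100, 200, 300, 400],
    'units': [11, 12, 12, 13],
    'some_other_data': ['a', 'b', 'c', 'd'],
})
df2 = pd.DataFrame({
    'vouchers': [500, 200, 600, 300],
    'units': [11, 12, 12, 13],
    'some_other_data': ['b', 'd', 'c', 'a'],
})


def drop_matching_rows(left, right, keys):
    """Return the rows of `left` whose `keys` values do not occur together in `right`.

    Hypothetical helper for illustration; not part of pandas.
    """
    left_keys = pd.MultiIndex.from_frame(left[keys])
    right_keys = pd.MultiIndex.from_frame(right[keys])
    # isin returns a boolean array aligned with the rows of `left`.
    return left[~left_keys.isin(right_keys)]


result = drop_matching_rows(df1, df2, ['vouchers', 'units'])
print(result)
```

Because the mask is built from the key columns only, the differing `some_other_data` values do not affect the match, and the original index labels of `df1` are kept.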

One possibility is via pd.concat and duplicated. Try this, it's simple:
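df = pd.concat([df1, df2], ignore_index=True)
# Drop every (vouchers, units) pair that occurs more than once in the combined frame.
df = df.loc[~df.duplicated(subset=['vouchers', 'units'], keep=False)]
# Keep only the labels that came from df1; Index.intersection replaces the older
# `df.index & df1.index`, which newer pandas versions no longer treat as a set operation.
df = df.reindex(df.index.intersection(df1.index))

print(df)

#    vouchers  units some_other_data
# 0       100     11               a
# 2       300     12               c
# 3       400     13               d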
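
One caveat worth checking before relying on the concat approach: `duplicated(..., keep=False)` also flags key pairs that repeat inside df1 itself, even when df2 does not contain them. The sketch below uses made-up data (not from the question) to compare it with a key-membership test; treat it as an illustration under those assumptions.

```python
import pandas as pd

keys = ['vouchers', 'units']

# Hypothetical data: (100, 11) appears twice in df1 and never in df2.
df1 = pd.DataFrame({'vouchers': [100, 100, 200], 'units': [11, 11, 12]})
df2 = pd.DataFrame({'vouchers': [200, 500], 'units': [12, 11]})

# concat + duplicated: any repeated key pair is removed, wherever it came from.
combined = pd.concat([df1, df2], ignore_index=True)
unique_only = combined.loc[~combined.duplicated(subset=keys, keep=False)]
via_concat = unique_only.reindex(unique_only.index.intersection(df1.index))

# Membership test: only key pairs that actually occur in df2 are removed.
in_df2 = pd.MultiIndex.from_frame(df1[keys]).isin(pd.MultiIndex.from_frame(df2[keys]))
via_isin = df1[~in_df2]

print(len(via_concat))
print(len(via_isin))
```

If df1 can contain repeated key pairs, the membership test is probably the safer choice.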

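Although there are already plenty of good answers here, the question looked interesting, so I took it up as a learning exercise and would like to add another version that looks simpler, using a boolean expression: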
First DataFrame:
>>> df1
   vouchers  units some_other_data
0       100     11               a
1       200     12               b
2       300     12               c
3       400     13               d

Second DataFrame:
>>> df2
   vouchers  units some_other_data
0       500     11               a
1       200     12               b
2       600     12               c
3       300     13               d

Possibly simpler answer:
>>> df1[(df1 != df2).any(axis=1)]
   vouchers  units some_other_data
0       100     11               a
2       300     12               c
3       400     13               d

Solution 2: using merge + indicator + query
>>> df1.merge(df2, how='outer', indicator=True).query('_merge == "left_only"').drop(columns='_merge')
   vouchers  units some_other_data
0       100     11               a
2       300     12               c
3       400     13               d

Solution 3:
excs = []  # positions of df1 rows whose (voucher, unit) pair also appears in df2

for i, (key, value) in enumerate(zip(df1["vouchers"], df1["units"])):
    for key2, value2 in zip(df2["vouchers"], df2["units"]):
        if key == key2 and value == value2:
            excs.append(i)
            break  # one match is enough

# Assumes df1 and df2 are the plain dicts of lists shown in the question.
# Delete from the end so earlier positions stay valid, and from every column.
for exc in reversed(excs):
    for column in df1.values():
        del column[exc]

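I really like the last solution for its readability, but it only works if some_other_data is the same in both dataframes. Is it possible to adapt it so that it works even when the data differs? (I will also adjust my question.)

@barciewicz, thanks for liking it, but the first answer is more reliable for the data provided; feel free to upvote it if you like :-)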
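
Regarding the question in the comment above, one way to adapt the merge + indicator + query idea is to merge on the key columns only, so that differing `some_other_data` values no longer prevent a match. This is a sketch under the assumption that the question's dicts are wrapped in `pd.DataFrame`; the variable names are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({
    'vouchers': [100, 200, 300, 400],
    'units': [11, 12, 12, 13],
    'some_other_data': ['a', 'b', 'c', 'd'],
})
df2 = pd.DataFrame({
    'vouchers': [500, 200, 600, 300],
    'units': [11, 12, 12, 13],
    'some_other_data': ['b', 'd', 'c', 'a'],
})
keys = ['vouchers', 'units']

# Merging on every shared column finds no match for this data, because
# some_other_data differs for the (200, 12) pair.
all_columns = df1.merge(df2, how='left', indicator=True)
matches_on_all_columns = int((all_columns['_merge'] == 'both').sum())

# Restricting df2 to the key columns makes the match depend on the keys only.
# drop_duplicates() keeps the left merge from multiplying rows of df1.
flagged = df1.merge(df2[keys].drop_duplicates(), on=keys, how='left', indicator=True)
result = flagged.query('_merge == "left_only"').drop(columns='_merge')

print(matches_on_all_columns)
print(result)
```

Note that the result carries the fresh index produced by the merge, which coincides with df1's labels here only because df1 uses the default 0..n-1 index.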