Python: compare columns of two DataFrames and drop rows


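I have two DataFrames of different lengths. I want to compare them and drop the rows of df1 that have no counterpart in df2.

Here is an example: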

import pandas as pd

df1 = pd.DataFrame({'Filename': ['image1', 'image1', 'image2', 'image3'],
                    'Name': ['Dog', 'Cat', 'Cat', 'Cat'],
                    'values': ['2', '3', '4', '5']})

df2 = pd.DataFrame({'Filename': ['image1', 'image2', 'image3'],
                    'Name': ['Dog', 'Cat', 'Cat'],
                    'values': ['5', '6', '7']})
df1

df2

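I would like to end up with two DataFrames, df1 and df2, that have the same length and the same Filename entries, as shown below. My goal is to compare the values column of df1 and df2 for rows that share the same Filename and Name.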

df1

df2

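I tried comparing each row with the corresponding row of the other DataFrame and dropping it if there was no match. This is obviously not the way to do it: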

for i, j in df1.iterrows():
    for m, n in df1.iterrows():
        if m['Filename'] == i['Filename']:
            if m['LabelName'] == i['LabelName']:
                pass
            else:
                print('delete')
                df2=df2.drop(i)
                df1=df1.sort_values('Filename')
                df2=df2.sort_values('Filename')
                
            break
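
I also tried using groupby and comparing rows, but ran into "ValueError: Can only compare identically-labeled Series objects", because the indexes differ.

Can anyone help me? I looked for similar questions but did not find any.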

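Hey, I think this does the job:

# Index df2 by Filename so the Name can be looked up per file
df3 = df2.set_index('Filename')
# Keep only the df1 rows whose Name matches the Name df2 has for that Filename
df1[df1.apply(lambda x: df3.loc[x.Filename]['Name'] == x.Name, axis=1)]

Note that this relies on every Filename of df1 being present in df2, and on each Filename appearing only once in df2. If you want to discard the old index and renumber the rows, you can add reset_index:

df3 = df2.set_index('Filename')
df1[df1.apply(lambda x: df3.loc[x.Filename]['Name'] == x.Name, axis=1)].reset_index(drop=True)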
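
The row-wise lookup above can probably also be written without apply. The following is only a sketch under the same assumption (each Filename occurs once in df2); the names name_by_file, expected_name and df1_kept are made up for illustration. A Filename that is missing from df2 maps to NaN and the row is dropped instead of raising a KeyError.

```python
import pandas as pd

df1 = pd.DataFrame({'Filename': ['image1', 'image1', 'image2', 'image3'],
                    'Name': ['Dog', 'Cat', 'Cat', 'Cat'],
                    'values': ['2', '3', '4', '5']})
df2 = pd.DataFrame({'Filename': ['image1', 'image2', 'image3'],
                    'Name': ['Dog', 'Cat', 'Cat'],
                    'values': ['5', '6', '7']})

# Lookup Series: Filename -> Name as recorded in df2.
# Assumption: every Filename is unique in df2.
name_by_file = df2.set_index('Filename')['Name']

# For each df1 row, the Name df2 holds for that Filename (NaN if absent).
expected_name = df1['Filename'].map(name_by_file)

# Both Series carry df1's index, so the element-wise comparison is aligned.
df1_kept = df1[expected_name == df1['Name']].reset_index(drop=True)
print(df1_kept)
```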

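This is not a very Pythonic solution, but it does the job:

# Collect the (Filename, Name) pairs of each DataFrame
l1 = [(df1.Filename.iloc[i], df1.Name.iloc[i]) for i in range(len(df1))]
l2 = [(df2.Filename.iloc[i], df2.Name.iloc[i]) for i in range(len(df2))]
# Pairs that occur in both DataFrames
lfin = [i for i in l1 if i in l2]

# Drop every df1 row whose pair is not shared
for i in df1.index:
    if (df1.Filename.loc[i], df1.Name.loc[i]) not in lfin:
        df1.drop(i, inplace=True)

# Drop every df2 row whose pair is not shared
for i in df2.index:
    if (df2.Filename.loc[i], df2.Name.loc[i]) not in lfin:
        df2.drop(i, inplace=True)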
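
The same pair-matching idea can be expressed without explicit loops. The sketch below is one possible vectorized variant, not necessarily what the answer's author had in mind; it builds a MultiIndex of (Filename, Name) pairs per DataFrame and filters with isin. The names keys1, keys2, df1_filtered and df2_filtered are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame({'Filename': ['image1', 'image1', 'image2', 'image3'],
                    'Name': ['Dog', 'Cat', 'Cat', 'Cat'],
                    'values': ['2', '3', '4', '5']})
df2 = pd.DataFrame({'Filename': ['image1', 'image2', 'image3'],
                    'Name': ['Dog', 'Cat', 'Cat'],
                    'values': ['5', '6', '7']})

# One (Filename, Name) key per row of each DataFrame.
keys1 = pd.MultiIndex.from_frame(df1[['Filename', 'Name']])
keys2 = pd.MultiIndex.from_frame(df2[['Filename', 'Name']])

# Boolean masks: True where the row's key also occurs in the other DataFrame.
df1_filtered = df1[keys1.isin(keys2)]
df2_filtered = df2[keys2.isin(keys1)]

print(df1_filtered)
print(df2_filtered)
```

Unlike the loops, this leaves df1 and df2 untouched and returns filtered copies that keep their original row labels.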

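
Comments:

What is your expected output?

++ Added more description.

You should use merge. Something like df1.merge(df2, on=['Filename', 'Name']) should work. However, your question is surely a duplicate, and you can find the answer elsewhere:

I feel stupid. Thank you. I think I found the answer in the comment section.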
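
For completeness, here is a minimal sketch of the merge suggestion from the comments, applied to the example data from the question. The suffixes argument and the name merged are additions for illustration; merge defaults to an inner join, so only (Filename, Name) pairs present in both DataFrames survive, and the two values columns end up side by side for comparison.

```python
import pandas as pd

df1 = pd.DataFrame({'Filename': ['image1', 'image1', 'image2', 'image3'],
                    'Name': ['Dog', 'Cat', 'Cat', 'Cat'],
                    'values': ['2', '3', '4', '5']})
df2 = pd.DataFrame({'Filename': ['image1', 'image2', 'image3'],
                    'Name': ['Dog', 'Cat', 'Cat'],
                    'values': ['5', '6', '7']})

# Inner join on both key columns; the overlapping 'values' column is
# disambiguated with the suffixes given here.
merged = df1.merge(df2, on=['Filename', 'Name'], suffixes=('_df1', '_df2'))
print(merged)
```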