How do I compare pairs of values in two dataframes of different sizes in Python?

python pandas dataframe

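I have two dataframes of different sizes:

sdfn, with columns 'ConceptID1' and 'ConceptID2'

jdfn, with columns 'Gene1' and 'Gene2'

I need to find the matching pairs by comparing the two dataframes.

I tried this: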

for index, row in sdfn.iterrows():
    for index, row in jdfn.iterrows():
        if ((sdfn['ConceptID1']==jdfn['Gene1']) and (sdfn['ConceptID2']==jdfn['Gene2'])) or (sdfn['ConceptID1']==jdfn['Gene2']) and ((sdfn['ConceptID2']==jdfn['Gene1'])):
            print(sdfn['ConceptID1'], jdfn['Gene1'], sdfn['ConceptID2'], jdfn['Gene2'])

The result is:

Traceback (most recent call last):
  File "", line 3, in
    if ((sdfn['ConceptID1']==jdfn['Gene1']) and (sdfn['ConceptID2']==jdfn['Gene2'])) or (sdfn['ConceptID1']==jdfn['Gene2']) and ((sdfn['ConceptID2']==jdfn['Gene1'])):
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/ops/__init__.py", line 1142, in wrapper
    raise ValueError("Can only compare identically-labeled " "Series objects")
ValueError: Can only compare identically-labeled Series objects

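The problem here is that you are not naming or using your loop variables correctly, and you are trying to compare entire dataframe columns directly. The expressions

sdfn['ConceptID1'], sdfn['ConceptID2'], jdfn['Gene1'], jdfn['Gene2']

each refer to a whole dataframe column, which is an object of type Series. That is why the error message complains about Series labels not matching: the two dataframes have different sizes, so their columns do not share the same index labels, and pandas only compares two Series element-wise when their labels are identical.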
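To see the failure in isolation, here is a minimal sketch with small hypothetical Series standing in for the real columns (the names concept_ids, genes and same_len are made up for illustration). The first comparison fails because the two Series have different lengths and therefore different index labels. The second shows that even identically labeled columns would still fail in the posted `if`, because `and` needs a single True/False value and a boolean Series does not have one.

```python
import pandas as pd

# Hypothetical stand-ins for sdfn['ConceptID1'] and jdfn['Gene1']:
# different lengths, so their index labels differ (0..2 vs 0..1).
concept_ids = pd.Series(["A", "B", "C"])
genes = pd.Series(["A", "B"])

try:
    concept_ids == genes
except ValueError as exc:
    label_error = str(exc)  # the error from the question

# Same labels this time, so == works and returns a boolean Series,
# but `and` then asks that whole Series for a single truth value.
same_len = pd.Series(["A", "B"])
try:
    (same_len == genes) and (same_len == genes)
except ValueError as exc:
    truth_error = str(exc)

print(label_error)
print(truth_error)
```

Comparing single values taken from one row at a time, as in the loop below, avoids both problems.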
You need to give the two loops distinct variable names first, and then compare values from those row variables:
for sind, srow in sdfn.iterrows():
    for jind, jrow in jdfn.iterrows():
        # The pair matches if the IDs line up in the same order...
        same_order = srow['ConceptID1'] == jrow['Gene1'] and srow['ConceptID2'] == jrow['Gene2']
        # ...or in swapped order.
        swapped = srow['ConceptID1'] == jrow['Gene2'] and srow['ConceptID2'] == jrow['Gene1']
        if same_order or swapped:
            print(srow['ConceptID1'], jrow['Gene1'], srow['ConceptID2'], jrow['Gene2'])
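The nested iterrows loop checks every one of the len(sdfn) × len(jdfn) row combinations, which gets slow for large frames. If you only need the matching rows, a vectorized alternative is possible. This is a sketch under stated assumptions, not part of the original answer: it swaps in DataFrame.merge, the helper name matching_pairs is hypothetical, and it assumes the ID columns in both dataframes hold comparable values of the same dtype.

```python
import pandas as pd


def matching_pairs(sdfn: pd.DataFrame, jdfn: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: rows of sdfn and jdfn whose ID pairs match in either order."""
    keys = ["ConceptID1", "ConceptID2"]
    # Same order: ConceptID1 == Gene1 and ConceptID2 == Gene2.
    same = sdfn.merge(jdfn, left_on=keys, right_on=["Gene1", "Gene2"])
    # Swapped order: ConceptID1 == Gene2 and ConceptID2 == Gene1.
    swapped = sdfn.merge(jdfn, left_on=keys, right_on=["Gene2", "Gene1"])
    # A pair such as (A, A) matches both ways; keep only one copy of it.
    return pd.concat([same, swapped]).drop_duplicates().reset_index(drop=True)
```

Call it as matching_pairs(sdfn, jdfn); each result row holds the same four values the loop prints (ConceptID1, ConceptID2, Gene1, Gene2). Note that drop_duplicates also collapses repeated identical rows, which the loop would print separately.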

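Note that in the posted code, the index and row variables are declared and assigned in the outer loop but then reassigned by the inner loop. So there are not two pairs of loop variables, only one pair that keeps being overwritten, which means the right data can never be compared.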
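A minimal, pandas-free sketch of that overwriting (the lists and the name outer_values are hypothetical): the inner loop reuses the name `row`, so once it finishes, the outer variable holds the last inner value instead of the current outer one.

```python
outer_values = []
for row in ["outer-0", "outer-1"]:
    # The inner loop reuses the name `row`, so it rebinds the outer variable.
    for row in ["inner-0", "inner-1"]:
        pass
    # After the inner loop, `row` holds the last inner value, not the outer one.
    outer_values.append(row)

print(outer_values)  # ['inner-1', 'inner-1']
```

The outer loop still iterates over both outer items, but its own value is lost inside the body, which is why each loop needs its own names (sind/srow and jind/jrow above).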
Hope this helps.