Python: compare the contents of two CSV files

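So I have two CSV files, Book1.csv and similarities.csv. Book1.csv has more data than similarities.csv, so I want to pull out the rows in Book1.csv that do not appear in similarities.csv.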

    import csv

    with open('Book1.csv', 'rb') as csvMasterForDiff:
        with open('similarities.csv', 'rb') as csvSlaveForDiff:
            masterReaderDiff = csv.reader(csvMasterForDiff)
            slaveReaderDiff = csv.reader(csvSlaveForDiff)

            # Count rows of Book1.csv that are / are not in similarities.csv
            testNotInCount = 0
            testInCount = 0
            for row in masterReaderDiff:
                if row not in slaveReaderDiff:
                    testNotInCount = testNotInCount + 1
                else:
                    testInCount = testInCount + 1

    print('Not in file: ' + str(testNotInCount))
    print('Exists in file: ' + str(testInCount))
However, the output is:

    Not in file: 2093
    Exists in file: 0

I know this is wrong: at least the first 16 entries of Book1.csv are missing from complications.csv, but certainly not all of them are. What am I doing wrong?

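A `csv.reader` object is an iterator, which means you can only iterate over it once. Your first `row not in slaveReaderDiff` test reads through the reader looking for that row; if the row is absent, it consumes every remaining row, and from then on each test runs against an exhausted iterator and reports "not in". Read the rows into a list or set first and do the containment checks against that. Since `csv.reader` yields lists, which are unhashable, convert each row to a tuple before putting it in a set, for example: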

    # Read similarities.csv once; rows become tuples because lists are unhashable
    slave_rows = set(map(tuple, slaveReaderDiff))

    for row in masterReaderDiff:
        if tuple(row) not in slave_rows:
            testNotInCount += 1
        else:
            testInCount += 1
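To see the exhaustion concretely, this sketch runs the same kind of `in` test twice against one `csv.reader`, then once against a set of tuples. It assumes Python 3 and uses `io.StringIO` with made-up rows in place of similarities.csv.

```python
import csv
import io

# Made-up CSV content standing in for similarities.csv (assumption for the demo)
data = "apple,1\nbanana,2\n"

reader = csv.reader(io.StringIO(data))
# ["cherry", "3"] is absent, so this test reads every row looking for it
# and leaves the reader exhausted.
first = ["cherry", "3"] in reader
# ["apple", "1"] is in the data, but the reader has nothing left to yield.
second = ["apple", "1"] in reader

# A set of tuples can be queried any number of times.
rows = set(map(tuple, csv.reader(io.StringIO(data))))
third = ("apple", "1") in rows

print(first, second, third)  # False False True
```

The second test is the one that goes wrong in the question's loop: the row really is in the file, yet the exhausted reader reports it as missing.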

Once the rows are in a set, you can perform many useful set operations without writing much code:

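    slave_rows = set(map(tuple, slaveReaderDiff))
    master_rows = set(map(tuple, masterReaderDiff))

    # Rows in Book1.csv but not in similarities.csv
    # (duplicate rows in Book1.csv are counted only once here)
    master_minus_slave_rows = master_rows - slave_rows
    # Rows present in both files
    common_rows = master_rows & slave_rows

    print('Not in file: ' + str(len(master_minus_slave_rows)))
    print('Exists in file: ' + str(len(common_rows)))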
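A set difference drops duplicate rows and does not keep the order of Book1.csv. If you need the actual rows, in their original order, one option is to build a set from similarities.csv only and filter Book1.csv against it. This is a sketch assuming Python 3; `rows_missing_from` is a hypothetical helper name, and in-memory `io.StringIO` data stands in for the two files.

```python
import csv
import io


def rows_missing_from(master_lines, slave_lines):
    """Return the rows of master_lines that do not appear in slave_lines.

    Both arguments are iterables of CSV text lines (e.g. open file objects).
    Order and duplicate rows of the master data are preserved.
    """
    # Read the slave data once into a set; tuples because lists are unhashable.
    slave_rows = set(map(tuple, csv.reader(slave_lines)))
    return [row for row in csv.reader(master_lines) if tuple(row) not in slave_rows]


# Made-up data in place of Book1.csv and similarities.csv
book1 = io.StringIO("a,1\nb,2\nc,3\nb,2\n")
similarities = io.StringIO("a,1\n")

missing = rows_missing_from(book1, similarities)
print(missing)  # [['b', '2'], ['c', '3'], ['b', '2']]

# Write the result back out as CSV (here to an in-memory buffer)
out = io.StringIO()
csv.writer(out).writerows(missing)
```

With real files in Python 3, open them in text mode with `newline=''`, e.g. `with open('Book1.csv', newline='') as m, open('similarities.csv', newline='') as s:` and pass `m` and `s` to the helper. Rows are compared as exact strings, so extra whitespace or different quoting in one file makes otherwise identical rows count as different.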
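There is a lot more you can do with sets, such as union (`|`), symmetric difference (`^`) and subset tests (`<=`).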
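A sketch of a few more set operations on the same kind of tuple rows, assuming Python 3 and made-up in-memory data in place of the two files:

```python
import csv
import io

# In-memory stand-ins for Book1.csv and similarities.csv (made-up rows)
master_rows = set(map(tuple, csv.reader(io.StringIO("a,1\nb,2\nc,3\n"))))
slave_rows = set(map(tuple, csv.reader(io.StringIO("a,1\nd,4\n"))))

only_in_slave = slave_rows - master_rows    # in similarities.csv but not Book1.csv
in_either = master_rows | slave_rows        # union: in at least one file
in_exactly_one = master_rows ^ slave_rows   # symmetric difference
is_subset = slave_rows <= master_rows       # does Book1.csv contain every slave row?

print(sorted(only_in_slave))   # [('d', '4')]
print(sorted(in_exactly_one))  # [('b', '2'), ('c', '3'), ('d', '4')]
print(is_subset)               # False
```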