Python: comparing the contents of two CSV files
I have two CSV files. Book1.csv has more data than similarities.csv, so I want to pull out the rows in Book1.csv that do not appear in similarities.csv.
import csv

with open('Book1.csv', 'rb') as csvMasterForDiff:
    with open('similarities.csv', 'rb') as csvSlaveForDiff:
        masterReaderDiff = csv.reader(csvMasterForDiff)
        slaveReaderDiff = csv.reader(csvSlaveForDiff)
        testNotInCount = 0
        testInCount = 0
        for row in masterReaderDiff:
            if row not in slaveReaderDiff:
                testNotInCount = testNotInCount + 1
            else:
                testInCount = testInCount + 1

print('Not in file: ' + str(testNotInCount))
print('Exists in file: ' + str(testInCount))
However, the result is
Not in file: 2093
Exists in file: 0
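I know this is incorrect: at least the first 16 entries in Book1.csv do not exist in similarities.csv, but that is not true of every entry. What am I doing wrong?

A csv.reader object is an iterator, which means you can only iterate over it once. Each "in" test advances the reader, and the first test that finds no match consumes it to the end, so every later test sees an exhausted reader and reports the row as missing. Read the rows into a list or set once and run the containment checks against that, for example: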
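# Rows come back as lists, which are unhashable, so store them as tuples.
slave_rows = set(tuple(row) for row in slaveReaderDiff)
for row in masterReaderDiff:
    if tuple(row) not in slave_rows:
        testNotInCount += 1
    else:
        testInCount += 1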
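To see why the original loop reported every row as missing, here is a minimal sketch of the exhaustion behaviour. The in-memory data is made up for illustration and stands in for similarities.csv.

```python
import csv
import io

# Hypothetical in-memory data standing in for similarities.csv.
slave_file = io.StringIO("a,1\nb,2\nc,3\n")
reader = csv.reader(slave_file)

# A membership test on a reader scans forward until it finds a match
# or runs out of rows. This one finds no match, so it consumes everything.
first_check = ["z", "9"] in reader

# The reader is now exhausted, so even a row that is in the data is not found.
second_check = ["a", "1"] in reader

print(first_check)   # False
print(second_check)  # False

# Reading the rows once into a set gives a container that can be searched repeatedly.
slave_file.seek(0)
rows = {tuple(row) for row in csv.reader(slave_file)}
print(("a", "1") in rows)  # True
```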
Once the rows are in a set, you can perform many useful set operations without writing much code:
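# Sets drop duplicate rows, so these counts are of distinct rows.
slave_rows = set(tuple(row) for row in slaveReaderDiff)
master_rows = set(tuple(row) for row in masterReaderDiff)
master_minus_slave_rows = master_rows - slave_rows  # rows only in Book1.csv
common_rows = master_rows & slave_rows  # rows present in both files
print('Not in file: ' + str(len(master_minus_slave_rows)))
print('Exists in file: ' + str(len(common_rows)))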
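Since the goal in the question is to pull out the missing rows rather than only count them, the same idea can be sketched as a small helper. The function name rows_missing_from is hypothetical, and the sketch assumes Python 3, where CSV files are opened in text mode with newline='' instead of the 'rb' mode used in the question.

```python
import csv


def rows_missing_from(master_path, slave_path):
    """Return the rows of master_path that do not appear in slave_path, in file order."""
    # Load the smaller file once; tuples are hashable, lists are not.
    with open(slave_path, newline='') as slave_file:
        slave_rows = {tuple(row) for row in csv.reader(slave_file)}

    # Stream the larger file and keep only the rows the set does not contain.
    with open(master_path, newline='') as master_file:
        return [row for row in csv.reader(master_file)
                if tuple(row) not in slave_rows]
```

Calling rows_missing_from('Book1.csv', 'similarities.csv') would then return the rows as lists of strings. Unlike the set difference, this keeps the original order and any duplicate rows from Book1.csv.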
Sets support many more operations than these.
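As a rough illustration of a few other set operations that can be useful when comparing files, the sketch below uses made-up row tuples in place of the data read from the two CSV files.

```python
# Hypothetical rows, already converted to tuples as in the answer above.
master_rows = {("a", "1"), ("b", "2"), ("c", "3")}
slave_rows = {("b", "2"), ("d", "4")}

only_in_slave = slave_rows - master_rows     # rows the master file lacks
in_exactly_one = master_rows ^ slave_rows    # symmetric difference
slave_is_subset = slave_rows <= master_rows  # is every slave row also in master?
all_rows = master_rows | slave_rows          # union of both files

print(sorted(only_in_slave))   # [('d', '4')]
print(sorted(in_exactly_one))  # [('a', '1'), ('c', '3'), ('d', '4')]
print(slave_is_subset)         # False
print(len(all_rows))           # 4
```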