Python CSV comparison

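This script compares two CSV files that have two columns each. Please help me modify it so that it also works when sample1.csv and sample2.csv have more than two columns, or only one.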

f1_in = open("sample1.csv","r")
next(f1_in,None)
f1_dict = {}
for line in f1_in:
  l = line.split(',')
  f1_dict[l[0].strip()] = l[1].strip()
  l.sort()
f1_in.close()

f2_in = open("sample2.csv","r")
next(f2_in,None)
f2_dict = {}
for line in f2_in:
  l = line.split(',')
  f2_dict[l[0].strip()] = l[1].strip()
  l.sort()
f2_in.close()


f_same = open("same.txt","w")
f_different = open("different.txt","w")

for k1 in f1_dict.keys():
  if k1 in f2_dict.keys() \
      and f2_dict[k1] == f1_dict[k1]:
    f_same.write("{0}, {1}\n".format(str(k1)+" "+str(f1_dict[k1]),
                                     str(k1)+" "+str(f2_dict[k1])))

  elif not k1 in f2_dict.keys():
    f_different.write("{0}, {1}\n".format(str(k1)+" "+str(f1_dict[k1]),
                                          "------"))
  elif not f2_dict[k1] == f1_dict[k1]:
    f_different.write("{0}, {1}\n".format(str(k1)+" "+str(f1_dict[k1]),
                                          str(k1)+" "+str(f2_dict[k1])))

f_same.close()
f_different.close()

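For example: if my source file has Name and Salary as headers, with the values A 20000, B 15000, C 10000, D 10000, and the target file also has Name and Salary as headers, with the values A 40000, D 10000, B 15000, C 10000, E 8000, my output should be the different rows: A 20000 A 40000, D 10000 ----- (not in the target), -- (not in the source) E 8000, and the common rows, such as B 15000 B 15000, C 10000 C 10000.

If you treat the columns as key/value pairs in a dictionary, it is no surprise that you cannot extend the code to more than two columns.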

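You have to treat them as "elements of a set". I understand that this is why you are not using the csv module or the difflib module: you do not care whether the lines appear in (almost) the same order in both files, only whether they appear at all.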

Here is an example:

import itertools


def compare(first_filename, second_filename):
    lines1 = set()
    lines2 = set()
    with open(first_filename, 'r') as file1, \
            open(second_filename, 'r') as file2:
        # Read both files in step; zip_longest pads the shorter one with None.
        for line1, line2 in itertools.zip_longest(file1, file2):
            if line1:
                lines1.add(line1)
            if line2:
                lines2.add(line2)
    print("Different lines")
    # Symmetric difference: lines present in exactly one of the files.
    for line in lines1 ^ lines2:
        print(line, end='')
    print("---")
    print("Common lines")
    # Intersection: lines present in both files.
    for line in lines1 & lines2:
        print(line, end='')

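Note that this code finds the differences across both files, not just what exists in f1 but not in f2, as your example does. However, it cannot tell which file a difference comes from (since that does not seem to be a requirement of the question).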
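If the origin of each differing line does matter, one possible variation is to use the two one-sided set differences instead of the symmetric difference. The following is only a sketch under that assumption; the function name compare_with_origin is hypothetical and not part of the answer above. It strips the trailing newline so that a final line without a line break still matches.

```python
def compare_with_origin(first_filename, second_filename):
    """Return (only_in_first, only_in_second, common) as sets of lines.

    Hypothetical helper: whole lines are compared as strings, so the
    number of columns does not matter.
    """
    with open(first_filename, 'r') as file1:
        # Drop the trailing newline and skip blank lines.
        lines1 = {line.rstrip('\n') for line in file1 if line.strip()}
    with open(second_filename, 'r') as file2:
        lines2 = {line.rstrip('\n') for line in file2 if line.strip()}
    # a - b keeps the elements of a that are not in b.
    return lines1 - lines2, lines2 - lines1, lines1 & lines2
```

Calling compare_with_origin("sample1.csv", "sample2.csv") returns three sets that can be printed or written to same.txt and different.txt as needed.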

Checking that it works (see the sample session at the end of this page):

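So what problem do you have in those cases? Do you get an error or unexpected output? What have you tried so far to make the code more general?

Hy…jornshape, if I have only two columns to compare in the source and the target, I get the correct result… but if there is one column, or more than two… it only takes two columns for the comparison…

So, again: what have you tried so far? Which bits of the code do you think depend on the number of columns? What kind of data structure do you think would be suitable for handling an arbitrary number of columns?

I am very new to Python, and even this code is something I got from Stack Overflow… I guess the last part, where the comparison starts at for k1 in f1_dict.keys():, is where the change should be made to read n columns…

I suggest you put more effort into understanding the code you have now. Put in some prints, figure out what is happening, and then you will be able to work out how to modify it. This is not a code-writing service.

logc… where should we enter the file names?

@user3514648: I am not sure I understand your question. If the CSV files to compare are named "sample1.csv" and "sample2.csv", open a Python console in the same directory, cut and paste the code snippet above into the console, and then write compare("sample1.csv", "sample2.csv"). Another way: write the snippet and that line, unindented, in a module example.py, then run python example.py in the same directory as the files.

Thanks… it works fine, but not in the way I want. For example: if my source file has Name and Salary as headers, with the values A 20000, B 15000, C 10000, D 10000, and the target file has Name and Salary as headers, with the values A 40000, D 10000, B 15000, C 10000, E 8000, my output should be the different rows: A 20000 A 40000, D 10000 ----- (not in the target), -- (not in the source) E 8000, and the common rows, such as B 15000 B 15000, C 10000 C 10000.

@user3514648 I think you should put that information in the question. It will help people other than me to help you.

Sorry… can you help me?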

In [40]: !cat sample1.csv
bacon, eggs, mortar
whatever, however, whenever
spam, spam, spam

In [41]: !cat sample2.csv
guido, van, rossum
spam, spam, spam

In [42]: compare("sample1.csv", "sample2.csv")
Different lines
whatever, however, whenever
guido, van, rossum
bacon, eggs, mortar
---
Common lines
spam, spam, spam
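The side-by-side output requested in the question (rows matched on the first column, with a placeholder where one file has no matching row) is not what the set-based compare produces. A minimal sketch of one way to get it for any number of columns is shown below. It assumes the first column is a unique key and the first row is a header; the names load_rows and compare_by_key and the missing placeholder are hypothetical, chosen only for this illustration.

```python
import csv


def load_rows(filename):
    """Map the first column of each data row to a tuple of the other columns."""
    rows = {}
    with open(filename, newline='') as handle:
        reader = csv.reader(handle, skipinitialspace=True)
        next(reader, None)  # skip the header row, as the question's script does
        for row in reader:
            if row:  # ignore blank lines
                cells = [cell.strip() for cell in row]
                # Assumption: the first column uniquely identifies a row.
                rows[cells[0]] = tuple(cells[1:])
    return rows


def compare_by_key(source_filename, target_filename, missing="-----"):
    """Return (same, different) lists of (source_text, target_text) pairs."""
    source = load_rows(source_filename)
    target = load_rows(target_filename)
    same, different = [], []
    # Walk over every key that occurs in either file, in sorted order.
    for key in sorted(source.keys() | target.keys()):
        left = " ".join((key,) + source[key]) if key in source else missing
        right = " ".join((key,) + target[key]) if key in target else missing
        if key in source and key in target and source[key] == target[key]:
            same.append((left, right))
        else:
            different.append((left, right))
    return same, different
```

Because the remaining columns are stored as a tuple, the same code handles files with one column (an empty tuple) or many. With the sample data from the question, A and E end up in the different list, while B, C and D end up in the same list, since D 10000 appears in both files.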