Python: compare two CSV files and output whether there are differences?
I am comparing two CSV files, old.csv and new.csv, and writing the result to update.csv:
import csv

with open('old.csv', 'r') as t1:
    old_csv = t1.readlines()
with open('new.csv', 'r') as t2:
    new_csv = t2.readlines()

with open('update.csv', 'w') as out_file:
    line_in_new = 0
    line_in_old = 0
    while line_in_new < len(new_csv) and line_in_old < len(old_csv):
        if old_csv[line_in_old] != new_csv[line_in_new]:
            out_file.write(new_csv[line_in_new])
        else:
            line_in_old += 1
        line_in_new += 1
new.csv
a,b,c
1,2,3
5,6,7
8,9,7
Desired output:
update.csv
4,5,6,deleted
5,6,7,new added
8,9,9,change
Please help me get only the differences.

Solution using pandas:
import pandas as pd

df1 = pd.read_csv('old.csv')
df2 = pd.read_csv('new.csv')

# Tag each row with the file it came from.
df1['flag'] = 'old'
df2['flag'] = 'new'

df = pd.concat([df1, df2])
# Ignoring the flag column, drop every row that appears in both files.
dups_dropped = df.drop_duplicates(subset=df.columns.difference(['flag']), keep=False)
dups_dropped.to_csv('update.csv', index=False)
Input:
old.csv
a,b,c
1,2,3
4,5,6
new.csv
a,b,c
1,2,3
5,6,7
Output:
update.csv
a,b,c,flag
4,5,6,old
5,6,7,new
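The question asks for labels such as deleted and new added rather than old and new. One possible way to get them, sketched here as an assumption rather than as part of the original answer, is an outer merge with indicator=True. The inline sample data, the status column name and the labels mapping are illustrative choices. Detecting a changed row (such as 8,9,9 in the question) would additionally need a key column that identifies a row across both files, which the question does not specify, so this sketch reports only rows that are unique to one file.

```python
import io

import pandas as pd

# Inline copies of the sample files from the answer, so the sketch runs
# without files on disk. Replace with 'old.csv' / 'new.csv' for real use.
old_csv = io.StringIO("a,b,c\n1,2,3\n4,5,6\n")
new_csv = io.StringIO("a,b,c\n1,2,3\n5,6,7\n")

df_old = pd.read_csv(old_csv)
df_new = pd.read_csv(new_csv)

# Outer merge on all shared columns; indicator=True adds a `_merge` column
# whose values are 'left_only', 'right_only' or 'both'.
merged = df_old.merge(df_new, how="outer", indicator=True)

# Keep rows found in only one file and translate the indicator into labels.
# The label text is an assumption modelled on the question's desired output.
labels = {"left_only": "deleted", "right_only": "new added"}
diff = merged[merged["_merge"] != "both"].copy()
diff["status"] = diff["_merge"].astype(str).map(labels)
diff = diff.drop(columns="_merge").reset_index(drop=True)

print(diff.to_csv(index=False))
```

To write the result to a file instead of printing it, call diff.to_csv('update.csv', index=False).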
What do you mean by difference? Please post a clear input sample and the desired output.
Thanks, Ashish. If I only want to show the differences, meaning old and new, how do I get that?
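For showing only the rows that differ, tagged old or new, roughly the same result can also be sketched with just the standard csv module when pandas is not available. This is an illustrative alternative and not the answerer's method: the helper name diff_rows is hypothetical, it assumes both files share the same header, and it compares whole rows as strings.

```python
import csv
import io


def diff_rows(old_file, new_file):
    """Return (header, rows); rows found in only one file are tagged 'old' or 'new'."""
    old_reader = csv.reader(old_file)
    new_reader = csv.reader(new_file)
    header = next(old_reader)
    next(new_reader)  # assumption: new file has the same header, so skip it
    old_rows = [tuple(r) for r in old_reader]
    new_rows = [tuple(r) for r in new_reader]
    old_set, new_set = set(old_rows), set(new_rows)
    # Rows only in the old file first, then rows only in the new file.
    tagged = [list(r) + ["old"] for r in old_rows if r not in new_set]
    tagged += [list(r) + ["new"] for r in new_rows if r not in old_set]
    return header + ["flag"], tagged


# Inline sample data matching the answer's input; open real files instead
# with open('old.csv', newline='') and open('new.csv', newline='').
old_file = io.StringIO("a,b,c\n1,2,3\n4,5,6\n")
new_file = io.StringIO("a,b,c\n1,2,3\n5,6,7\n")
header, rows = diff_rows(old_file, new_file)

out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(header)
writer.writerows(rows)
print(out.getvalue())
```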