Python: using readlines on two CSV files and skipping the third column in the comparison


Old.csv:

name,department
leona,IT
name,department,timestamp
leona,IT,07/20/2020       <--- Existing value
lewis,Tax,08/25/2020      <--- New value from New.csv
New.csv:

name,department
leona,IT
lewis,Tax
name,department,timestamp
leona,IT,07/20/2020
leona,IT,07/25/2020
lewis,Tax,08/25/2020
With the same two columns in both files, finding the new values in New.csv and updating Old.csv with them works fine using the code below:

feed = []
headers = []

with open("Old.csv", 'r') as t1, open("New.csv", 'r') as t2:

    for header in t1.readline().split(','):
        headers.append(header.rstrip())

    fileone = t1.readlines()
    filetwo = t2.readlines()[1:]  # Skip csv fieldnames

    for line in filetwo:

        if line not in fileone:

            lineItems = {}
            feed.append(line.strip())  # For old file update

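New issues:

1/ Add a third column to store a timestamp value

2/ Skip the third column (timestamp) in both files, while still comparing the two files on columns 1 and 2

3/ The old file is to be updated with the new values in all three columns

I tried slicing with split(",")[0:2], but that did not seem to work at all. I feel this only needs a few small changes to the existing code, but I'm not sure how to do it.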

Expected result:

Old.csv:

name,department
leona,IT
name,department,timestamp
leona,IT,07/20/2020       <--- Existing value
lewis,Tax,08/25/2020      <--- New value from New.csv

You can do this by hand, but why not use the tools built into Python?

from csv import reader

feed = []

with open('Old.csv', 'r') as t1, open('New.csv', 'r') as t2:
    old = reader(t1)
    new = reader(t2)
    headers = next(old)
    # skip header in new
    next(new)

    # relevant data is only the first two columns
    old_data = [rec[:2] for rec in old]

    for rec in new:
        if rec[:2] not in old_data:
            feed.append(rec)

print(headers)
print(feed)
Result:

['name', 'department']
[['lewis', 'Tax']]
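Note that you get this result with the two-column data provided, but if a third column is added, the code still works as expected and that data ends up in the feed result as well.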
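To see this with a timestamp column, here is a minimal sketch that runs the same comparison on in-memory data. The sample rows are the three-column ones from the question; the `io.StringIO` objects stand in for the two files, and the helper name `find_new_rows` is an assumption made for illustration. It also swaps the list of old records for a set of tuples, which only changes lookup speed, not the result.

```python
import io
from csv import reader

# In-memory stand-ins for Old.csv and New.csv (three-column rows from the question)
OLD_CSV = "name,department,timestamp\nleona,IT,07/20/2020\n"
NEW_CSV = (
    "name,department,timestamp\n"
    "leona,IT,07/20/2020\n"
    "leona,IT,07/25/2020\n"
    "lewis,Tax,08/25/2020\n"
)


def find_new_rows(old_file, new_file):
    """Return the headers and the rows of new_file whose first two
    columns do not occur in old_file (hypothetical helper)."""
    old = reader(old_file)
    new = reader(new_file)
    headers = next(old)
    next(new)  # skip header in new

    # Compare on (name, department) only; the timestamp is ignored
    old_keys = {tuple(rec[:2]) for rec in old}
    return headers, [rec for rec in new if tuple(rec[:2]) not in old_keys]


headers, feed = find_new_rows(io.StringIO(OLD_CSV), io.StringIO(NEW_CSV))
print(headers)  # ['name', 'department', 'timestamp']
print(feed)     # [['lewis', 'Tax', '08/25/2020']]
```

The row `leona,IT,07/25/2020` is not reported, because its name and department already exist in the old data even though the timestamp differs.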

To make feed a list of dictionaries (which you can easily convert to JSON), you can do the following:

feed.append(dict(zip(headers, rec)))
Converting feed to JSON is then straightforward:

import json

print(json.dumps(feed))
The whole solution:

import json
from csv import reader

feed = []

with open('Old.csv', 'r') as t1, open('New.csv', 'r') as t2:
    old = reader(t1)
    new = reader(t2)
    headers = next(old)
    # skip header in new
    next(new)

    # relevant data is only the first two columns
    old_data = [rec[:2] for rec in old]

    for rec in new:
        if rec[:2] not in old_data:
            feed.append(dict(zip(headers, rec)))

print(json.dumps(feed))
The output looks like this:

[{"name": "lewis", "department": "Tax", "timestamp": "08/25/2020"}]
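The question also asks for Old.csv to be updated with the new rows, while the code above stops at building `feed`. The following is only a sketch of one way to do that, assuming both files already carry the three-column header, that Old.csv ends with a newline, and that appending to the end of the old file is acceptable. The function name `append_new_rows` and its parameters are hypothetical.

```python
import csv


def append_new_rows(old_path, new_path):
    """Append rows from new_path to old_path when their
    (name, department) pair is not yet present in old_path."""
    with open(old_path, newline="") as old_file:
        old = csv.reader(old_file)
        next(old)  # skip header
        old_keys = {tuple(rec[:2]) for rec in old if rec}

    feed = []
    with open(new_path, newline="") as new_file:
        new = csv.reader(new_file)
        next(new)  # skip header
        for rec in new:
            key = tuple(rec[:2])
            if rec and key not in old_keys:
                old_keys.add(key)  # do not append the same key twice
                feed.append(rec)

    # Assumption: old_path ends with a newline; otherwise the first
    # appended row would be glued onto the last existing line.
    with open(old_path, "a", newline="") as old_file:
        csv.writer(old_file, lineterminator="\n").writerows(feed)

    return feed
```

Calling `append_new_rows("Old.csv", "New.csv")` would then write all three columns of each new row, timestamp included, to the end of Old.csv and return the rows it added.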

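This works. Honestly, I had never used `reader` before. One more thing, and maybe I'm overcomplicating this, but is there a way to also output the result as JSON? I tried building jdata = [{'name': i, 'department': j, 'timestamp': k} ...] from zip(rec[::3], rec[1::3], rec[2::3]), but it only shows
[{'name':'lewis','department':'Tax','timestamp':'8/25/2020'}]
instead of
[{'name':'jessica','department':'it','timestamp':'8/15/2020'},{'name':'lewis','department':'Tax','timestamp':'8/25/2020'}]
. I guess it has something to do with the double square brackets... Thanks, Grismar.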
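On the follow-up in this comment: assuming `rec` is a single row, as in the answer's loop, the slices `rec[::3]`, `rec[1::3]` and `rec[2::3]` each hold exactly one element, so zipping them can only ever yield one dictionary. A minimal sketch that builds one dictionary per row instead; the rows are made up to match the expected output in the comment and are not taken from the question's files:

```python
import json

headers = ["name", "department", "timestamp"]
# Hypothetical rows, shaped like the records collected in the answer's loop
rows = [
    ["jessica", "it", "8/15/2020"],
    ["lewis", "Tax", "8/25/2020"],
]

# Slicing one three-element row with a step of 3 gives single-element
# lists, so zipping them produces exactly one tuple
last = rows[1]
single = list(zip(last[::3], last[1::3], last[2::3]))
print(single)

# Pair the headers with each row to get one dictionary per row
jdata = [dict(zip(headers, rec)) for rec in rows]
print(json.dumps(jdata))
```

This is the same `dict(zip(headers, rec))` idea as in the answer, applied to every row rather than to the last value of `rec`.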