Python: using readlines on two CSV files while skipping the third column in the comparison
Old.csv:
name,department
leona,IT

name,department,timestamp
leona,IT,07/20/2020 <--- Existing value
lewis,Tax,08/25/2020 <--- New value from New.csv
New.csv:
name,department
leona,IT
lewis,Tax

name,department,timestamp
leona,IT,07/20/2020
leona,IT,07/25/2020
lewis,Tax,08/25/2020
With just the two columns, the code below works fine for finding new values in New.csv and updating Old.csv with them:
feed = []
headers = []
with open("Old.csv", 'r') as t1, open("New.csv", 'r') as t2:
    for header in t1.readline().split(','):
        headers.append(header.rstrip())
    fileone = t1.readlines()
    filetwo = t2.readlines()[1:]  # Skip csv fieldnames
    for line in filetwo:
        if line not in fileone:
            lineItems = {}
            feed.append(line.strip())  # For old file update
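New requirements:
1/ Add a third column to store a timestamp value
2/ Skip the 3rd column (timestamp) in both files, but still compare the two files for differences based on columns 1 and 2
3/ Update the old file with the new values across all 3 columns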
I tried slicing with split(",")[0:2], but it doesn't seem to work at all. I feel like the existing code only needs a small update, but I'm not sure how to do it.
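If the slice was compared against fileone directly, that explains the failure: `line.split(',')[0:2]` is a list such as `['leona', 'IT']`, while fileone holds raw strings like `'leona,IT,07/20/2020\n'`, so a list is never found among them. A minimal sketch that stays close to the readlines approach compares keys with keys instead. It assumes simple CSV with no quoted commas, and inlines the sample data as lists (hypothetical stand-ins for what `t1.readlines()` and `t2.readlines()` would return).

```python
# Hypothetical stand-ins for t1.readlines() / t2.readlines()
old_lines = [
    "name,department,timestamp\n",
    "leona,IT,07/20/2020\n",
]
new_lines = [
    "name,department,timestamp\n",
    "leona,IT,07/20/2020\n",
    "leona,IT,07/25/2020\n",
    "lewis,Tax,08/25/2020\n",
]


def key(line):
    """Return the first two fields of a raw CSV line as a tuple."""
    # split(',')[0:2] gives a list; converting to a tuple makes it hashable for a set
    return tuple(field.strip() for field in line.split(',')[0:2])


# Compare keys against keys, never against the raw old lines
old_keys = {key(line) for line in old_lines[1:]}  # [1:] skips the header

feed = []
for line in new_lines[1:]:  # [1:] skips the header
    if key(line) not in old_keys:
        feed.append(line.strip())  # keep all three columns for the update

print(feed)  # ['lewis,Tax,08/25/2020']
```

The second leona row (07/25/2020) is not added, because only name and department take part in the comparison.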
Expected result:
Old.csv:
name,department
leona,IT

name,department,timestamp
leona,IT,07/20/2020 <--- Existing value
lewis,Tax,08/25/2020 <--- New value from New.csv
You could do this yourself, but why not use the tools Python already has built in (the csv module)?
from csv import reader

feed = []
# newline='' is what the csv module documentation recommends when opening CSV files
with open('Old.csv', 'r', newline='') as t1, open('New.csv', 'r', newline='') as t2:
    old = reader(t1)
    new = reader(t2)
    headers = next(old)
    # skip header in new
    next(new)
    # relevant data is only the first two columns
    old_data = [rec[:2] for rec in old]
    for rec in new:
        if rec[:2] not in old_data:
            feed.append(rec)

print(headers)
print(feed)
Result:
['name', 'department']
[['lewis', 'Tax']]
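Note that this is the result with the data as provided (two columns). If you add the third column, the code still works as expected, and that column's data is included in the rows added to feed.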
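To check that claim, here is the same logic run against the three-column sample data from the question. `io.StringIO` stands in for Old.csv and New.csv so the sketch runs without files on disk; the list comprehension is just a compact form of the answer's loop.

```python
import io
from csv import reader

# In-memory stand-ins for Old.csv and New.csv, with the timestamp column added
old_file = io.StringIO(
    "name,department,timestamp\n"
    "leona,IT,07/20/2020\n"
)
new_file = io.StringIO(
    "name,department,timestamp\n"
    "leona,IT,07/20/2020\n"
    "leona,IT,07/25/2020\n"
    "lewis,Tax,08/25/2020\n"
)

old = reader(old_file)
new = reader(new_file)
headers = next(old)
next(new)  # skip header in new
old_data = [rec[:2] for rec in old]  # compare on name and department only

# Same test as the loop in the answer: keep rows whose first two fields are new
feed = [rec for rec in new if rec[:2] not in old_data]

print(headers)  # ['name', 'department', 'timestamp']
print(feed)     # [['lewis', 'Tax', '08/25/2020']]
```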
To make feed a list of dictionaries (which you can easily convert to JSON), change the append inside the loop to:
feed.append(dict(zip(headers, rec)))
Converting feed to JSON is then simple:
import json
print(json.dumps(feed))
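An alternative that is not part of the original answer is `csv.DictReader`, which consumes the header row itself and yields each row as a dict keyed by column name, so `zip(headers, rec)` is not needed. A sketch, assuming the comparison key is the columns named `name` and `department`, with in-memory data standing in for the files:

```python
import csv
import io
import json

old_file = io.StringIO("name,department,timestamp\nleona,IT,07/20/2020\n")
new_file = io.StringIO(
    "name,department,timestamp\n"
    "leona,IT,07/20/2020\n"
    "leona,IT,07/25/2020\n"
    "lewis,Tax,08/25/2020\n"
)

# DictReader reads the header row and yields one dict per data row
old_keys = {(row["name"], row["department"]) for row in csv.DictReader(old_file)}
feed = [
    row
    for row in csv.DictReader(new_file)
    if (row["name"], row["department"]) not in old_keys
]

print(json.dumps(feed))
# [{"name": "lewis", "department": "Tax", "timestamp": "08/25/2020"}]
```

Using a set for old_keys also makes each membership test constant time on average, which matters once Old.csv grows large.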
The whole solution:
import json
from csv import reader

feed = []
# newline='' is what the csv module documentation recommends when opening CSV files
with open('Old.csv', 'r', newline='') as t1, open('New.csv', 'r', newline='') as t2:
    old = reader(t1)
    new = reader(t2)
    headers = next(old)
    # skip header in new
    next(new)
    # relevant data is only the first two columns
    old_data = [rec[:2] for rec in old]
    for rec in new:
        if rec[:2] not in old_data:
            feed.append(dict(zip(headers, rec)))

print(json.dumps(feed))
The output:
[{"name": "lewis", "department": "Tax", "timestamp": "08/25/2020"}]
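The answer collects the new rows but does not write them back, which requirement 3 asks for. A minimal sketch of that step, assuming feed is the list-of-lists version and that Old.csv already ends with a newline (otherwise the first appended row would run onto its last line). `append_rows` is a hypothetical helper name, and the demo works on a temporary copy so the real file is untouched.

```python
import csv
import os
import tempfile


def append_rows(path, rows):
    """Append rows (lists of strings) to the end of an existing CSV file."""
    # Mode "a" appends; newline='' lets the csv module control line endings
    with open(path, "a", newline="") as f:
        csv.writer(f).writerows(rows)


# Demo on a temporary stand-in for Old.csv
tmpdir = tempfile.mkdtemp()
old_path = os.path.join(tmpdir, "Old.csv")
with open(old_path, "w", newline="") as f:
    f.write("name,department,timestamp\nleona,IT,07/20/2020\n")

feed = [["lewis", "Tax", "08/25/2020"]]  # rows found by the comparison step
append_rows(old_path, feed)

with open(old_path, newline="") as f:
    rows = list(csv.reader(f))
print(rows)
```

With the dict version of feed, `csv.DictWriter(f, fieldnames=headers).writerows(feed)` plays the same role as `csv.writer(f).writerows(rows)`.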
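This works. To be honest, I had never used "reader" before. One more thing, and maybe I'm overcomplicating this, but is there a way to output the result as JSON as well? I tried jdata=[{'name':i,'department':j,'timestamp':k} for i,j,k in zip(rec[::3],rec[1::3],rec[2::3])], but it only showed
[{'name':'lewis','department':'Tax','timestamp':'8/25/2020'}]
instead of [{'name':'jessica','department':'it','timestamp':'8/15/2020'},{'name':'lewis','department':'Tax','timestamp':'8/25/2020'}]. I guess it has something to do with the double square brackets... Thanks, Grismar.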
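A likely cause, assuming the comprehension ran after the loop: at that point rec holds only the last row read, e.g. `['lewis', 'Tax', '8/25/2020']`. For a three-field row, `rec[::3]`, `rec[1::3]` and `rec[2::3]` are each one-element lists, so `zip` yields a single tuple and therefore a single dict. Iterating over every collected row fixes it. A sketch using the list-of-lists feed from the first version of the answer, filled with the two rows from the comment:

```python
import json

headers = ["name", "department", "timestamp"]
# feed as built by the first version of the answer: one list per new row
feed = [
    ["jessica", "it", "8/15/2020"],
    ["lewis", "Tax", "8/25/2020"],
]

# One dict per row: iterate over all of feed instead of slicing a single rec
jdata = [dict(zip(headers, rec)) for rec in feed]

print(json.dumps(jdata))
```

This is equivalent to calling `feed.append(dict(zip(headers, rec)))` inside the loop, as in the whole solution.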