Python: How to search for specific strings from file1 and update a CSV file


I have two very large files:

File1 is formatted as such:
thisismy@email.com:20110708
thisisnotmy@email.com:20110908
thisisyour@email.com:20090807
...

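File2 is a CSV file that has the same email addresses in field row[0], and I need to put the date into field row[5].

I understand how to read and parse the CSV properly, and how to read File1 and split it correctly.

What I need help with is how to search the CSV file for every instance of an email address and update the CSV with the corresponding date.

Thanks for your help.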

You may want to use the re module:

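import re
# re.MULTILINE makes ^ match at the start of every line, not only the first
with open('filename.csv') as f:
    emails = re.findall(r'^(.*@.*?):', f.read(), re.MULTILINE)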

This will give you all the email addresses.
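If reading the whole file into one string is a concern, the same extraction can be done line by line. This is only a minimal sketch, assuming every line has the form email:date as in the question; the function name parse_email_dates is made up for illustration.

```python
def parse_email_dates(lines):
    """Map each email address to its date from lines shaped like 'email:date'."""
    result = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # rpartition splits on the LAST colon, so the date is always the tail
        email, sep, date = line.rpartition(":")
        if sep and "@" in email:
            result[email] = date
    return result


sample = ["thisismy@email.com:20110708\n", "thisisnotmy@email.com:20110908\n"]
print(parse_email_dates(sample))
```

An open file object can be passed in place of the list, since iterating over a file yields one line at a time.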

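If the data you need to replace has a fixed size, which appears to be the case in your example, you can overwrite it in place: while reading the file to find the value, note the cursor position, then write the replacement data starting at that position.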
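To illustrate the fixed-size overwrite idea, here is a rough sketch built on the standard file methods tell() and seek(). It is shown on the email:date layout from the question and assumes every date is exactly 8 ASCII bytes; the function name overwrite_date_in_place is hypothetical. If the old and new values could differ in length, this approach would corrupt the file.

```python
import os
import tempfile


def overwrite_date_in_place(path, email, new_date):
    """Overwrite the 8-byte date after 'email:' without rewriting the file.

    Assumes dates are always 8 ASCII bytes (YYYYMMDD), so the replacement
    occupies exactly the same bytes as the old value.
    """
    key = email.encode("ascii") + b":"
    value = new_date.encode("ascii")
    if len(value) != 8:
        raise ValueError("replacement must be exactly 8 bytes")
    with open(path, "r+b") as f:
        while True:
            start = f.tell()              # byte offset where this line begins
            line = f.readline()
            if not line:
                return False              # end of file, nothing matched
            if line.startswith(key):
                f.seek(start + len(key))  # jump to the first byte of the date
                f.write(value)
                return True


# Small demonstration on a temporary file
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"thisismy@email.com:20110708\nthisisyour@email.com:20090807\n")
demo_path = tmp.name
found = overwrite_date_in_place(demo_path, "thisisyour@email.com", "20240101")
with open(demo_path, "rb") as f:
    demo_contents = f.read()
os.remove(demo_path)
```

The file is opened in binary mode on purpose: byte offsets from tell() are then exact, and reading with readline() keeps tell() usable between lines.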


However, if you are dealing with very large files, using a command-line tool such as sed can save a lot of processing time.

The following example was tested on Python 2.7:

import csv

# 'b' flag for binary is necessary if on Windows otherwise crlf hilarity ensues
with open('/path/to/file1.txt','rb') as fin:
  csv_reader = csv.reader(fin, delimiter=":")
  # Header in line 1? Skip over. Otherwise no need for next line.
  csv_reader.next() 
  # populate dict with email address as key and date as value
  # dictionary comprehensions supported in 2.7+
  # on a lower version? use: d = dict((line[0],line[1]) for line in csv_reader)
  email_address_dict = {line[0]: line[1] for line in csv_reader}

# there are ways to modify a file in-place
# but it's easier to write to a new file 
with open('/path/to/file2.txt','rb') as fin, \
     open('/path/to/file3.txt','wb') as fou:
  csv_reader = csv.reader(fin, delimiter=":")
  csv_writer = csv.writer(fou, delimiter=":")
  # Header in line 1? Skip over. Otherwise no need for next line.
  csv_writer.writerow( csv_reader.next() ) 
  for line in csv_reader:
    # construct new line 
    # looking up date value in just-created dict
    # the new date value is inserted position 5 (zero-based)
    newline = line[0:5]
    newline.append(email_address_dict[line[0]])
    newline.extend(line[6:])
    csv_writer.writerow(newline)
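The listing above targets Python 2.7. A rough Python 3 sketch of the same dictionary-lookup approach might look like the following. It makes several assumptions that the question does not confirm: File2 is comma-delimited, neither file has a header row, and rows whose email is missing from File1 should be copied unchanged. The function names load_dates and update_csv are made up for illustration.

```python
import csv


def load_dates(dates_path):
    """Read 'email:date' lines into a dict keyed by email address."""
    with open(dates_path, newline="") as fin:
        reader = csv.reader(fin, delimiter=":")
        return {row[0]: row[1] for row in reader if len(row) >= 2}


def update_csv(dates_path, csv_in_path, csv_out_path):
    """Copy csv_in_path to csv_out_path, setting field 5 from the lookup dict."""
    dates = load_dates(dates_path)
    with open(csv_in_path, newline="") as fin, \
         open(csv_out_path, "w", newline="") as fout:
        writer = csv.writer(fout)
        for row in csv.reader(fin):
            # Only touch rows that have a field 5 and whose email is known
            if len(row) > 5 and row[0] in dates:
                row[5] = dates[row[0]]
            writer.writerow(row)
```

Only the smaller email/date file is held in memory; the large CSV is streamed one row at a time, so its size should matter much less.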

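Try the Python csv module.

How big is very large? For 500,000 records or more, you should be able to do fine with just a plain old dict and a csv reader.

I may not have described correctly what I intend to do. I am reading line by line and splitting out the email address. Then I need to search the CSV file for any instance of that email address and update the date field. The problem is that I am dealing with huge files and looking for the best way to do it.

Can you load the file into memory with a dictionary first, then update the dictionary and rebuild the CSV file?

What is the file size of the CSV? If you need queries like this, it is better to import the files into SQL.

The CSV is about 150 MB (it is huge), and the file with email/date is about 8 MB (also fairly large).

In that case a database is more appropriate. Try loading the files into one; that will be faster and easier. :-)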
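As a rough illustration of the database suggestion, here is a minimal sketch using SQLite through Python's standard sqlite3 module. SQLite is my own choice for the example; the comments above do not say which database was meant. The table name dates and the helper names build_date_lookup and lookup_date are likewise made up.

```python
import sqlite3


def build_date_lookup(pairs):
    """Load (email, date) pairs into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")  # pass a file path instead to persist it
    conn.execute("CREATE TABLE dates (email TEXT PRIMARY KEY, date TEXT)")
    # INSERT OR REPLACE keeps the last date seen for a repeated email
    conn.executemany("INSERT OR REPLACE INTO dates VALUES (?, ?)", pairs)
    conn.commit()
    return conn


def lookup_date(conn, email):
    """Return the stored date for email, or None if it is unknown."""
    row = conn.execute(
        "SELECT date FROM dates WHERE email = ?", (email,)
    ).fetchone()
    return row[0] if row else None


conn = build_date_lookup([
    ("thisismy@email.com", "20110708"),
    ("thisisyour@email.com", "20090807"),
])
print(lookup_date(conn, "thisisyour@email.com"))
```

The primary key on email gives indexed lookups, so each CSV row can be matched without scanning the whole table.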