Python: parsing a very large CSV file. Need to split one field into many smaller rows and keep the ID on each row.

python python-3.x csv

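I have a large CSV that consists of an "ID" column and a "History" column.

The ID is simple: just an integer.

The History, however, is a single cell made up of hundreds of entries, separated by *** NOTE *** within the text area.

I want to parse this with Python and the csv module, reading the data in and exporting it as a new CSV, as shown below.

Existing data structure: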

ID,History

56457827, "*** NOTE ***
2014-02-25
Long note here.  This is just a stand in to give you an idea
*** NOTE ***
2014-02-20
Another example.
This one has carriage returns.

Demonstrates they're all a bit different, though are really just text based"
56457896, "*** NOTE ***
2015-03-26
Another example of a note here.  This is the text portion.
*** NOTE ***
2015-05-24
Another example yet again."

Desired data structure:

ID, Date, History

56457827, 2014-02-25, "Long note here.  This is just a stand in to give you an idea"
56457827, 2014-02-20, "Another example.
This one has carriage returns.

Demonstrates they're all a bit different, though are really just text based"
56457896, 2015-03-26, "Another example of a note here.  This is the text portion."
56457896, 2015-05-24, "Another example yet again."

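So I need to get a handle on a few commands. I'm guessing I need a loop that brings the data in as something I can manage, but I still need to parse the data.

I believe I need to:

1. Start looping through the CSV structure
2. Note the first ID
3. Search the History field for *** NOTE ***
4. Somehow grab the date string and note it
5. Append all the string data that follows the date string to a variable (let's call it "HistoryShapper") until
6. ... until I find the next *** NOTE ***
7. Remove every *** NOTE *** from the new variable "HistoryShapper"
8. Write the ID and "HistoryShapper" to a new row in the new CSV file
9. Repeat steps 2-8 until the end of the History field

The file is about 5MB. Is this the best way to go about it? I'm fairly new to programming and data processing, so I'm open to any constructive criticism before I open the laptop tonight and dig in.
Thanks very much, all feedback is greatly appreciated.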
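Steps 3 to 7 above amount to splitting one History cell on the marker and peeling the date off the front of each piece. A minimal sketch of that idea follows, assuming every note starts with a date in YYYY-MM-DD form as in the sample data. The names `split_history`, `NOTE_MARKER` and `DATE_THEN_TEXT` are illustrative and not part of the original question.

```python
import re

NOTE_MARKER = "*** NOTE ***"
# Assumption: each note begins with a date such as 2014-02-25,
# followed by the free-text body (which may span several lines).
DATE_THEN_TEXT = re.compile(r"\s*(\d{4}-\d{2}-\d{2})\s*(.*)", re.DOTALL)


def split_history(history):
    """Yield (date, text) pairs from the text of one History cell."""
    for chunk in history.split(NOTE_MARKER):
        chunk = chunk.strip()
        if not chunk:
            continue  # nothing precedes the first marker
        match = DATE_THEN_TEXT.match(chunk)
        if match:
            yield match.group(1), match.group(2).strip()
        else:
            yield "", chunk  # no leading date found; keep the text anyway


sample = (
    "*** NOTE ***\n2014-02-25\nLong note here.\n"
    "*** NOTE ***\n2014-02-20\nAnother example.\nThis one has carriage returns."
)
for date, text in split_history(sample):
    print(date, repr(text))
```

Splitting on the marker makes step 7 unnecessary, because the marker never ends up inside a note in the first place.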

with open('data.csv') as f:
    header = f.readline()    # skip headers line
    blank_line = f.readline()    # blank line

    current_record = None
    s = f.readline()    # first data line
    while s:
        if not current_record:
            current_record = s
        else:
            current_record += s
            if s.rstrip().endswith('"'):
                # Remove line breaks
                current_record = current_record.replace('\r', ' ').replace('\n', ' ')
                # Get ID and history
                ID, history = current_record.split(',', 1)
                # dequote history
                history = history.strip(' "')
                # split history into items
                items = [note.strip().split(' ', 1) for note in history.split('*** NOTE ***') if note]
                for datetime, message in items:
                    print ('{}, {}, {}'.format(ID, datetime, message))

                current_record = None

        s = f.readline()

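Alternatively, use the csv module with skipinitialspace=True so that the quoted, multi-line History field is read as a single value, then split that value on '*** NOTE ***':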

import csv

with open(input_file_name, newline='') as fd, \
     open(output_file_name, "w", newline='') as fdout:
    rd = csv.reader(fd, skipinitialspace=True)
    ID, Hist = next(rd)    # skip header line
    wr = csv.writer(fdout)
    _ = wr.writerow((ID, 'Date', Hist))  # write header of output file
    for row in rd:
        # print(row)      # uncomment for debug traces
        hists = row[1].split('*** NOTE ***')
        for h in hists:
            h = h.strip()
            if len(h) == 0:     # skip initial empty note
                continue
            # should begin with a date line
            date, h2 = h.split('\n', 1)
            _ = wr.writerow((row[0], date.strip(), h2.strip()))
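To try the csv-based approach without touching real files, here is a small sketch that runs the same logic on an in-memory sample. The function name `explode_notes` and the `SAMPLE` text are made up for illustration. It also skips rows with fewer than two fields, on the assumption that the input may contain blank lines.

```python
import csv
import io

# Shortened, hypothetical sample in the same shape as the question's data.
SAMPLE = '''ID,History
56457827, "*** NOTE ***
2014-02-25
Long note here.
*** NOTE ***
2014-02-20
Another example.
This one has carriage returns."
'''


def explode_notes(src, dst, marker="*** NOTE ***"):
    """Read ID/History rows from src and write one row per note to dst."""
    reader = csv.reader(src, skipinitialspace=True)
    writer = csv.writer(dst)
    id_name, hist_name = next(reader)          # header row
    writer.writerow((id_name, "Date", hist_name))
    for row in reader:
        if len(row) < 2:                       # blank or malformed line
            continue
        for note in row[1].split(marker):
            note = note.strip()
            if not note:                       # empty piece before first marker
                continue
            date, _, text = note.partition("\n")
            writer.writerow((row[0], date.strip(), text.strip()))


out = io.StringIO(newline="")
explode_notes(io.StringIO(SAMPLE, newline=""), out)
rows = list(csv.reader(io.StringIO(out.getvalue(), newline="")))
for r in rows:
    print(r)
```

Because csv.reader pulls one record at a time from the underlying file object, the whole file never has to be held in memory, so a 5MB input should not be a concern.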

open(output_file_name, "w", newline='') as fdout, ^ SyntaxError: invalid syntax
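The exact cause of this error cannot be recovered from the fragment above. A multi-line with statement joined by a backslash is easy to break, for example with a stray character after the backslash. One way to sidestep that, sketched here as an assumption rather than a confirmed fix, is to nest two with blocks. The function name `convert` is illustrative.

```python
import csv

NOTE_MARKER = "*** NOTE ***"


def convert(input_file_name, output_file_name):
    """Write one output row per note, repeating the ID on each row."""
    # Two nested "with" blocks: no backslash line continuation is needed.
    with open(input_file_name, newline="") as fd:
        with open(output_file_name, "w", newline="") as fdout:
            rd = csv.reader(fd, skipinitialspace=True)
            wr = csv.writer(fdout)
            ID, Hist = next(rd)                  # header row
            wr.writerow((ID, "Date", Hist))
            for row in rd:
                if len(row) < 2:                 # blank or malformed line
                    continue
                for h in row[1].split(NOTE_MARKER):
                    h = h.strip()
                    if not h:                    # empty piece before first marker
                        continue
                    date, _, text = h.partition("\n")
                    wr.writerow((row[0], date.strip(), text.strip()))
```

Calling `convert("data.csv", "notes.csv")` would then produce the ID, Date, History layout, with both file names being placeholders.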
