Python: parse a text file with date headers and multiple entries


I have been trying to parse a large text file and convert it into a dictionary for further analysis. Here is a sample of the text file:

Mar 2 (2020, year not always present)
first paragraph
second line of first paragraph

second paragraph
second line of second paragraph

Mar 3
More lines
these two should be grouped together
because they don't have a blank line in between them

however this line is a start of a new "entry"

sometimes they only have one line, sometimes many.
Ideally, this would produce the following Python dictionary:

{"Mar 2": ["first paragraph\nsecond line of first paragraph", "second paragraph\nsecond line of second paragraph"], "Mar 3": ["More lines\nthese two should be grouped together\nbecause they don't have a blank line in between them", "however this line is a start of a new \"entry\"", "sometimes they only have one line, sometimes many."]}
I tried the following code; it almost works, but I'm not sure what is going wrong:

def isdate(line):
    return line.lower().split(" ")[0] in ("jan", "feb", "mar", "apr", "may", "jun", "jul", "aug", "sep", "oct", "nov", "dec")
data = ...
lines = data.split("\n")
i = 0
data = {}
while i
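The problem with my code is that it produces the correct data but crashes when it reaches the end. Sorry for not including this earlier; I only just realized that it actually outputs the correct data and then crashes.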

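I don't think this is a duplicate, although it is hard to search for something this specific yet this general (if you know what I mean). I'm sure some SO wizard will correct me, though.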


Thanks in advance.

You could do it like this:

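def isdate(line):
    # A header line starts with a month abbreviation, e.g. "Mar 2 (2020, ...)"
    return line.lower().split(" ")[0] in ("jan", "feb", "mar", "apr", "may", "jun", "jul", "aug", "sep", "oct", "nov", "dec")

with open("file.txt") as f:
    lines = f.read().splitlines()  # splitlines() drops the trailing "\n" of each line

data = {}
line_concat = ""
for line in lines:
    if isdate(line):  # new key
        if line_concat:  # save the pending paragraph under the previous key
            data[key].append(line_concat)
            line_concat = ""  # reset, so it does not leak into the new key
        month, day = line.split(" ")[0:2]
        key = month + " " + day
        data[key] = []
        continue
    if line.strip():  # non-empty line: add it to the current paragraph
        line_concat = line_concat + "\n" + line if line_concat else line
    elif line_concat:  # blank line ends the paragraph (skip repeated blank lines)
        data[key].append(line_concat)
        line_concat = ""
if line_concat:  # flush the last paragraph if the file does not end with a blank line
    data[key].append(line_concat)
for d in data:
    print(d, data[d])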
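The `isdate` check above accepts any line whose first word is a month abbreviation, so a paragraph that starts with "May ..." would be mistaken for a header. One stricter option, sketched below with a hypothetical `header_key` helper, is to validate the first two words as a real month/day pair with `datetime.strptime`. It assumes the default C locale (the `%b`/`%B` month names depend on the locale) and headers shaped like "Mar 2" or "March 2".

```python
from datetime import datetime


def header_key(line):
    """Return e.g. "Mar 2" if the line starts with a valid month/day pair, else None."""
    words = line.split()
    if len(words) < 2:
        return None
    month, day = words[0], words[1].rstrip(",")
    for fmt in ("%b %d %Y", "%B %d %Y"):  # "Mar 2" or "March 2"
        try:
            # A fixed leap year is appended so "Feb 29" validates;
            # strptime's default year (1900) is not a leap year.
            datetime.strptime(f"{month} {day} 2000", fmt)
        except ValueError:
            continue
        return f"{month} {day}"
    return None


for line in ["Mar 2 (2020, year not always present)", "May I add a note?", "Feb 29"]:
    print(repr(line), "->", header_key(line))
```

In the loop above, `key = header_key(line)` followed by `if key is not None:` could replace both the `isdate(line)` test and the manual `month, day` split.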
Output:
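You need to provide the expected output and the actual output.
@JammyDodger OK, I will do so promptly.
@JammyDodger I just rechecked my code and realized that it works but crashes at the end (index out of range). Should I close this question, since I think I should now be able to solve it myself?
If you do, how about including the fix in case anyone else has this problem? Do you have a traceback?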