Python: Parsing a text file with date headers and multiple entries

python string dictionary parsing text

I've been trying to parse a large text file and convert it into a dictionary for further analysis. Here is a sample of the text file:

Mar 2 (2020, year not always present)
first paragraph
second line of first paragraph

second paragraph
second line of second paragraph

Mar 3
More lines
these two should be grouped together
because they don't have a blank line in between them

however this line is a start of a new "entry"

sometimes they only have one line, sometimes many.

Ideally, this would produce the following Python dictionary:

{"Mar 2": ["first paragraph\nsecond line of first paragraph", "second paragraph\nsecond line of second paragraph"], "Mar 3": ["More lines\nthese two should be grouped together\nbecause they don't have a blank line in between them", "however this line is a start of a new \"entry\"", "sometimes they only have one line, sometimes many."]}

I tried using the code below, which almost works, but I'm not sure what's going wrong:

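def isdate(line):
    return line.lower().split(" ")[0] in ("jan", "feb", "mar", "apr", "may", "jun", "jul", "aug", "sept", "oct", "nov", "dec")

data = ...
lines = data.split("\n")
i = 0
data = {}
while i ...  # the remainder of the loop is truncated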


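The problem with my code is that it outputs the correct data but crashes when it reaches the end. Sorry for not including this earlier; I only just realized that it actually produces the correct data and then crashes.
I don't think this is a duplicate, although it's hard to search for something this specific and yet this general (if you see what I mean). I'm sure some SO wizard will correct me if I'm wrong.
Thanks in advance.

You can do it like this: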
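def isdate(line):
    # A header line starts with a month abbreviation followed by a day, e.g. "Mar 2" or "Mar 2 (2020)".
    months = ("jan", "feb", "mar", "apr", "may", "jun", "jul", "aug", "sep", "sept", "oct", "nov", "dec")
    words = line.split()
    return len(words) >= 2 and words[0].lower() in months and words[1][0].isdigit()

with open("file.txt") as f:
    lines = f.read().splitlines()  # unlike readlines(), this drops the trailing "\n" of each line

data = {}
key = None  # assumes the file starts with a date header
entry = []  # lines of the entry currently being collected
for line in lines:
    if isdate(line):  # new key
        if entry:  # save the pending entry under the previous date
            data[key].append("\n".join(entry))
            entry = []
        month, day = line.split()[:2]
        key = month + " " + day
        data[key] = []
    elif line.strip():  # non-empty line: part of the current entry
        entry.append(line)
    elif entry:  # blank line: the current entry is complete
        data[key].append("\n".join(entry))
        entry = []
if entry:  # save the last entry if the file does not end with a blank line
    data[key].append("\n".join(entry))
for d in data:
    print(d, data[d])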
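
To check this approach against the sample from the question, the same idea can be wrapped in a function that takes the text as a string instead of reading a file. This is only a sketch: the names `MONTHS`, `is_date`, `parse_entries` and `sample` are made up for illustration, and the header test assumes a date line is a month abbreviation followed by a token that starts with a digit.

```python
MONTHS = ("jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "sept", "oct", "nov", "dec")


def is_date(line):
    # Assumption: header lines look like "Mar 2" or "Mar 2 (2020, ...)".
    words = line.split()
    return len(words) >= 2 and words[0].lower() in MONTHS and words[1][0].isdigit()


def parse_entries(text):
    """Group the paragraphs of `text` under their "Mon D" date header."""
    data = {}
    key = None
    entry = []  # lines of the paragraph being collected

    def flush():
        # Store the pending paragraph, if any, under the current date.
        if entry and key is not None:
            data[key].append("\n".join(entry))
        entry.clear()

    for line in text.splitlines():
        if is_date(line):
            flush()
            key = " ".join(line.split()[:2])  # keep only month and day
            data[key] = []
        elif line.strip():
            entry.append(line)
        else:
            flush()  # a blank line closes the current paragraph
    flush()  # the text may not end with a blank line
    return data


sample = """Mar 2 (2020, year not always present)
first paragraph
second line of first paragraph

second paragraph
second line of second paragraph

Mar 3
More lines
these two should be grouped together
because they don't have a blank line in between them

however this line is a start of a new "entry"

sometimes they only have one line, sometimes many.
"""

for date, entries in parse_entries(sample).items():
    print(date, entries)
```

Collecting lines in a list and joining them with "\n" at the end avoids stray trailing newlines in the stored entries, and flushing only when the list is non-empty means consecutive blank lines do not produce empty entries.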

"I tried using the code below, which almost works, but I'm not sure what's going wrong." You need to provide the expected output and the actual output.
@JammyDodger OK, I will, promptly.
@JammyDodger I just rechecked my code and realized that it works but crashes at the end (index out of range). Should I close this question, since I think I should now be able to solve it myself?
If you do, include the fix in case someone else runs into the same problem.
How so? Do you have a traceback?
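
An "index out of range" error at the very end is a common symptom of an index-based `while` loop that reads `lines[i]` in an inner loop without first checking that `i` is still within bounds. The asker's loop is not shown in full, so the following is not a fix for that exact code, only a sketch of how such a loop can be bounded. The names `is_date` and `parse_with_index` are hypothetical.

```python
def is_date(line):
    # Assumption: a header is a month abbreviation followed by a day number.
    months = ("jan", "feb", "mar", "apr", "may", "jun",
              "jul", "aug", "sep", "sept", "oct", "nov", "dec")
    words = line.split()
    return len(words) >= 2 and words[0].lower() in months and words[1][0].isdigit()


def parse_with_index(text):
    lines = text.split("\n")
    data = {}
    key = None
    i = 0
    while i < len(lines):
        if is_date(lines[i]):
            key = " ".join(lines[i].split()[:2])
            data[key] = []
            i += 1
        elif lines[i].strip() and key is not None:
            # Collect one paragraph. Testing `i < len(lines)` first keeps the
            # inner loop from reading past the last line of the file.
            paragraph = []
            while i < len(lines) and lines[i].strip() and not is_date(lines[i]):
                paragraph.append(lines[i])
                i += 1
            data[key].append("\n".join(paragraph))
        else:
            i += 1  # blank line, or text before the first date header
    return data


print(parse_with_index("Mar 2\na\nb\n\nc\nMar 3\nd"))
```

Because `and` short-circuits, `lines[i]` is never evaluated once `i` reaches `len(lines)`, which is the case that matters when the file does not end with a blank line.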