Python: parsing a txt file into JSON, only getting the last record

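I have a formatted text file made up of Outlook emails.

A line starting with "From:" indicates a new email.

I am trying to parse the sender and the subject (which contains multiple fields), then read the rest of the content until a new "From:" line indicates the next email.

As a first pass I tried to brute-force it, since this is only a proof-of-concept test. However, I only get the last email in the chain.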

import json

l = []
with open(r'transcripts.txt', 'r') as transcripts:
    for line in transcripts:
        is_new_subject = line.lower().startswith('from')
        if is_new_subject:
            record = {}
            record['from'] = line.split(':')[1]
        for line in transcripts:
            if line.lower().startswith('subject'):
                subject = line.split(':')[1]
                record['subject'] = subject
                split_it = subject.split('.')
                record['show'] = split_it[0]
                record['air_date'] = split_it[1]
                record['hour'] = split_it[2]
                record['content'] = ""
                for line in transcripts:
                    record['content'] += line
                    is_new_subject = line.lower().startswith('from')
                    if is_new_subject:
                        l.append(record)
                        break
with open('output.json', 'w') as outfile:
    json.dump(l, outfile, indent=4)

If you have any ideas, I am happy to rewrite it from scratch.

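Your code is a bit hard to read, and I think it would be much easier to debug if you broke it up into functions. I would also suggest Python's re library for this kind of text processing, because it is much more flexible than testing against static strings. For example: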

import json
import re

def parse_emails_from_list(email_list):
    """Return a list of raw email strings from the text of an email list."""
    return re.compile("From:").split(email_list)

def parse_email_details_from_email(raw_email):
    """Do some more processing here."""
    email = {}
    email['subject'] = None  # parse your email details here
    #...
    #...
    return email

if __name__ == "__main__":
    # main loop
    with open(r'transcripts.txt', 'r') as transcripts:
        # split() needs the file's text, not the file object
        email_list = parse_emails_from_list(transcripts.read())
    parsed_emails = [parse_email_details_from_email(raw_email) for raw_email in email_list]

    with open('output.json', 'w') as outfile:
        json.dump(parsed_emails, outfile, indent=4)

After taking a closer look at your code, it is clear that the loop logic is where you are running into trouble.
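
To make the loop problem concrete: the three nested `for line in transcripts` loops in the question all advance the same file iterator, and `record = {}` runs only once, so every `l.append(record)` stores the same dict and the final email (which has no following "From" line) is never appended. A minimal single-pass sketch that avoids this might look like the following. The function name `parse_transcripts` and the sample lines are made up for illustration, and the show/air_date/hour split is left out because the real subject format is not shown in the question.

```python
import json

def parse_transcripts(lines):
    """Collect one dict per email from an iterable of text lines."""
    records = []
    record = None
    for line in lines:
        lowered = line.lower()
        if lowered.startswith('from:'):
            # A new email starts: store the previous one, then use a fresh dict.
            if record is not None:
                records.append(record)
            record = {'from': line.split(':', 1)[1].strip(), 'content': ''}
        elif record is None:
            continue  # ignore anything before the first "From:" line
        elif lowered.startswith('subject:') and 'subject' not in record:
            record['subject'] = line.split(':', 1)[1].strip()
        else:
            record['content'] += line
    # The last email has no following "From:" line, so append it here.
    if record is not None:
        records.append(record)
    return records


# Hypothetical sample data standing in for transcripts.txt
sample = [
    "From: alice@example.com\n",
    "Subject: ShowA.2015-01-01.9\n",
    "first body\n",
    "From: bob@example.com\n",
    "Subject: ShowB.2015-01-02.10\n",
    "second body\n",
]
print(json.dumps(parse_transcripts(sample), indent=4))
```

With a real file, passing the open file object (for example `parse_transcripts(transcripts)` inside the `with` block) should work the same way, since the function only needs an iterable of lines.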

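You should try the email package. It is very easy to use. For some reason, email did not work with multipart emails, so I used the split function created by @Max Paymar. Thanks, @Max Paymar.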

import email
import re


def parse_emails_from_list(email_list):
    """Return a list of raw email strings from the text of an email list."""
    return re.compile("From:").split(email_list)

with open('sampleEmail.txt', 'r') as a:
    email_list = parse_emails_from_list(a.read())

for E_mail in email_list:
    if not E_mail.strip():
        continue  # the split leaves an empty piece before the first "From:"
    # Restore the "From:" prefix that the split removed.
    msg = email.message_from_string('From:' + E_mail)
    print(msg['Subject'])
    print(msg['From'])
    print(msg.get_payload())
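
Since the question asks for JSON output, here is a rough sketch of how the same idea could feed `json.dumps`. The helper name `emails_to_records` and the sample text are hypothetical, and it assumes each email is a simple, non-multipart message with a blank line between headers and body. It also anchors the split at the start of a line (`^From:` with `re.MULTILINE`), which is a small change from the plain "From:" split above.

```python
import email
import json
import re

def emails_to_records(text):
    """Split text on lines starting with "From:" and parse each chunk."""
    records = []
    for chunk in re.split(r'^From:', text, flags=re.MULTILINE):
        if not chunk.strip():
            continue  # re.split yields an empty piece before the first match
        # Put back the "From:" prefix that the split consumed.
        msg = email.message_from_string('From:' + chunk)
        records.append({
            'from': msg['From'],
            'subject': msg['Subject'],
            'content': msg.get_payload(),
        })
    return records


# Hypothetical sample standing in for the contents of sampleEmail.txt
sample = (
    "From: alice@example.com\n"
    "Subject: first\n"
    "\n"
    "Hello from Alice\n"
    "From: bob@example.com\n"
    "Subject: second\n"
    "\n"
    "Hello from Bob\n"
)
records = emails_to_records(sample)
print(json.dumps(records, indent=4))
```

A body line that itself begins with "From:" would still be treated as the start of a new email, so this only suits files where that cannot happen.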

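This simplifies things a lot, with no need for the loop logic. Thanks for the guidance.

This is great guidance, thank you for the input. The answer below makes use of the built-in email package, which I did not know about and which makes the whole task much easier. Thanks again.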