Python: parsing a txt file into JSON, only getting the last record
I have a formatted text file made up of Outlook emails, where a "From:" line marks the start of a new email.
I am trying to parse the sender and the subject (which holds several fields), then read the rest of the content until a new "From:" line marks the next email.
As a first pass I tried to brute-force it, since this is a proof-of-concept test, but I only get the last email in the chain:
import json

l = []
with open(r'transcripts.txt', 'r') as transcripts:
    for line in transcripts:
        is_new_subject = line.lower().startswith('from')
        if is_new_subject:
            record = {}
            record['from'] = line.split(':')[1]
            for line in transcripts:
                if line.lower().startswith('subject'):
                    subject = line.split(':')[1]
                    record['subject'] = subject
                    split_it = subject.split('.')
                    record['show'] = split_it[0]
                    record['air_date'] = split_it[1]
                    record['hour'] = split_it[2]
                    record['content'] = ""
                    for line in transcripts:
                        record['content'] += line
                        is_new_subject = line.lower().startswith('from')
                        if is_new_subject:
                            l.append(record)
                            break
with open('output.json', 'w') as outfile:
    json.dump(l, outfile, indent=4)
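If you have any ideas, I am open to rewriting it from scratch.
Your code is a bit hard to read, and I think it would be much easier to debug if you broke it up into functions. I would also suggest using Python's re library for this kind of text processing, since it is far more flexible than testing against static strings. For example: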
import json
import re

def parse_emails_from_list(email_list):
    """Return a list of raw email chunks from the full text."""
    # Note: text that starts with "From:" yields an empty first chunk.
    return re.compile("From:").split(email_list)

def parse_email_details_from_email(raw_email):
    """Do some more processing here."""
    email = {}
    email['subject'] = None  # parse your email details here
    # ...
    # ...
    return email

if __name__ == "__main__":
    # main loop
    with open(r'transcripts.txt', 'r') as transcripts:
        email_list = parse_emails_from_list(transcripts.read())
    parsed_emails = [parse_email_details_from_email(raw_email) for raw_email in email_list]
    with open('output.json', 'w') as outfile:
        json.dump(parsed_emails, outfile, indent=4)
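To make the placeholder concrete, here is one way the detail parsing might be filled in. This is only a sketch under assumptions: the SAMPLE text is invented, the names split_emails and parse_email_details are hypothetical, and it assumes each email has "From:" at the start of a line followed by a single "Subject:" line. Anchoring the pattern to the start of a line is a swapped-in refinement so that a "From:" appearing inside a body does not trigger a split.

```python
import re

# Hypothetical sample; the real layout of transcripts.txt is an assumption.
SAMPLE = """From: Alice <alice@example.com>
Subject: MorningShow.2015-03-01.Hour1
First line of the transcript.
Second line.
From: Bob <bob@example.com>
Subject: EveningShow.2015-03-02.Hour2
Another transcript body.
"""

def split_emails(text):
    """Split on 'From:' only where it starts a line, dropping empty chunks."""
    chunks = re.split(r"^From:", text, flags=re.MULTILINE | re.IGNORECASE)
    return [c for c in chunks if c.strip()]

def parse_email_details(chunk):
    """Build one record from a chunk whose leading 'From:' was removed by the split."""
    lines = chunk.splitlines()
    record = {"from": lines[0].strip(), "subject": "", "content": ""}
    body = []
    for line in lines[1:]:
        # Take the first Subject line as the subject; everything else is body.
        if not record["subject"] and line.lower().startswith("subject:"):
            record["subject"] = line.split(":", 1)[1].strip()
        else:
            body.append(line)
    record["content"] = "\n".join(body).strip()
    return record

records = [parse_email_details(c) for c in split_emails(SAMPLE)]
```

Because every record is built from its own chunk, no state carries over from one email to the next.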
After looking at the code more closely, it is clear that the loop logic is where your problem lies.
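A likely explanation, offered as an assumption rather than a confirmed diagnosis, is a combination of two Python behaviours: nested for-loops over the same file object share a single iterator, and appending one dict several times stores references to the same object. The toy lists below stand in for the file and the records; none of these names come from the question.

```python
# Pitfall 1: nested for-loops over the same iterator share its position,
# so items consumed by the inner loop are never seen by the outer loop.
lines = iter(["a", "b", "c", "d"])  # stands in for an open file object
outer_seen = []
for outer in lines:
    outer_seen.append(outer)
    for inner in lines:
        if inner == "c":
            break

# Pitfall 2: appending the same dict repeatedly stores references, not copies,
# so every entry ends up showing the values written last.
shared = {}
same_dict = []
for subject in ["first", "second", "third"]:
    shared["subject"] = subject
    same_dict.append(shared)

# Creating a new dict per record keeps each entry distinct.
fresh_dict = []
for subject in ["first", "second", "third"]:
    fresh_dict.append({"subject": subject})
```

Splitting the text into one chunk per email first, as suggested above, sidesteps both issues because each record is built independently.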
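You should try the built-in email package. It is very easy to use. For some reason it does not work when the file contains multiple emails, so I used the split function created by @Max Paymar. Thanks @Max Paymar.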
import email
import re

def parse_emails_from_list(email_list):
    """Return a list of raw email chunks from the full text."""
    return re.compile("From:").split(email_list)

with open('sampleEmail.txt', 'r') as a:
    email_list = parse_emails_from_list(a.read())

for E_mail in email_list:
    if not E_mail.strip():
        continue  # skip the empty chunk before the first "From:"
    # Restore the "From:" header that the split removed.
    msg = email.message_from_string('From:' + E_mail)
    print(msg['Subject'])
    print(msg['From'])
    print(msg.get_payload())
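To get from there to the JSON the question asks for, something like the following might work. It is a sketch under assumptions: the SAMPLE text is invented, to_record is a hypothetical helper, the subject is assumed to have the form show.air_date.hour, and a blank line is assumed to separate headers from the body, which is what the email parser expects.

```python
import email
import json
import re

# Hypothetical sample; the real file layout is an assumption.
SAMPLE = """From: Alice <alice@example.com>
Subject: MorningShow.2015-03-01.Hour1

First transcript body.
From: Bob <bob@example.com>
Subject: EveningShow.2015-03-02.Hour2

Second transcript body.
"""

def to_record(chunk):
    """Parse one chunk with the email package and flatten it into a dict."""
    # Restore the "From:" header that the split removed.
    msg = email.message_from_string("From:" + chunk)
    subject = msg["Subject"] or ""
    parts = subject.split(".")
    parts += [""] * (3 - len(parts))  # pad so a short subject does not raise
    return {
        "from": msg["From"],
        "subject": subject,
        "show": parts[0],
        "air_date": parts[1],
        "hour": parts[2],
        "content": msg.get_payload().strip(),
    }

# Split only where "From:" starts a line, and drop the empty leading chunk.
chunks = [c for c in re.split(r"^From:", SAMPLE, flags=re.MULTILINE) if c.strip()]
records = [to_record(c) for c in chunks]
output = json.dumps(records, indent=4)
```

To write a file instead of a string, json.dump(records, outfile, indent=4) can be used as in the question.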
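This simplified things a great deal, with no loop logic needed. Thanks for the guidance.
This is great guidance, thanks for your input. The answer below makes use of the built-in email package, which I did not know about, and it made the whole task much easier. Thanks again.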