Extracting the subject and body of emails into a dictionary using Python
I want to extract the subject and the email body from an email archive (a .txt file) in the format {subject: body}. Below is my text file, testing.txt, and this is my Python file, test.py. Please help me with the logic.

The expected output format {Subject: Body} looks like a dictionary to me, so I suggest you stick with a dictionary as the container. The following skips any line that starts with "To:", "From:", or "\n". When it meets a subject line, it records that subject as the current key; every following line, up to the next subject line, is concatenated onto the value stored under that key.
with open("testing.txt") as f:
    data = {}
    for line in f:
        if any(line.startswith(kw) for kw in ("From:", "To:", "\n")):
            continue
        if line.startswith("Subject:"):
            current_subject = line.split(":")[-1].strip()
        else:
            data.setdefault(current_subject, "")
            data[current_subject] += line
print(data)
# {'This is the sample request one...': 'Hey there, \nThis is the smaple email just for the test purpose.\nNo intentions to hurt somebodys feleings at all.\nThanks, \n',
# 'This is the sample request second...': 'Hey there, \nthis is another sample mail body and just to test the py script working\nthis si the part of the data preprocesing \nthanks'}
Feel free to strip any unwanted characters from the lines as you see fit.
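As a rough illustration of that clean-up step, the sketch below strips surrounding whitespace from every line before storing it. The function name `parse_subjects` and the sample lines are made up for this example; it also uses `str.partition` instead of `split(":")[-1]`, on the assumption that a subject may itself contain a colon (for example "Re: ...").

```python
def parse_subjects(lines):
    """Map each subject to its body, stripping whitespace from every line."""
    data = {}
    current_subject = None
    for line in lines:
        stripped = line.strip()
        # Skip blank lines and the header lines we do not care about.
        if not stripped or stripped.startswith(("From:", "To:")):
            continue
        if stripped.startswith("Subject:"):
            # partition splits on the first colon only, so later colons survive.
            current_subject = stripped.partition(":")[2].strip()
            data.setdefault(current_subject, [])
        elif current_subject is not None:
            data[current_subject].append(stripped)
    # Join the cleaned body lines with single newlines.
    return {subject: "\n".join(body) for subject, body in data.items()}


# Hypothetical sample input, shaped like the file described in the question.
sample = [
    "From: a@example.com\n",
    "To: b@example.com\n",
    "Subject: Re: first request\n",
    "Hey there, \n",
    "\n",
    "Thanks, \n",
]
result = parse_subjects(sample)
print(result)
```

Collecting the body lines in a list and joining once at the end avoids trailing newlines in the values; whether that is desirable depends on what the bodies are used for afterwards.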
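I hope this helps.

The first part below is a first version that keeps the subjects and the bodies in two lists; in the second part the same data is put into a dictionary keyed by subject.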
What exactly is the problem here? Also, so that the file is properly scoped, you should use `with open(txt) as f:` followed by `for line in f.readlines(): # do stuff`.
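Hmm, these files are not normal mail messages. According to RFC 5322, a blank line should separate the body from the header section. Do you really need to parse a file in such a bogus format? Also, in a normal archive file, a `From ` line (no `:` after `From`) marks the start of a new message. Again, are you sure about your sample format?

In fact the original format is quite different. I just want to understand the logic for this kind of problem, because I am new to Python and have never really dealt with such problems before.

@sajalarora, as long as the file is finite, this program cannot end up in an infinite loop. Nothing other than the contents of the file is being looped over, so if the contents of the file are finite (and I am sure they are), the program should exit. If you are still stuck in an infinite loop, please edit your post and show the code that causes it, because it cannot be the code above.

Hey, sorry, there was actually a problem with the txt file. It works fine with the correct file. Thanks anyway.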
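To illustrate the RFC 5322 point raised in the comments: when a message does have a blank line between headers and body, Python's standard `email` package can parse it directly. The sketch below is only an illustration; the message text and addresses are invented, and it assumes a single, non-multipart message.

```python
from email import message_from_string

# Hypothetical message: headers, one blank line, then the body.
raw = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Sample request one\n"
    "\n"
    "Hey there,\n"
    "This is the body.\n"
)

msg = message_from_string(raw)
# For a non-multipart message, get_payload() returns the body as a string.
parsed = {msg["Subject"]: msg.get_payload()}
print(parsed)
```

For a real mbox-style archive, where `From ` lines separate messages, the standard `mailbox` module is probably the more suitable starting point than hand-written line parsing.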
# test.py from the question
txt = "testing.txt"
file = open(txt)
body = ""
body_list = list()
subject_list = list()
for line in file:
    line = line.rstrip()
    if line.startswith("From:") or line.startswith("To:"):
        continue
    if line.startswith("Subject:"):
        subject_list.append(line)
    if not line.startswith("Subject:"):
        body = body + line
# First part: collect subjects and bodies in two parallel lists
subjects = []
bodys = []
with open("test.txt") as file:
    body = ""
    for line in file:
        if line.startswith("From:") or line.startswith("To:"):
            continue
        if line.startswith("Subject:"):
            if body != '':
                bodys.append(body)
                body = ""
            subjects.append(line.split("Subject:")[1])
        if not line.startswith("Subject:"):
            body += line
    bodys.append(body)  # appends the last body of the mail
    body = ""
print(subjects)
print(bodys)
# Second part: store each body in a dictionary keyed by its subject
SB = {}
with open("test.txt") as file:
    body = ""
    subject = ""
    for line in file:
        if line.startswith("From:") or line.startswith("To:"):
            continue
        if line.startswith("Subject:"):
            if body != '':
                SB[subject] = body  # store the body of the previous mail
                body = ""
            subject = line.split("Subject:")[1]
            SB[subject] = ''
        if not line.startswith("Subject:"):
            body += line
    SB[subject] = body  # store the body of the last mail
    body = ""
print(SB)
# Another version: reset the state on every "From:" / "To:" line
email_data, subject, body = {}, "", ""
with open("emails.txt", "r") as records:
    for record in records:
        if record.startswith("Subject:"):
            subject = record.split("Subject:")[1].strip()
        elif not record.startswith("To:") and not record.startswith("From:"):
            body += record
        else:
            subject, body = "", ""
            continue
        email_data[subject] = body
print(email_data)
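One case the dictionary versions above leave open is two emails with the same subject: a plain dictionary can hold only one value per key, so the bodies end up overwritten or merged. A minimal sketch of one way around this, assuming it is acceptable to store a list of bodies per subject, is shown below. The function name `group_bodies` and the sample lines are hypothetical.

```python
from collections import defaultdict


def group_bodies(lines):
    """Map each subject to a list of bodies, one entry per email."""
    bodies = defaultdict(list)
    subject, current = None, []
    for line in lines:
        if line.startswith(("From:", "To:")):
            continue
        if line.startswith("Subject:"):
            # A new subject closes the previous email, if there was one.
            if subject is not None:
                bodies[subject].append("".join(current))
            subject, current = line.split("Subject:", 1)[1].strip(), []
        elif subject is not None:
            current.append(line)
    # Store the body of the last email.
    if subject is not None:
        bodies[subject].append("".join(current))
    return dict(bodies)


# Hypothetical sample: two emails sharing the subject "status".
lines = [
    "From: a@example.com\n", "To: b@example.com\n",
    "Subject: status\n", "first body\n",
    "From: c@example.com\n", "To: d@example.com\n",
    "Subject: status\n", "second body\n",
]
grouped = group_bodies(lines)
print(grouped)
```

If subjects are known to be unique, the simpler {subject: body} dictionary from the earlier answers is enough.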