Extracting the subject and body of emails into a dictionary with Python


python


I want to extract the subject and body of each email from an email archive (a .txt file) in the format {subject:body}. Below is my txt file:

testing.txt

Here is my Python file:

test.py

Please help me with the logic.

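Expected output format:

{Subject:Body}

That looks like a dictionary to me, so I suggest you stick with a dictionary as the container. The following code skips any line that starts with "To:", "From:" or "\n" (a blank line). When it meets a subject line, it starts an entry in the dictionary for that subject, and every following line is concatenated onto that entry's value until the next subject line.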

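with open("testing.txt") as f:
    data = {}
    current_subject = None  # no subject seen yet
    for line in f:
        # skip header lines and blank lines
        if any(line.startswith(kw) for kw in ("From:", "To:", "\n")):
            continue
        if line.startswith("Subject:"):
            # split on the first colon only, so a subject like "Re: Hi" stays intact
            current_subject = line.split(":", 1)[1].strip()
        elif current_subject is not None:
            # append this body line to the value of the current subject
            data.setdefault(current_subject, "")
            data[current_subject] += line

print(data)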

# {'This is the sample request one...': 'Hey there, \nThis is the smaple email just for the test purpose.\nNo intentions to hurt somebodys feleings at all.\nThanks, \n',
# 'This is the sample request second...': 'Hey there, \nthis is another sample mail body and just to test the py script working\nthis si the part of the data preprocesing \nthanks'}

Feel free to strip unwanted characters from the lines if you see fit.
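As a rough illustration of that clean-up step, here are two ways to tidy the values after the dictionary is built. The dictionary below is a shortened, made-up stand-in for the output shown above; which option you want depends on whether you care about keeping line breaks.

```python
# A shortened, made-up stand-in for the dictionary printed above.
data = {"This is the sample request one...": "Hey there, \nThanks, \n"}

# Option 1: trim leading/trailing whitespace (including the final "\n") from each body.
stripped = {subject: body.strip() for subject, body in data.items()}

# Option 2: also trim every line and join the non-empty ones with single spaces.
one_line = {
    subject: " ".join(part.strip() for part in body.splitlines() if part.strip())
    for subject, body in data.items()
}

print(stripped)  # {'This is the sample request one...': 'Hey there, \nThanks,'}
print(one_line)  # {'This is the sample request one...': 'Hey there, Thanks,'}
```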

I hope this helps.

The first version is this one, but if the subject is the same as in the second part, you can put it in the dictionary.

First part / second part
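On repeated subjects: with a plain {subject: body} dictionary, a second email that has the same subject gets its body appended onto (or, in some versions, overwrites) the first one's value. A minimal sketch that keeps such emails apart by mapping each subject to a list of bodies, assuming the same layout as testing.txt (a "Subject:" line followed by body lines, with "From:"/"To:" and blank lines ignored). The function name subjects_to_bodies is hypothetical.

```python
from collections import defaultdict


def subjects_to_bodies(lines):
    """Map each subject to a list of bodies, one list entry per email."""
    data = defaultdict(list)
    current_subject = None
    for line in lines:
        if line.startswith(("From:", "To:")) or not line.strip():
            continue  # skip header lines and blank lines
        if line.startswith("Subject:"):
            current_subject = line[len("Subject:"):].strip()
            data[current_subject].append("")  # start a new email under this subject
        elif current_subject is not None:
            data[current_subject][-1] += line  # extend the body of the latest email
    return dict(data)


sample = [
    "Subject: Hello\n",
    "first body\n",
    "Subject: Hello\n",
    "second body\n",
]
print(subjects_to_bodies(sample))
# {'Hello': ['first body\n', 'second body\n']}
```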

What is the problem here? Also, to give the file a proper scope, you should use:

    with open(txt) as f:
        for line in f.readlines():
            ...  # do stuff
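To make the scoping point concrete, a small sketch: the with block closes the file automatically when it ends, even if the loop exits early. The file name and contents are made up and written to a temporary directory so the example runs anywhere.

```python
import os
import tempfile

# Made-up sample file in a temporary directory, so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "testing.txt")
with open(path, "w") as f:
    f.write("Subject: Hello\nHey there,\n")

lines = []
with open(path) as f:
    for line in f:  # the file object yields one line at a time
        lines.append(line.rstrip("\n"))

print(lines)     # ['Subject: Hello', 'Hey there,']
print(f.closed)  # True: leaving the with block closed the file
```

f.readlines() works as well, but it reads the whole file into a list first; iterating over f directly reads it line by line.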

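Hmm, these files are not ordinary mail messages. According to RFC 5322, the body should be separated from the header section by an empty line. Do you really need to parse a file in such a made-up format? Also, in an ordinary mailbox archive file, a line starting with "From " (no colon after From) marks the start of a new message. Again, are you sure about your sample format? The original format is in fact quite different.

I just want to understand the logic behind problems like this, because I'm new to Python and have never really dealt with one before.

@sajalarora, as long as the file is finite, this program cannot end up in an infinite loop: nothing is looped over except the contents of the file. So if the file's contents are finite (and I'm sure they are), the program should exit. If you are still stuck in an infinite loop, please edit your post and show the code that causes it, because it cannot be the code above.

Hey, sorry, the problem was actually with the .txt file. It works fine with the correct file. Thanks anyway...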
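If the archive really is in the standard format the RFC 5322 comment describes (each message starting with a "From " line, headers separated from the body by a blank line), Python's standard-library mailbox module can do the splitting for you. A minimal sketch, assuming an mbox-style file; the archive contents, addresses and file name below are made up, and get_payload() returns the body as a string only for simple, non-multipart messages.

```python
import mailbox
import os
import tempfile

# A tiny made-up mbox archive: each message starts with a "From " line (no colon),
# and a blank line separates the headers from the body.
MBOX_TEXT = (
    "From alice@example.com Mon Jan  1 00:00:00 2024\n"
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: First message\n"
    "\n"
    "Hello Bob.\n"
    "\n"
    "From bob@example.com Mon Jan  1 01:00:00 2024\n"
    "From: bob@example.com\n"
    "To: alice@example.com\n"
    "Subject: Second message\n"
    "\n"
    "Hi Alice.\n"
)

path = os.path.join(tempfile.mkdtemp(), "archive.mbox")
with open(path, "w") as f:
    f.write(MBOX_TEXT)

data = {}
box = mailbox.mbox(path)
for message in box:
    # message["subject"] reads the header; get_payload() returns the plain-text body
    data[message["subject"]] = message.get_payload()
box.close()

print(data)
```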
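A version that collects the subjects and bodies into two parallel lists:

txt = "testing.txt"
body = ""
body_list = list()
subject_list = list()
with open(txt) as file:  # the with block closes the file automatically
    for line in file:
        line = line.rstrip()
        # header lines and blank lines are not part of any body
        if not line or line.startswith("From:") or line.startswith("To:"):
            continue
        if line.startswith("Subject:"):
            if subject_list:  # a new email starts: save the previous body
                body_list.append(body)
                body = ""
            subject_list.append(line[len("Subject:"):].strip())
        else:
            body = body + line + "\n"  # rstrip() removed the newline, so add it back
if subject_list:
    body_list.append(body)  # save the body of the last email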
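With subjects and bodies in two aligned lists, zip() and dict() turn them into the {subject: body} mapping in one step. The lists below are hypothetical, shortened examples of what subject_list and body_list would hold.

```python
# Hypothetical, shortened parallel lists: subject_list[i] belongs with body_list[i].
subject_list = ["This is the sample request one...", "This is the sample request second..."]
body_list = ["Hey there, \nThanks, \n", "Hey there, \nthanks"]

# zip() pairs the lists element by element and dict() builds {subject: body}.
# zip() stops at the shorter list, and a repeated subject keeps only its last body.
data = dict(zip(subject_list, body_list))
print(data)
```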


subjects = []
bodys = []
with open("test.txt") as file:
    body = ""
    for line in file:
        # header lines are not part of any body
        if line.startswith("From:") or line.startswith("To:"):
            continue
        if line.startswith("Subject:"):
            # a new email starts: store the body collected for the previous one
            if subjects:
                bodys.append(body)
                body = ""
            subjects.append(line.split("Subject:")[1].strip())
        else:
            body += line
    bodys.append(body)  # append the body of the last email
print(subjects)
print(bodys)

SB = {}
with open("test.txt") as file:
    body = ""
    subject = ""
    for line in file:
        if line.startswith("From:") or line.startswith("To:"):
            continue
        if line.startswith("Subject:"):
            # store the previous email before switching to the new subject
            if body != "":
                SB[subject] = body
                body = ""
            subject = line.split("Subject:")[1].strip()
            SB[subject] = ""
        else:
            body += line
    SB[subject] = body  # store the last email

print(SB)

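email_data, subject, body = {}, "", ""
with open("emails.txt", "r") as records:
    for record in records:
        if record.startswith("Subject:"):
            # the subject names the email that the following lines belong to
            subject = record.split("Subject:")[1].strip()
        elif not record.startswith("To:") and not record.startswith("From:"):
            # any other non-header line is part of the current body
            body += record
        else:
            # a From:/To: line starts the next email (assumes they come before Subject:),
            # so reset both fields
            subject, body = "", ""
            continue
        # rewrite the entry on every line so it always holds the full body so far
        email_data[subject] = body
print(email_data)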
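All the versions in this thread open a file by name, which makes them awkward to try out. A minimal sketch, assuming the same From:/To:/Subject: layout used above, that moves the parsing into a hypothetical function extract_emails() taking any iterable of lines; io.StringIO then stands in for the real file. As in the previous version, blank lines inside an email are kept in its body.

```python
import io


def extract_emails(lines):
    """Return {subject: body} from an iterable of lines (a file object works)."""
    data = {}
    subject = None
    for line in lines:
        if line.startswith(("From:", "To:")):
            subject = None  # a header line means the previous email has ended
        elif line.startswith("Subject:"):
            # split(":", 1) keeps any colons inside the subject itself
            subject = line.split(":", 1)[1].strip()
            data[subject] = ""
        elif subject is not None:
            data[subject] += line  # body line of the current email
    return data


# io.StringIO stands in for open("testing.txt"); both yield lines when iterated.
sample = io.StringIO(
    "From: a@example.com\n"
    "To: b@example.com\n"
    "Subject: Re: first\n"
    "Hey there,\n"
    "Thanks,\n"
    "From: b@example.com\n"
    "To: a@example.com\n"
    "Subject: second\n"
    "thanks\n"
)
result = extract_emails(sample)
print(result)  # {'Re: first': 'Hey there,\nThanks,\n', 'second': 'thanks\n'}
```

Because the function only needs something it can loop over, the same call works unchanged with `with open("testing.txt") as f: extract_emails(f)`.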