Extracting the subject and body of emails into a dictionary with Python


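I want to extract the subject and body of each email from an email archive (a .txt file) in {subject: body} format. Below is my txt file:

testing.txt

This is my Python file:

test.py

Please help me with the logic.

The expected output format {Subject: Body} looks like a dictionary to me, so I suggest you stick with a dictionary as the container. The code below skips any line that starts with "To:", "From:" or "\n". When it meets a subject line, it creates an entry in the dictionary for that subject, then concatenates the lines that follow, up to the next subject line, as that entry's value.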

with open("testing.txt") as f:
    data = {}
    for line in f:
        if any(line.startswith(kw) for kw in ("From:", "To:", "\n")):
            continue
        if line.startswith("Subject:"):
            # Split on the first colon only, so a subject that itself contains ":" is kept whole.
            current_subject = line.split(":", 1)[1].strip()
        else:
            data.setdefault(current_subject, "")
            data[current_subject] += line

print(data)

# {'This is the sample request one...': 'Hey there, \nThis is the smaple email just for the test purpose.\nNo intentions to hurt somebodys feleings at all.\nThanks, \n',
# 'This is the sample request second...': 'Hey there, \nthis is another sample mail body and just to test the py script working\nthis si the part of the data preprocesing \nthanks'}
Feel free to strip any unwanted characters from the lines as you see fit.
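As a minimal sketch of that clean-up, the same loop can be wrapped in a function that strips whitespace from every line and ignores body text that appears before the first subject. The function name `parse_emails` and the sample data are hypothetical, since the asker's real testing.txt is not shown.

```python
import io

# Hypothetical sample data; the asker's real testing.txt is not shown.
SAMPLE = """\
From: alice@example.com
To: bob@example.com
Subject: Re: first request

Hey there,
this is the first body.
From: carol@example.com
To: bob@example.com
Subject: second request
Another body line.
"""


def parse_emails(lines):
    """Map each subject to its body, dropping blank lines and edge whitespace."""
    data = {}
    current_subject = None
    for line in lines:
        # Skip sender/recipient headers and empty lines.
        if line.startswith(("From:", "To:")) or not line.strip():
            continue
        if line.startswith("Subject:"):
            # Split only on the first colon so "Re: ..." subjects stay intact.
            current_subject = line.split(":", 1)[1].strip()
            data.setdefault(current_subject, [])
        elif current_subject is not None:
            data[current_subject].append(line.strip())
    # Join the cleaned lines of each body with a single newline.
    return {subject: "\n".join(body) for subject, body in data.items()}


emails = parse_emails(io.StringIO(SAMPLE))
print(emails)
```

Collecting the body lines in a list and joining them once at the end avoids trailing newlines in the values. Like the original loop, it still drops any body line that happens to start with "From:" or "To:".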


I hope this helps.

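Part 1 is a first version that collects the subjects and bodies into two lists. If you want each body stored under its subject, you can put them in a dictionary, as in Part 2.

Part 1

Part 2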
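What is the problem here? Also, so that the file is scoped (and closed) properly, you should use `with open(txt) as f: for line in f.readlines(): # do stuff`

Hmm, these files are not normal mail messages. According to RFC 5322, an empty line should separate the body from the header section. Do you really need to parse a file in such a bogus format? Also, in ordinary archive files, a `From ` line (no `:` after `From`) marks the start of a new message. Again, are you sure about your sample format?

Actually, the original format is quite different. I just want to understand the logic for these problems, because I am new to Python and have never really dealt with this kind of problem before.

@sajalarora, as long as the file is finite, this program cannot end up in an infinite loop. Nothing other than the file's contents is looped over. So if the file's contents are finite (and I am sure they are), the program should exit. If you are still stuck in an infinite loop, please edit your post and show the code that causes it, because it cannot be the code above.

Hey, sorry, there was actually a problem with the txt file. It works fine with the correct file. Thanks anyway.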
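For comparison, here is a small sketch of what the RFC 5322 comment implies: when a message has a blank line between its headers and its body, the standard-library `email` package can split the two without any manual line scanning. The message text is a hypothetical, well-formed example, not the asker's file.

```python
from email import message_from_string

# Hypothetical, well-formed message: headers, one blank line, then the body.
RAW = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: This is the sample request one\n"
    "\n"
    "Hey there,\n"
    "this is the body.\n"
)

msg = message_from_string(RAW)
# For a non-multipart message, get_payload() returns the body as a string.
emails = {msg["Subject"]: msg.get_payload()}
print(emails)
```

For a real archive in mbox format, the standard-library `mailbox` module is probably the better starting point, but that only applies if the file really follows that format.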
# test.py: the asker's original attempt
txt = "testing.txt"
file = open(txt)
body = ""
body_list = list()
subject_list = list()
for line in file:
    line = line.rstrip()
    if line.startswith("From:") or line.startswith("To:"):
        continue
    if line.startswith("Subject:"):
        subject_list.append(line)
    if not line.startswith("Subject:"):
        body = body + line
# Part 1: collect subjects and bodies in two parallel lists
subjects = []
bodys = []
with open("test.txt") as file:
    body = ""
    for line in file:
        # Sender and recipient headers are not needed.
        if line.startswith("From:") or line.startswith("To:"):
            continue
        if line.startswith("Subject:"):
            # A new subject closes the body of the previous mail.
            if body != "":
                bodys.append(body)
                body = ""
            subjects.append(line.split("Subject:")[1])
        else:
            body += line
    bodys.append(body)  # append the body of the last mail
print(subjects)
print(bodys)
# Part 2: store each body under its subject in a dictionary
SB = {}
with open("test.txt") as file:
    body = ""
    subject = ""
    for line in file:
        # Sender and recipient headers are not needed.
        if line.startswith("From:") or line.startswith("To:"):
            continue
        if line.startswith("Subject:"):
            # A new subject closes the body of the previous mail.
            if body != "":
                SB[subject] = body
                body = ""
            subject = line.split("Subject:")[1]
            SB[subject] = ""
        else:
            body += line
    SB[subject] = body  # store the body of the last mail

print(SB)
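One caveat when the subject is the dictionary key: if two mails share a subject, the later body replaces the earlier one. A possible workaround, sketched here with hypothetical sample lines and a hypothetical helper name `group_by_subject`, is to keep a list of bodies per subject.

```python
from collections import defaultdict

# Hypothetical sample lines in which the same subject appears twice.
LINES = [
    "From: a@example.com\n",
    "To: b@example.com\n",
    "Subject: status\n",
    "first body\n",
    "From: c@example.com\n",
    "To: b@example.com\n",
    "Subject: status\n",
    "second body\n",
]


def group_by_subject(lines):
    """Return {subject: [body, body, ...]} so repeated subjects are kept apart."""
    grouped = defaultdict(list)
    subject = None
    for line in lines:
        if line.startswith(("From:", "To:")):
            continue
        if line.startswith("Subject:"):
            subject = line.split(":", 1)[1].strip()
            grouped[subject].append("")  # start a new body for this mail
        elif subject is not None:
            grouped[subject][-1] += line  # extend the body of the current mail
    return dict(grouped)


grouped = group_by_subject(LINES)
print(grouped)
```

Whether this is needed depends on the archive; with unique subjects the plain dictionary is simpler.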
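# From:/To: lines reset the state, so this relies on them appearing before Subject: in each message.
email_data, subject, body = {}, "", ""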
with open("emails.txt", "r") as records:
    for record in records:
        if record.startswith("Subject:"):
            subject = record.split("Subject:")[1].strip()
        elif not record.startswith("To:") and not record.startswith("From:"):
            body += record
        else:
            subject, body = "", ""
            continue
        email_data[subject] = body
print(email_data)