Python: iterate over multiple files, extract date to another file [python, python-3.4]


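OK, I have a source directory that contains multiple folders. Each folder has a file named tvshow.nfo, and I want to extract data from it. I wrote the following: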

import sys
import os
import re
from pathlib import Path

L = []
my_dir = "./source/"
for item in Path(my_dir).glob('./*/tvshow.nfo'):
    M = str(item).splitlines()
    for i in M:
        f = open(i, "r")
        for i in f:
            for j in re.findall("<title>(.+)</title>", i):
                L.append(j)
            for j in re.findall("<year>(.+)</year>", i):
                L.append(j)
            for j in re.findall("<status>(.+)</status>", i):
                L.append(j)
            for j in re.findall("<studio>(.+)</studio>", i):
                L.append(j)
        for i in L:
            print (i)
        f.close()

I would like to export the output to a new file, like this:

APB 2017 US
Angie Tribeca 2016 Continuing TBS
Arrow 2012 Continuing CW

Can anyone help me? Also, is there a better way than what I tried?

Based on what you've shown, you could try this:

import sys
import os
import re
from pathlib import Path

info = []
my_dir = "./source/"
for item in Path(my_dir).glob('./*/tvshow.nfo'):
    M = str(item).splitlines()
    for i in M:
        L = []
        f = open(i, "r")
        for i in f:
            for j in re.findall("<title>(.+)</title>", i):
                L.append(j)
            for j in re.findall("<year>(.+)</year>", i):
                L.append(j)
            for j in re.findall("<status>(.+)</status>", i):
                L.append(j)
            for j in re.findall("<studio>(.+)</studio>", i):
                L.append(j)
        f.close()
        info.append(' '.join(L))
with open("new_file", "w") as w:
    for i in info:
        w.write(i + "\n")

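Rather than building a single list that holds all the different attributes of every show, you should organize the data in a way that is easier to read. One possibility is a list of lists: the top-level list has one entry per show, and each inner list holds that show's title, year, status, and studio attributes. You can modify your existing code quite easily to do this: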

    for i in f:
        show_attributes = []
        for j in re.findall("<title>(.+)</title>", i):
            show_attributes.append(j)
        for j in re.findall("<year>(.+)</year>", i):
            show_attributes.append(j)
        for j in re.findall("<status>(.+)</status>", i):
            show_attributes.append(j)
        for j in re.findall("<studio>(.+)</studio>", i):
            show_attributes.append(j)
        L.append(show_attributes)
    for i in L:
        for attribute in i:
            print(attribute, end=' ')
    f.close()
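To get from a list of lists to the requested output file, each inner list can be joined into one line. This is only a minimal sketch, assuming the attributes have already been collected; the sample values and the helper name write_shows are made up for illustration and are not part of the answer above.

```python
import os
import tempfile


def write_shows(shows, out_path):
    """Write one space-separated line per show (hypothetical helper)."""
    with open(out_path, "w") as out:
        for attributes in shows:
            out.write(" ".join(attributes) + "\n")


# Sample data in the list-of-lists shape described above (illustrative values).
shows = [
    ["Angie Tribeca", "2016", "Continuing", "TBS"],
    ["Arrow", "2012", "Continuing", "CW"],
]

# Write to a temporary location so the sketch does not touch real files.
out_path = os.path.join(tempfile.mkdtemp(), "shows.txt")
write_shows(shows, out_path)

with open(out_path) as written:
    result = written.read()
print(result, end="")
```

Keeping one inner list per show means the line breaks in the output file fall naturally between shows.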

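From your example, it looks like all the tags for each show are on one line.

If all the tags for a show are on one line, I think doing it this way might help: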

import sys
import os
import re
from pathlib import Path


def find_tag(tag, l):
    ''' returns result of findall on a tag on line l'''
    full_tag = "<" + tag + ">(.+)</" + tag + ">"
    return re.findall(full_tag, l)


L = []
my_dir = "./source/"
for item in Path(my_dir).glob('./*/tvshow.nfo'):
    # changed the file variable to data_file
    M = str(item).splitlines()
    for data_file in M:
        # use with to open the file without needing to close it
        with open(data_file, "r") as f:

            for line in f:
                title = find_tag("title", line)
                year = find_tag("year", line)
                status = find_tag("status", line)
                studio = find_tag("studio", line)
                L.append(' '.join(str(d[0]) for d in [title, year, status, studio] if d))

# print the data or whatever else you're doing with it
for data in L:
    print(data)

This uses with to open the file, so there is no need for a try/finally block or to close the file yourself.
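As a rough illustration of what with saves you, the sketch below reads the same file twice, once with an explicit try/finally and once with with. The temporary file and its one-line content are assumptions made only so the example can run on its own.

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "tvshow.nfo")
with open(path, "w") as sample:
    sample.write("<title>Arrow</title>\n")

# Manual version: the file has to be closed explicitly, even if reading fails.
f = open(path, "r")
try:
    manual_lines = f.readlines()
finally:
    f.close()

# with version: the file is closed automatically when the block ends.
with open(path, "r") as g:
    with_lines = g.readlines()

print(manual_lines == with_lines, f.closed, g.closed)
```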

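The str(d[0]) is needed to turn the list item returned by re.findall into a string. The if d is there in case a tag is missing from the line. I may have misunderstood how the tags are laid out in the file; sorry if so.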

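You could also build L with a list comprehension instead of appending to the list:

    L = [(find_tag("title", line), find_tag("year", line), find_tag("status", line), find_tag("studio", line)) for line in f]

Then you can use the join method when printing the list:

    print(" ".join(str(d[0]) for d in data if d))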
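Here is one way the comprehension variant might look as a complete script. It is a sketch under assumptions: the two sample lines stand in for the lines of a real tvshow.nfo file (their content is invented), and the names rows, tags and output are introduced here for illustration.

```python
import re


def find_tag(tag, line):
    """Same idea as find_tag above: findall for one tag on one line."""
    pattern = "<" + tag + ">(.+)</" + tag + ">"
    return re.findall(pattern, line)


# Stand-in for the lines of a file; the second line has no <status> tag.
lines = [
    "<title>Arrow</title><year>2012</year><status>Continuing</status><studio>CW</studio>",
    "<title>Angie Tribeca</title><year>2016</year><studio>TBS</studio>",
]

tags = ("title", "year", "status", "studio")

# One inner list of findall results per line, built without append.
rows = [[find_tag(tag, line) for tag in tags] for line in lines]

# "if d" skips tags that were not found on the line.
output = [" ".join(str(d[0]) for d in data if d) for data in rows]
for text in output:
    print(text)
```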

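Whether you want to do it this way depends on how much you like list comprehensions.

I also created a find_tag function, but that was mostly because I was trying to figure out what was going on.

Without knowing what the files look like, it is hard to tell whether each tag should be looked for on a separate line. It is also hard to tell whether the order matters, or whether any error handling is needed.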
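If the tags turn out to be on separate lines, one option is to search the whole file content at once instead of going line by line. The sketch below assumes a layout with one tag per line; the sample text and the function name extract are hypothetical, and a missing tag is simply skipped.

```python
import re

# Illustrative content with each tag on its own line (not from a real .nfo file).
text = """<tvshow>
    <title>Arrow</title>
    <year>2012</year>
    <status>Continuing</status>
    <studio>CW</studio>
</tvshow>
"""


def extract(text, tags=("title", "year", "status", "studio")):
    """Return the first value of each tag found anywhere in text, in tag order."""
    values = []
    for tag in tags:
        match = re.search("<" + tag + ">(.+?)</" + tag + ">", text)
        if match:  # skip tags that are missing
            values.append(match.group(1))
    return values


line = " ".join(extract(text))
print(line)
```

Searching in a fixed tag order also fixes the order of the output, regardless of the order in which the tags appear in the file.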

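Sorry for the typo, it should be data in the title, not date.

Off-topic: Path.glob yields the matching paths one at a time, so M = str(item).splitlines() is unnecessary: item will always be a single instance of a Path subclass, which means for i in M: only ever performs one iteration.

Search for the Python XML module.

@martineau I tried it without str and splitlines, but got the error TypeError: invalid file: PosixPath…

@martineau aah, yes!!! Got it! Thank you very much, that works.

If it is just functions I can usually manage, but I struggle with lists: where to define them, when to append, and so on. I need to practice more with different kinds of examples.

Thanks, once I am more familiar with lists I will definitely try this approach.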
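The two suggestions from the comments could be combined roughly as follows. This is a sketch, not a drop-in solution: it assumes each tvshow.nfo is well-formed XML with the four tags as direct children of the root element, and the sample files are generated in a temporary directory with invented content. It uses xml.etree.ElementTree from the standard library and item.open(), which sidesteps the TypeError that the built-in open() raises for Path objects on Python 3.4.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

TAGS = ("title", "year", "status", "studio")

# Build a throwaway source tree so the sketch runs on its own.
source = Path(tempfile.mkdtemp())
samples = {
    "Arrow": "<tvshow><title>Arrow</title><year>2012</year>"
             "<status>Continuing</status><studio>CW</studio></tvshow>",
    "Angie Tribeca": "<tvshow><title>Angie Tribeca</title><year>2016</year>"
                     "<status>Continuing</status><studio>TBS</studio></tvshow>",
}
for folder, xml_text in samples.items():
    (source / folder).mkdir()
    with (source / folder / "tvshow.nfo").open("w") as nfo:
        nfo.write(xml_text)

info = []
for item in sorted(source.glob("*/tvshow.nfo")):
    # item is already a single Path; open it directly instead of via str().
    with item.open("rb") as nfo:
        root = ET.parse(nfo).getroot()
    values = [root.findtext(tag) for tag in TAGS]
    # Drop tags that are missing or empty.
    info.append(" ".join(v for v in values if v))

# Same output file name as in the first answer, placed in the temp tree.
out_file = source / "new_file"
with out_file.open("w") as out:
    for line in info:
        out.write(line + "\n")

print("\n".join(info))
```

A real XML parser copes with tags split across lines and with attributes on the tags, which the regular expressions in the answers do not.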