Python: extract strings between defined characters and store them in a dictionary of lists

I have a folder defined as

/main/folder

There are many files in this folder, for example:

folder
├── this.py
├── that
└── something

These files contain a lot of text, and some of the lines are wrapped between six # characters on each side, like this:

cat this.py

    ###### This is what I'm looking for ######
    ###### This is the second line that I'm looking for ######

What I want to do is extract the text between these six-# markers and store it in a dictionary of lists with the following structure:

my_list = {file1: [string1, string2, string3], file2: [string1, string2, string3]}

Using the example above:

my_list = {'this.py': ["This is what I'm looking for", "This is the second line that I'm looking for"]}

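So I need the file name without the full path, and the corresponding strings without the six #s. Also, I only want to store files that contain valid strings; files that don't should not appear in the dictionary.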

I have used this function to walk the directory, but I'm not sure how to take it further:

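import os
from functools import reduce  # reduce is not a builtin in Python 3

def get_directory_structure(rootdir):
    """
    Creates a nested dictionary that represents the folder structure of rootdir
    """
    tree = {}  # renamed from `dir` to avoid shadowing the builtin
    rootdir = rootdir.rstrip(os.sep)
    start = rootdir.rfind(os.sep) + 1
    for path, dirs, files in os.walk(rootdir):
        folders = path[start:].split(os.sep)
        subdir = dict.fromkeys(files)
        # walk down the already-built tree to the parent of the current folder
        parent = reduce(dict.get, folders[:-1], tree)
        parent[folders[-1]] = subdir
    return tree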

If there is only one directory level (no subdirectories to walk), you can try the following:

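import os

def extract(name):
    """Return the text found between '###### ' and ' ######' on each line."""
    with open(name, "rt") as f:
        a = []
        for line in f:
            line = line.rstrip("\r\n")  # drop the line ending before checking the suffix
            if line.startswith("###### ") and line.endswith(" ######"):
                a.append(line[7:-7])    # strip the 7-character marker on each side
        return a

def create_dict(path):
    """Map each file name in `path` to its list of marker strings.
    Assumes `path` contains only files (no subdirectories)."""
    h = {}
    for name in os.listdir(path):
        a = extract(os.path.join(path, name))
        if a:  # only keep files that actually contain marker lines
            h[name] = a
    return h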
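os.listdir also returns the names of subdirectories, and calling open() on a directory raises an error, so create_dict as written relies on the folder containing only files. A minimal sketch of a variant that uses pathlib and simply skips anything that is not a regular file; create_dict_files_only is a hypothetical name, and extract is repeated so the block runs on its own:

```python
from pathlib import Path

def extract(path):
    """Collect the text between '###### ' and ' ######' on each line of `path`."""
    found = []
    with open(path, "rt") as f:
        for line in f:
            line = line.rstrip("\r\n")
            if line.startswith("###### ") and line.endswith(" ######"):
                found.append(line[7:-7])
    return found

def create_dict_files_only(folder):
    """Map bare file names in `folder` to their marker strings,
    skipping subdirectories and files without any marker lines."""
    result = {}
    for entry in Path(folder).iterdir():
        if not entry.is_file():  # iterdir(), like os.listdir, also yields directories
            continue
        found = extract(entry)
        if found:
            result[entry.name] = found  # .name is the file name without the path
    return result
```

For the example folder in the question, this should produce the same shape as the desired my_list, with only this.py as a key.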

If there are subdirectories to walk, you can use this instead of create_dict:

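def create_dict_walk(path):
    h = {}
    # os.walk yields every directory under path, including path itself
    for dirname, subdirs, files in os.walk(path):
        for filename in files:
            a = extract(os.path.join(dirname, filename))
            if a:
                h[filename] = a  # keyed by bare file name; a later duplicate overwrites an earlier one
    return h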

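Note, however, that you may need some extra checks if a file has an unexpected encoding, or if only certain files should be examined (by extension, date, etc.). Also, these functions use only the file name as the key, so files with the same name in different subdirectories will collide, and the last one read wins.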
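One way to address these caveats, sketched under a few assumptions: files are decoded as UTF-8 with undecodable bytes replaced instead of raising UnicodeDecodeError, an optional tuple of extensions limits which files are read, and keys are paths relative to the starting folder so same-named files in different subdirectories do not overwrite each other. create_dict_walk_relpath and its extensions parameter are hypothetical, not part of the original answer:

```python
import os

def extract(path, encoding="utf-8"):
    """Collect marker strings; UTF-8 is an assumption about the files."""
    found = []
    # errors="replace" turns undecodable bytes into U+FFFD instead of raising
    with open(path, "rt", encoding=encoding, errors="replace") as f:
        for line in f:
            line = line.rstrip("\r\n")
            if line.startswith("###### ") and line.endswith(" ######"):
                found.append(line[7:-7])
    return found

def create_dict_walk_relpath(root, extensions=None):
    """Like create_dict_walk, but keyed by the path relative to `root`.
    `extensions` is an optional tuple such as (".py", ".txt")."""
    result = {}
    for dirname, subdirs, files in os.walk(root):
        for filename in files:
            # str.endswith accepts a tuple of suffixes
            if extensions is not None and not filename.endswith(extensions):
                continue
            full = os.path.join(dirname, filename)
            found = extract(full)
            if found:
                result[os.path.relpath(full, root)] = found
    return result
```

On Windows the relative-path keys will contain backslashes rather than forward slashes.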

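These functions store a list only when a file actually has some # lines. If you would rather have an empty list for files without any, just remove the test.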
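As an illustration, a hypothetical keep_empty flag can make that test optional, so files without marker lines are stored with an empty list instead of being left out. This is a sketch built on the one-level create_dict and, like it, assumes the folder contains only files:

```python
import os

def extract(name):
    found = []
    with open(name, "rt") as f:
        for line in f:
            line = line.rstrip("\r\n")
            if line.startswith("###### ") and line.endswith(" ######"):
                found.append(line[7:-7])
    return found

def create_dict(path, keep_empty=False):
    """keep_empty=True stores files with no marker lines as empty lists."""
    h = {}
    for name in os.listdir(path):
        a = extract(os.path.join(path, name))
        if a or keep_empty:
            h[name] = a
    return h
```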

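Thanks. But what if I want the dictionary to contain only the file names that follow the rule? For example, if a file doesn't contain any string between the six #s, I don't want that name in the dictionary.

@Sorin MihaiOprea I updated create_dict to add the test. Now both functions store a list only if it is non-empty.

You're awesome. Thanks. :)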