How to dynamically iterate over subdirectories in Python


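I have run into this problem in several situations.

One use case: suppose I have a directory structure that can contain an unknown number of subdirectory levels, and I want the total number of files under rootdir. What is the best way to traverse this tree dynamically?

Here is an example of the folder structure: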

rootdir
   -> subdir1
     ->file1
          -> subsubdir1
                 -> file1
                 -> file2
          -> subsubdir2
                 -> file1
          -> subsubdir3
                 -> file1
                 -> subsubsubdir
                    -> file1
   -> subdir2
          -> subsubdirA
                 -> file1
                 -> file2
          -> subsubdirB
                 -> file1
                 -> file2

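I get the folder structure through API calls rather than directly from the file system. Given the response for a call on rootdir, I want to save the subfolder ids [1, 2], then go into each subfolder and repeat the same process, checking whether further subfolders exist while keeping a running file count.

The response includes a total count, which is the number of items (a subfolder counts as 1). So I need to keep track of the subfolder ids and start a new API call for each subfolder to get the number of files in it (and possibly in its sub-subfolders), while tracking the overall file total. (I hope I've explained this clearly. If anything is unclear, feel free to comment.)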
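For concreteness, here is a hand-written sketch of what one folder response might look like. The shape is inferred only from the keys used in the code in this thread (`item_collection`, `total_count`, `entries`, `type`, `id`); the sample values and the helper name `split_entries` are made up for illustration, and the real API may return more fields.

```python
# Hypothetical response for rootdir: two subfolders and one file.
sample_folder = {
    "item_collection": {
        "total_count": 3,  # counts every item, so each subfolder counts as 1
        "entries": [
            {"type": "folder", "id": "1"},
            {"type": "folder", "id": "2"},
            {"type": "file", "id": "10"},
        ],
    }
}


def split_entries(folder):
    """Return (subfolder_ids, file_count) for a single folder response."""
    entries = folder["item_collection"]["entries"]
    subfolder_ids = [e["id"] for e in entries if e["type"] == "folder"]
    file_count = sum(1 for e in entries if e["type"] == "file")
    return subfolder_ids, file_count


subfolder_ids, file_count = split_entries(sample_folder)
print(subfolder_ids, file_count)
```

This only handles one level; the open question is how to repeat it for every id in `subfolder_ids`.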

This is what I have so far, but I'm not sure how to keep track of each subfolder and iterate over them dynamically. Any help is appreciated.

def total_file_count(client, folder_id):
    total_file_count = 0
    subfolder_ids = []
    folder = client.get_folder(folder_id=folder_id)
    item_count = folder['item_collection']['total_count']
    subfolder = True

    if item_count > 0:
        while subfolder:
            for i in folder['item_collection']['entries']:
                if i['type']=='folder':
                    subfolder_ids.append(i['id'])
                elif i['type']=='file':
                    total_file_count += 1

                subfolder = False if not subfolder_ids

    return total_file_count

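Here is a general approach using a while loop. The idea is to start with a list of folder ids (initially just the root you pass in), and then, from the entries you get back, add every folder that still needs to be searched to that list. So as long as there are folders left to check, you keep making requests and accumulating the file count.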

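from collections import deque

def get_file_count(client, folder_id):
    count = 0
    folders = deque([folder_id])  # queue of folder ids still to fetch
    while folders:
        current_id = folders.popleft()  # named to avoid shadowing the built-in id()
        data = client.get_folder(current_id)
        for entry in data["item_collection"]["entries"]:
            if entry["type"] == "folder":
                folders.append(entry["id"])  # visit this subfolder later
            elif entry["type"] == "file":
                count += 1  # count only files, as in the question
    return count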

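You may or may not be able to copy and paste this as is; it is only meant as an illustration.

Ideally there would be an API that gives you all the entries at once, but I can imagine plenty of use cases where that isn't possible, so you have to make a separate request for each folder.

The solution is not optimized.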
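To see the one-request-per-folder cost concretely, here is a minimal sketch that runs the same queue-based loop against an in-memory stand-in. `FakeClient`, `TREE`, `folder_entry`, and `file_entry` are hypothetical names invented for this example and are not part of any real API.

```python
from collections import deque


class FakeClient:
    """Stand-in for the real API client; serves folders from a dict."""

    def __init__(self, tree):
        self.tree = tree
        self.calls = 0  # number of get_folder requests made

    def get_folder(self, folder_id):
        self.calls += 1
        entries = self.tree[folder_id]
        return {"item_collection": {"total_count": len(entries),
                                    "entries": entries}}


def folder_entry(fid):
    return {"type": "folder", "id": fid}


def file_entry(fid):
    return {"type": "file", "id": fid}


# Folder 0 is the root; folder 1 holds a file and folder 3; 2 and 3 hold only files.
TREE = {
    0: [folder_entry(1), folder_entry(2)],
    1: [file_entry("a"), folder_entry(3)],
    2: [file_entry("b")],
    3: [file_entry("c"), file_entry("d")],
}


def get_file_count(client, folder_id):
    count = 0
    pending = deque([folder_id])
    while pending:
        data = client.get_folder(pending.popleft())
        for entry in data["item_collection"]["entries"]:
            if entry["type"] == "folder":
                pending.append(entry["id"])
            elif entry["type"] == "file":
                count += 1
    return count


client = FakeClient(TREE)
print(get_file_count(client, 0), client.calls)  # 4 files found with 4 requests
```

Every folder costs exactly one request, so the number of calls grows with the number of folders, not the number of files.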

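I'm not sure I fully understood the use case, but this should work. It follows the chain of folders until it reaches a folder that contains only files, then goes back to the previous parent and works from there again. The recursion ends when the function tries to back up past the root node.

If you have trouble implementing it, let me know; without a complete test case I can't debug it.

I made a few assumptions based on your example:

1) ids are simple integers and strictly cardinal

2) the first directory has id 0 (this can be changed to another integer)

3) you are only looking for the file count

If some of these are incorrect, I can try to rework my solution. But I hope this gets you started in the right direction.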

def iterdir(client, root, viewed=None, steps=0, filecount=0):
    if viewed is None:
        viewed = []  # fresh list per top-level call, not a shared mutable default
    if root < 0:
        return filecount  # backed up past the root node: done
    entries = client.get_folder(root)['item_collection']['entries']
    if root not in viewed:
        # Count a folder's files only on its first visit to avoid double counting
        viewed.append(root)
        filecount += len([item for item in entries if item['type'] == 'file'])
    subdirs = [int(item['id']) for item in entries
               if item['type'] == 'folder' and int(item['id']) not in viewed]
    if len(subdirs) == 0:
        # Nothing new below: back up to the next lower id (relies on assumptions 1 and 2)
        return iterdir(client=client, root=root - 1, steps=steps - 1,
                       viewed=viewed, filecount=filecount)
    # Descend into an unvisited subfolder; return the result so the count propagates
    return iterdir(client=client, root=subdirs.pop(), steps=steps + 1,
                   viewed=viewed, filecount=filecount)
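If the integer-id assumptions don't hold, a plainer depth-first variant may be easier to reason about: let the call stack do the backtracking and have each call return the count for its own subtree. The following is only a sketch under the response shape used in this thread; `count_files` and `DictClient` are hypothetical names, and it assumes the folder structure contains no cycles.

```python
class DictClient:
    """Hypothetical stand-in for the API client, backed by a dict."""

    def __init__(self, tree):
        self.tree = tree

    def get_folder(self, folder_id):
        entries = self.tree[folder_id]
        return {"item_collection": {"total_count": len(entries),
                                    "entries": entries}}


def count_files(client, folder_id):
    """Recursively count the files under folder_id (depth-first)."""
    entries = client.get_folder(folder_id)["item_collection"]["entries"]
    total = 0
    for entry in entries:
        if entry["type"] == "file":
            total += 1
        elif entry["type"] == "folder":
            # The recursive call returns the subtree's count; returning from it
            # is what brings us back to the parent folder.
            total += count_files(client, entry["id"])
    return total


# Ids here are arbitrary strings to show that no ordering is assumed.
client = DictClient({
    "root": [{"type": "folder", "id": "x"}, {"type": "folder", "id": "y"}],
    "x": [{"type": "file", "id": "f1"}, {"type": "folder", "id": "z"}],
    "y": [{"type": "file", "id": "f2"}],
    "z": [{"type": "file", "id": "f3"}, {"type": "file", "id": "f4"}],
})
print(count_files(client, "root"))  # 4
```

The recursion depth here equals the nesting depth of the folders, so very deep trees could still hit Python's recursion limit; the while-loop approach avoids that.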


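The posted data doesn't seem to be fully nested. Can you post an example of what it looks like with multiple subdirectories and files?

To clarify, you're not working with the file system directly but with a JSON representation of it? Your sample data suggests you can simply read off the number of entries. It doesn't show how subfolders, and files inside subfolders, are represented. For example, does "total_count" give the total number of files (including folders)?

@MxyL Thanks for the clarification. I've edited my post. Right, I'm working with JSON, not with the file system directly. The total count is the count of items, so a subfolder counts as 1, but I want to go into each subfolder and get the total number of files.

Does each folder have a unique id in an absolute sense, or are ids only unique relative to the parent directory?

I just wanted to point out: I think this is an infinite loop, since the length of the folder list is always greater than 0.

@DanTemkin The folder list should become empty once there are no more folders to check.

Oh, okay! I see it now. This is a breadth-first approach. I went depth-first.

Thank you @That Umbrella Guy! Sorry it took me a while to get back to this. I was just wondering, is there a named algorithm for this kind of problem?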
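On the last question: the two answers correspond to the standard tree traversals, breadth-first search (BFS) and depth-first search (DFS). As a rough sketch, the only difference in an iterative version is which end of the pending list you take the next folder from. The `CHILDREN` tree and `visit_order` below are made up for illustration.

```python
from collections import deque

# Hypothetical folder tree: each key maps to its child folders.
CHILDREN = {
    "root": ["A", "B"],
    "A": ["A1", "A2"],
    "B": ["B1"],
    "A1": [], "A2": [], "B1": [],
}


def visit_order(start, breadth_first):
    """Return the order in which folders are visited."""
    pending = deque([start])
    order = []
    while pending:
        # Oldest entry first = queue = BFS; newest entry first = stack = DFS.
        node = pending.popleft() if breadth_first else pending.pop()
        order.append(node)
        pending.extend(CHILDREN[node])
    return order


print(visit_order("root", breadth_first=True))   # ['root', 'A', 'B', 'A1', 'A2', 'B1']
print(visit_order("root", breadth_first=False))  # ['root', 'B', 'B1', 'A', 'A2', 'A1']
```

For a plain file count either order gives the same total; the choice mostly affects how many folder ids are held in memory at once.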
