Python Google Drive API: list the entire Drive file tree

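I'm building a Python application that uses the Google Drive API. Development is going well, but I have a problem retrieving the entire Google Drive file tree. I need it for two purposes:

  • Checking whether a path exists: if I want to upload test.txt under root/folder1/folder2, I want to check whether the file already exists and, if so, update it.
  • Building a visual file explorer. I know Google provides its own (I can't remember the name right now, but I know it exists), but I want to restrict the file explorer to specific folders.

At the moment I have a function that fetches the root of Gdrive, and I can build the tree by recursively calling a function that lists the contents of a single folder. This is extremely slow, though, and could send thousands of requests to Google, which is unacceptable.

Here is the function that gets the root: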

    # imports assumed by the snippets in this question
    from googleapiclient import errors
    from googleapiclient.discovery import build

    import driveHelper  # the author's own module; handles authentication and credential storage

    def drive_get_root():
        """Retrieve a root list of File resources.
           Returns:
             List of dictionaries.
        """
        # build the service with the authorized http object from driveHelper
        drive_service = build('drive', 'v2', http=driveHelper.buildHttp())
        # the result will be a list
        result = []
        page_token = None
        while True:
            try:
                param = {}
                if page_token:
                    param['pageToken'] = page_token
                files = drive_service.files().list(**param).execute()
                # add the files in the list
                result.extend(files['items'])
                page_token = files.get('nextPageToken')
                if not page_token:
                    break
            except errors.HttpError as _error:
                print('An error occurred: %s' % _error)
                # stop paging on error
                break
        return result
    
And here is the one that gets the files in a folder:

    def drive_files_in_folder(folder_id):
        """Print files belonging to a folder.
           Args:
             folder_id: ID of the folder to get files from.
        """
        #build the service, the driveHelper module will take care of authentication and credential storage
        drive_service = build('drive', 'v2', driveHelper.buildHttp())
        # the result will be a list
        result = []
        #code from google, is working so I didn't touch it
        page_token = None
        while True:
            try:
                param = {}
    
                if page_token:
                    param['pageToken'] = page_token
    
                children = drive_service.children().list(folderId=folder_id, **param).execute()
    
                for child in children.get('items', []):
                    result.append(drive_get_file(child['id']))
    
                page_token = children.get('nextPageToken')
                if not page_token:
                    break
            except errors.HttpError as _error:
                print('An error occurred: %s' % _error)
                break
        return result
    
For example, to check whether a file exists I currently use this:

    def drive_path_exist(file_path, items=None):
        """
        Recursively check whether the given path exists.
        Returns the dictionary of the matching file, or False if not found.
        """
        # on the first call, start from the list returned by drive_get_root()
        if items is None:
            items = drive_get_root()

        # split off the first path component; the rest is resolved recursively
        first, _, rest = file_path.partition("/")

        for elem in items:
            if elem["title"] != first:
                continue
            # last component: elem is a dictionary with all the file info
            if not rest:
                return elem
            # otherwise descend into the matching folder and keep searching
            found = drive_path_exist(rest, drive_files_in_folder(elem["id"]))
            if found:
                return found
        return False
    
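Do you have any idea how to solve my problem? I have seen a few discussions here on Stack Overflow, and in some answers people wrote that this is possible, but of course they did not say how.

Thanks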

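It will never work like that, except for very small trees. You have to rethink the whole algorithm for a cloud app (you wrote it as if it were a desktop app where you own the machine), because it will easily time out.
You need to mirror the tree beforehand (task queues and datastore), not only to avoid timeouts but also to avoid Drive rate limits, and keep it in sync somehow (registering for push notifications, etc.). Not easy at all. I have built a Drive tree viewer before.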
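
As a rough illustration of the mirroring idea, the sketch below keeps a local dictionary of file metadata and updates it from the Drive v2 changes feed instead of re-listing the whole drive. The mirror layout and the names `apply_changes` and `sync_mirror` are assumptions made for this example, and the parameters should be checked against the `changes.list` reference; persisting the mirror and registering for push notifications are left out.

```python
def apply_changes(mirror, changes):
    """Apply a list of Drive v2 change resources to a local mirror.

    `mirror` maps file id -> file metadata dict and is updated in place.
    """
    for change in changes:
        file_id = change['fileId']
        if change.get('deleted'):
            # the file is gone: drop it from the mirror if we had it
            mirror.pop(file_id, None)
        else:
            # new or modified file: store the latest metadata
            mirror[file_id] = change['file']
    return mirror


def sync_mirror(service, mirror, start_change_id=None):
    """Pull pending changes page by page; return the change id to resume from."""
    params = {'includeDeleted': True}
    if start_change_id is not None:
        params['startChangeId'] = start_change_id
    resume_from = start_change_id
    while True:
        response = service.changes().list(**params).execute()
        apply_changes(mirror, response.get('items', []))
        if 'largestChangeId' in response:
            # the next sync starts just after the largest change seen so far
            resume_from = int(response['largestChangeId']) + 1
        page_token = response.get('nextPageToken')
        if not page_token:
            return resume_from
        params['pageToken'] = page_token
```

The value returned by `sync_mirror` would be stored and passed back in on the next run, so each sync only costs as many requests as there are pages of new changes.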

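Don't think of Drive as a tree structure. It isn't. "Folders" are just labels; for example, a file can have multiple parents.

To build a representation of a tree in your application, you need to do the following:

  • Run a Drive list query to retrieve all folders
  • Iterate over the result array and inspect the parents property to build an in-memory hierarchy
  • Run a second Drive list query to get all non-folders (i.e. files)
  • For each file returned, place it in the in-memory tree

If you simply want to check whether file-A exists in folder-B, the approach depends on whether the name "folder-B" is guaranteed to be unique.

If it is unique, just do a Files.list query for title='file-A', then do a Files.get for each of its parents and see whether any of them is called "folder-B".

If "folder-B" can exist under both "folder-C" and "folder-D", then it is more complex, and you will need to build the in-memory hierarchy from the first two steps above.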
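
The four steps listed above could be sketched roughly as follows. This is a minimal illustration against the v2 `files().list` call used elsewhere in this thread, not a definitive implementation: the function names `list_all`, `build_hierarchy` and `paths_of` are invented for the example, and the page size of 1000 is an assumption about the v2 limit.

```python
from collections import defaultdict

FOLDER_TYPE = 'application/vnd.google-apps.folder'


def list_all(service, q):
    """Return every item matching query `q`, following nextPageToken."""
    items, page_token = [], None
    while True:
        params = {'q': q, 'maxResults': 1000}
        if page_token:
            params['pageToken'] = page_token
        response = service.files().list(**params).execute()
        items.extend(response.get('items', []))
        page_token = response.get('nextPageToken')
        if not page_token:
            return items


def build_hierarchy(folders, files):
    """Map each parent id to its children; top-level items go under 'root'.

    An item with several parents appears under each of them.
    """
    children = defaultdict(list)
    for item in list(folders) + list(files):
        for parent in item.get('parents', []):
            key = 'root' if parent.get('isRoot') else parent['id']
            children[key].append(item)
    return children


def paths_of(children, parent_id='root', prefix=''):
    """Yield (path, item) for everything reachable from `parent_id`."""
    for item in children.get(parent_id, []):
        path = prefix + '/' + item['title']
        yield path, item
        if item.get('mimeType') == FOLDER_TYPE:
            for sub in paths_of(children, item['id'], path):
                yield sub


# Usage, assuming `service` is an authorized Drive v2 service object:
#   folders = list_all(service, "mimeType = '%s' and trashed = false" % FOLDER_TYPE)
#   files = list_all(service, "mimeType != '%s' and trashed = false" % FOLDER_TYPE)
#   index = dict(paths_of(build_hierarchy(folders, files)))
#   existing = index.get('/folder1/folder2/test.txt')
```

With this approach the number of requests grows with the number of result pages rather than with the number of folders, which is the main point of the answer.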

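You don't say whether these files and folders are created by your app or by the user with the Google Drive web app. If your app is the creator of these files/folders, there is a trick you can use to restrict your searches to a single root. Say you have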

    MyDrive/app_root/folder-C/folder-B/file-A
    
You can make folder-C, folder-B and file-A all children of app_root.

This way you can constrain all of your queries to include

    and 'app_root_id' in parents
    

A simple way to check whether a file exists in a specific folder is:

    drive_service.files().list(q="'THE_ID_OF_SPECIFIC_PATH' in parents and title='a file'").execute()
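
Building on the query above, a path such as folder1/folder2/test.txt can be checked with one request per path component rather than by listing every folder. This is a minimal sketch against the v2 `files().list` call; `quote`, `find_child` and `resolve_path` are names invented here, only the first page of results is inspected, and duplicate titles are resolved by taking the first match.

```python
def quote(value):
    """Escape a value for use inside single quotes in a Drive query."""
    return value.replace('\\', '\\\\').replace("'", "\\'")


def find_child(service, parent_id, title):
    """Return the first non-trashed item called `title` in `parent_id`, or None."""
    q = "'%s' in parents and title = '%s' and trashed = false" % (
        quote(parent_id), quote(title))
    response = service.files().list(q=q).execute()
    items = response.get('items', [])
    return items[0] if items else None


def resolve_path(service, path, root_id='root'):
    """Walk `path` one component at a time; return the final item or None."""
    current = {'id': root_id}
    for name in [part for part in path.split('/') if part]:
        current = find_child(service, current['id'], name)
        if current is None:
            # a component is missing, so the whole path does not exist
            return None
    return current


# Usage, assuming `drive_service` is an authorized Drive v2 service object:
#   existing = resolve_path(drive_service, 'folder1/folder2/test.txt')
#   if existing is None: upload a new file, otherwise update existing['id']
```

For a path with three components this costs at most three requests, regardless of how many files the drive contains.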

To walk through all folders and files:

    import sys, os
    import socket
    
    import googleDriveAccess
    
    import logging
    logging.basicConfig()
    
    FOLDER_TYPE = 'application/vnd.google-apps.folder'
    
    def getlist(ds, q, **kwargs):
      # collect all pages of a files().list query into a single result dict
      result = None
      npt = ''
      while npt is not None:
        if npt != '': kwargs['pageToken'] = npt
        entries = ds.files().list(q=q, **kwargs).execute()
        if result is None: result = entries
        else: result['items'] += entries['items']
        npt = entries.get('nextPageToken')
      return result

    def uenc(u):
      # Python 2 helper: encode unicode strings as UTF-8 bytes before writing
      if isinstance(u, unicode): return u.encode('utf-8')
      else: return u
    
    def walk(ds, folderId, folderName, outf, depth):
      spc = ' ' * depth
      outf.write('%s+%s\n%s  %s\n' % (spc, uenc(folderId), spc, uenc(folderName)))
      q = "'%s' in parents and mimeType='%s'" % (folderId, FOLDER_TYPE)
      entries = getlist(ds, q, **{'maxResults': 200})
      for folder in entries['items']:
        walk(ds, folder['id'], folder['title'], outf, depth + 1)
      q = "'%s' in parents and mimeType!='%s'" % (folderId, FOLDER_TYPE)
      entries = getlist(ds, q, **{'maxResults': 200})
      for f in entries['items']:
        outf.write('%s -%s\n%s   %s\n' % (spc, uenc(f['id']), spc, uenc(f['title'])))
    
    def main(basedir):
      da = googleDriveAccess.DAClient(basedir) # clientId=None, script=False
      f = open(os.path.join(basedir, 'hierarchy.txt'), 'wb')
      walk(da.drive_service, 'root', u'root', f, 0)
      f.close()
    
    if __name__ == '__main__':
      logging.getLogger().setLevel(getattr(logging, 'INFO'))
      try:
        main(os.path.dirname(__file__))
      except socket.gaierror as e:
        sys.stderr.write('socket.gaierror')
    

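This uses googleDriveAccess: github.com/HatsuneMiku/googleDriveAccess

I agree with @pinoyyid: Google Drive is not a typical tree structure.

However, for printing the folder structure I would still consider using a tree visualization library (for example treelib). Below is a complete solution that prints the Google Drive file system recursively.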

    from treelib import Node, Tree
    
    from pydrive.auth import GoogleAuth
    from pydrive.drive import GoogleDrive
    
    gauth = GoogleAuth()
    gauth.LocalWebserverAuth()
    drive = GoogleDrive(gauth)
    
    ### Helper functions ### 
    def get_children(root_folder_id):
        # list the non-trashed items whose parent is root_folder_id
        query = "'" + root_folder_id + "' in parents and trashed=false"
        file_list = drive.ListFile({'q': query}).GetList()
        return file_list
    
    def get_folder_id(root_folder_id, root_folder_title):
        file_list = get_children(root_folder_id)
        for file in file_list:
            if(file['title'] == root_folder_title):
                return file['id']
    
    def add_children_to_tree(tree, file_list, parent_id):
        for file in file_list:
            tree.create_node(file['title'], file['id'], parent=parent_id)
            print('parent: %s, title: %s, id: %s' % (parent_id, file['title'], file['id']))
    
    ### Recursion over all children ### 
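    def populate_tree_recursively(tree,parent_id):
        children = get_children(parent_id)
        add_children_to_tree(tree, children, parent_id)
        for child in children:
            # only folders can have children, so skip the extra request for plain files
            if child['mimeType'] == 'application/vnd.google-apps.folder':
                populate_tree_recursively(tree, child['id'])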
    
    
    ### Create tree and start populating from root ###
    def main():
        root_folder_title = "your-root-folder"
        root_folder_id = get_folder_id("root", root_folder_title)
    
        tree = Tree()
        tree.create_node(root_folder_title, root_folder_id)
        populate_tree_recursively(tree, root_folder_id)
        tree.show()
    
    if __name__ == "__main__":
        main()
    

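Actually, this is a desktop application. I know my current code will never work, but there must be a simple way to check whether a file exists at a specific path. How did you do it?

This does not answer the question at all.

This does not answer the question, because it does not include files in subsequent subdirectories.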