Python Google Drive API: list the entire Drive file tree

python google-api google-drive-api


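I'm building a Python application that uses the Google Drive API. Development is going well, but I have a problem retrieving the entire Google Drive file tree. I need it for two purposes:

Checking whether a path exists: if I want to upload test.txt under root/folder1/folder2, I want to check whether the file already exists and, if so, update it.

Building a visual file explorer. I know Google provides its own (I can't remember the name right now, but I know it exists), but I want to restrict the file explorer to specific folders.

For now I have a function that fetches the root of Gdrive, and I can build the tree by recursively calling a function that lists the contents of a single folder. That is extremely slow, though, and can potentially make thousands of requests to Google, which is unacceptable.

Here is the function that gets the root: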

from googleapiclient import errors
from googleapiclient.discovery import build

import driveHelper  # the asker's own module: handles authentication and credential storage


def drive_get_root():
    """Retrieve a root list of File resources.
       Returns:
         List of dictionaries.
    """
    # build the service; driveHelper takes care of authentication and credential storage
    drive_service = build('drive', 'v2', http=driveHelper.buildHttp())
    # the result will be a list
    result = []
    page_token = None
    while True:
        try:
            param = {}
            if page_token:
                param['pageToken'] = page_token
            files = drive_service.files().list(**param).execute()
            # add this page of files to the list
            result.extend(files['items'])
            page_token = files.get('nextPageToken')
            if not page_token:
                break
        except errors.HttpError as _error:
            print('An error occurred: %s' % _error)
            break  # stop paging on error; do not abandon the loop after the first page
    return result

And here is the one that gets the files in a folder:

def drive_files_in_folder(folder_id):
    """Return the files belonging to a folder.
       Args:
         folder_id: ID of the folder to get files from.
       Returns:
         List of File resources.
    """
    # build the service; driveHelper takes care of authentication and credential storage
    drive_service = build('drive', 'v2', http=driveHelper.buildHttp())
    # the result will be a list
    result = []
    # code from Google, it is working so I didn't touch it
    page_token = None
    while True:
        try:
            param = {}

            if page_token:
                param['pageToken'] = page_token

            children = drive_service.children().list(folderId=folder_id, **param).execute()

            for child in children.get('items', []):
                # drive_get_file (not shown in the question) fetches the full
                # File resource, which costs one extra request per child
                result.append(drive_get_file(child['id']))

            page_token = children.get('nextPageToken')
            if not page_token:
                break
        except errors.HttpError as _error:
            print('An error occurred: %s' % _error)
            break
    return result

For example, to check whether a file exists I currently use this:

def drive_path_exist(file_path, entries=None):
    """
    Recursively check whether the given path exists.
    Returns the File resource (a dict) of the last path element, or False.
    """

    # if no listing was passed in, start from the root of Gdrive
    if entries is None:
        entries = drive_get_root()

    # split the path so the first element can be looked up in the current listing
    parts = file_path.split("/")

    # look for the current path element in this folder
    match = None
    for elem in entries:
        if elem["title"] == parts[0]:
            match = elem
            break
    if match is None:
        return False

    # if there is only one element left we are at the actual filename,
    # so return the dictionary with all the file info
    if len(parts) == 1:
        return match

    # otherwise keep searching: rejoin the rest of the path as a string and
    # pass the contents of the matched folder as the listing
    return drive_path_exist("/".join(parts[1:]), drive_files_in_folder(match["id"]))

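Do you have any idea how to solve my problem? I've seen a few discussions here on Stack Overflow, and in some answers people wrote that it is possible, but of course they didn't say how.

Thanks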

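Except for very small trees, it will never work like that. You have to rethink the whole algorithm for a cloud application (you wrote it as if it were a desktop application on a machine you own), because it will easily time out.

You need to mirror the tree beforehand (task queues and datastore), not only to avoid timeouts but also to avoid Drive rate limits, and keep it in sync somehow (registering for push notifications, etc.). Not easy at all. I've built a Drive tree viewer before.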
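To make the mirroring idea a bit more concrete, here is a minimal sketch of the sync step only. It assumes the mirror is a plain dict keyed by file ID and that each change is shaped like a Drive v2 Change resource (`fileId`, `deleted`, `file`). The function name `apply_changes` and the sample data are made up for illustration, and fetching the changes (a changes list call or push notifications) is left out.

```python
def apply_changes(mirror, changes):
    """Apply a batch of change records to a local {file_id: file} mirror.

    Assumes each change is shaped like a Drive v2 Change resource:
    {'fileId': ..., 'deleted': bool, 'file': {...}}.
    """
    for change in changes:
        file_id = change['fileId']
        trashed = change.get('file', {}).get('labels', {}).get('trashed', False)
        if change.get('deleted') or trashed:
            # the file is gone (or in the trash): drop it from the mirror
            mirror.pop(file_id, None)
        else:
            # new or modified file: store the latest metadata
            mirror[file_id] = change['file']
    return mirror


# Made-up sample data for illustration.
mirror = {
    'a': {'id': 'a', 'title': 'old.txt'},
    'b': {'id': 'b', 'title': 'gone.txt'},
}
changes = [
    {'fileId': 'a', 'deleted': False, 'file': {'id': 'a', 'title': 'renamed.txt'}},
    {'fileId': 'b', 'deleted': True},
    {'fileId': 'c', 'deleted': False, 'file': {'id': 'c', 'title': 'new.txt'}},
]
apply_changes(mirror, changes)
```

With a mirror like this, path lookups and tree rendering read from local data, and only the incremental changes cost API requests.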

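Stop thinking of Drive as a tree structure. It isn't. "Folders" are simply labels; for example, a file can have multiple parents.

To build a representation of a tree in your app, you need to do the following:

1. Run a Drive list query to retrieve all folders.

2. Iterate over the result array and inspect the parents property to build an in-memory hierarchy.

3. Run a second Drive list query to get all non-folders (i.e. files).

4. For each file returned, place it in your in-memory tree.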
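The steps above could be sketched roughly as follows. This is only an illustration: it assumes the two list queries have already been run and returned Drive v2 File resources (dicts with `id`, `title`, `mimeType` and a `parents` list of `{'id': ...}` entries), and that the real ID of the root folder is known (here the made-up value `'root-id'`). The helper names and the sample data are hypothetical.

```python
from collections import defaultdict

FOLDER_TYPE = 'application/vnd.google-apps.folder'


def build_children_index(items):
    """Map each parent ID to the list of items that name it as a parent.

    An item with several parents is listed under each of them.
    """
    children = defaultdict(list)
    for item in items:
        for parent in item.get('parents', []):
            children[parent['id']].append(item)
    return children


def render_tree(children, folder_id, depth=0):
    """Return the indented lines of the hierarchy below folder_id."""
    lines = []
    for item in sorted(children.get(folder_id, []), key=lambda i: i['title']):
        is_folder = item['mimeType'] == FOLDER_TYPE
        lines.append('  ' * depth + item['title'] + ('/' if is_folder else ''))
        if is_folder:
            lines.extend(render_tree(children, item['id'], depth + 1))
    return lines


# Hypothetical sample data standing in for the two list query results.
folders = [
    {'id': 'f1', 'title': 'folder1', 'mimeType': FOLDER_TYPE,
     'parents': [{'id': 'root-id'}]},
    {'id': 'f2', 'title': 'folder2', 'mimeType': FOLDER_TYPE,
     'parents': [{'id': 'f1'}]},
]
files = [
    {'id': 'a', 'title': 'test.txt', 'mimeType': 'text/plain',
     'parents': [{'id': 'f2'}]},
]

index = build_children_index(folders + files)
tree_lines = render_tree(index, 'root-id')
print('\n'.join(tree_lines))
```

The whole tree then costs two paginated list queries, however deep the folders are nested, rather than one request per folder.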

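If you simply want to check whether file-A exists in folder-B, the approach depends on whether the name "folder-B" is guaranteed to be unique.

If it is unique, just run a Files.list query for title='file-A', then do a Files.get for each of its parents and see whether any of them is called "folder-B".

If "folder-B" can exist under both "folder-C" and "folder-D", it is more complex, and you need to build the in-memory hierarchy from steps 1 and 2 above.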
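For the unique-name case, the check could look something like the sketch below. To keep it runnable without credentials, the two API calls are passed in as callables: `find_files(title)` would wrap a Files.list query on the title and `get_file(file_id)` would wrap Files.get. Both names, the function `file_in_named_folder` and the fake data are assumptions made for this example.

```python
def file_in_named_folder(find_files, get_file, file_title, folder_title):
    """Return the first file titled file_title that has a parent titled
    folder_title, or None if there is no such file.

    find_files(title) -> list of File resources with that title
    get_file(file_id) -> the File resource for that ID
    """
    for candidate in find_files(file_title):
        for parent in candidate.get('parents', []):
            if get_file(parent['id'])['title'] == folder_title:
                return candidate
    return None


# Fake in-memory "Drive" so the sketch runs without any network access.
FAKE_DRIVE = {
    'c': {'id': 'c', 'title': 'folder-C', 'parents': []},
    'b': {'id': 'b', 'title': 'folder-B', 'parents': [{'id': 'c'}]},
    'a1': {'id': 'a1', 'title': 'file-A', 'parents': [{'id': 'c'}]},
    'a2': {'id': 'a2', 'title': 'file-A', 'parents': [{'id': 'b'}]},
}


def fake_find(title):
    return [f for f in FAKE_DRIVE.values() if f['title'] == title]


def fake_get(file_id):
    return FAKE_DRIVE[file_id]


match = file_in_named_folder(fake_find, fake_get, 'file-A', 'folder-B')
```

This costs one list query plus one get per parent of each candidate, which stays cheap as long as few files share the title.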

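You don't say whether these files and folders are created by your app or by the user with the Google Drive web app. If your app is the creator of these files/folders, there is a trick you can use to restrict your searches to a single root. Say you have

MyDrive/app_root/folder-C/folder-B/file-A

You can make folder-C, folder-B and file-A all children of app_root (that is, add app_root as a parent of each of them).

That way you can constrain all your queries to include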

and 'app_root_id' in parents
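A small helper for assembling such query strings might look like this. The names `escape_query_value` and `build_query` are made up for the example; escaping single quotes with a backslash follows the Drive search syntax, while also escaping backslashes themselves and appending `trashed = false` are assumptions on my part.

```python
def escape_query_value(value):
    """Escape backslashes and single quotes for use inside a quoted
    string in a Drive search query."""
    return value.replace('\\', '\\\\').replace("'", "\\'")


def build_query(title=None, parent_id=None):
    """Build a Drive v2 search string from the optional filters."""
    clauses = []
    if title is not None:
        clauses.append("title = '%s'" % escape_query_value(title))
    if parent_id is not None:
        clauses.append("'%s' in parents" % escape_query_value(parent_id))
    # assumption: items in the trash should not count as existing
    clauses.append('trashed = false')
    return ' and '.join(clauses)


query = build_query(title="it's a file", parent_id='app_root_id')
print(query)
```

The resulting string would be passed as the `q` parameter of a files list call.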

A simple way to check whether a file exists at a specific path is:

drive_service.files().list(q="'THE_ID_OF_SPECIFIC_PATH' in parents and title='a file'").execute()
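If the folder ID is not known up front, the same query can be repeated once per path component, so a path of depth N costs N requests rather than a full tree walk. The sketch below is one way to do that. `resolve_path`, `make_drive_lister` and the fake data are hypothetical names for this example, and it assumes titles are unique within a folder.

```python
def resolve_path(list_children, path, root_id='root'):
    """Walk path one component at a time and return the final File resource.

    list_children(parent_id, title) must return the items titled `title`
    directly under `parent_id`. Returns None as soon as a component is missing.
    """
    current = {'id': root_id}
    for name in [part for part in path.split('/') if part]:
        matches = list_children(current['id'], name)
        if not matches:
            return None
        current = matches[0]  # assumption: titles are unique within a folder
    return current


def make_drive_lister(drive_service):
    """Adapter for a real Drive v2 service object (not exercised below)."""
    def list_children(parent_id, title):
        safe_title = title.replace('\\', '\\\\').replace("'", "\\'")
        q = "'%s' in parents and title = '%s' and trashed = false" % (
            parent_id, safe_title)
        # only the first page is read; a title filter should match few items
        return drive_service.files().list(q=q, maxResults=10).execute().get('items', [])
    return list_children


# Fake listing so the sketch runs without credentials.
FAKE_LISTING = {
    ('root', 'folder1'): [{'id': 'f1', 'title': 'folder1'}],
    ('f1', 'folder2'): [{'id': 'f2', 'title': 'folder2'}],
    ('f2', 'test.txt'): [{'id': 't', 'title': 'test.txt'}],
}


def fake_list_children(parent_id, title):
    return FAKE_LISTING.get((parent_id, title), [])


found = resolve_path(fake_list_children, 'folder1/folder2/test.txt')
missing = resolve_path(fake_list_children, 'folder1/nope/test.txt')
```

In a real application you would call `resolve_path(make_drive_lister(drive_service), 'folder1/folder2/test.txt')` and update the file if the result is not None.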

To walk through all the folders and files:

import logging
import os
import socket
import sys

import googleDriveAccess

logging.basicConfig()

FOLDER_TYPE = 'application/vnd.google-apps.folder'

def getlist(ds, q, **kwargs):
  """Run a files().list query and merge all result pages into one response."""
  result = None
  npt = ''
  while npt is not None:
    if npt != '': kwargs['pageToken'] = npt
    entries = ds.files().list(q=q, **kwargs).execute()
    if result is None: result = entries
    else: result['items'] += entries['items']
    npt = entries.get('nextPageToken')
  return result

def walk(ds, folderId, folderName, outf, depth):
  """Write this folder, recurse into its subfolders, then list its files."""
  spc = ' ' * depth
  outf.write('%s+%s\n%s  %s\n' % (spc, folderId, spc, folderName))
  # subfolders first
  q = "'%s' in parents and mimeType='%s'" % (folderId, FOLDER_TYPE)
  entries = getlist(ds, q, **{'maxResults': 200})
  for folder in entries['items']:
    walk(ds, folder['id'], folder['title'], outf, depth + 1)
  # then everything in this folder that is not a folder
  q = "'%s' in parents and mimeType!='%s'" % (folderId, FOLDER_TYPE)
  entries = getlist(ds, q, **{'maxResults': 200})
  for f in entries['items']:
    outf.write('%s -%s\n%s   %s\n' % (spc, f['id'], spc, f['title']))

def main(basedir):
  da = googleDriveAccess.DAClient(basedir) # clientId=None, script=False
  # text mode with UTF-8 encoding so non-ASCII titles are written correctly
  with open(os.path.join(basedir, 'hierarchy.txt'), 'w', encoding='utf-8') as f:
    walk(da.drive_service, 'root', 'root', f, 0)

if __name__ == '__main__':
  logging.getLogger().setLevel(logging.INFO)
  try:
    main(os.path.dirname(__file__))
  except socket.gaierror as e:
    sys.stderr.write('socket.gaierror: %s\n' % e)

This uses googleDriveAccess: github.com/HatsuneMiku/googleDriveAccess

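I agree with @pinoyyid: Google Drive is not a typical tree structure.

However, for printing the folder structure I would still consider using a tree visualization library (e.g. treelib). Below is a complete solution that recursively prints a Google Drive file system.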

from treelib import Node, Tree

from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive

gauth = GoogleAuth()
gauth.LocalWebserverAuth()
drive = GoogleDrive(gauth)

### Helper functions ###
def get_children(root_folder_id):
    query = "'" + root_folder_id + "' in parents and trashed=false"
    file_list = drive.ListFile({'q': query}).GetList()
    return file_list

def get_folder_id(root_folder_id, root_folder_title):
    file_list = get_children(root_folder_id)
    for file in file_list:
        if file['title'] == root_folder_title:
            return file['id']

def add_children_to_tree(tree, file_list, parent_id):
    for file in file_list:
        tree.create_node(file['title'], file['id'], parent=parent_id)
        print('parent: %s, title: %s, id: %s' % (parent_id, file['title'], file['id']))

### Recursion over all children ###
def populate_tree_recursively(tree, parent_id):
    children = get_children(parent_id)
    add_children_to_tree(tree, children, parent_id)
    if len(children) > 0:
        for child in children:
            populate_tree_recursively(tree, child['id'])

### Create tree and start populating from root ###
def main():
    root_folder_title = "your-root-folder"
    root_folder_id = get_folder_id("root", root_folder_title)

    tree = Tree()
    tree.create_node(root_folder_title, root_folder_id)
    populate_tree_recursively(tree, root_folder_id)
    tree.show()

if __name__ == "__main__":
    main()

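Actually, this is a desktop application. I know my current code will never work, but there must be a simple way to check whether a file exists at a specific path. How did you do it?

This doesn't answer the question at all.

This doesn't answer the question, because it doesn't include files in subsequent subdirectories.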