wget/curl large file from Google Drive


I'm trying to download a file from Google Drive in a script, and I'm having a little trouble doing so. The file I'm trying to download is here:

I've googled around quite a bit and finally got one of them to download. I got the UIDs of the files, and the smaller one (1.6 MB) downloads fine; however, the larger file (3.7 GB) always redirects to a page that asks whether I want to proceed with the download without a virus scan. Could anyone help me get past that screen?

Here is how I got the first file working -

curl -L "https://docs.google.com/uc?export=download&id=0Bz-w5tutuZIYeDU0VDRFWG9IVUE" > phlat-1.0.tar.gz
When I run the same command on the other file,

curl -L "https://docs.google.com/uc?export=download&id=0Bz-w5tutuZIYY3h5YlMzTjhnbGM" > index4phlat.tar.gz
I get the following output -

I notice on the third line from the end in the link, there is a &confirm=JwkK, which is a random 4-character string, but it suggests there is a way to add a confirmation to my URL. One of the links I visited suggested &confirm=no_antivirus, but that's not working.


I'm hoping someone here can help!
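For reference, the confirm token ends up in the download_warning cookie that Google sets on the warning page, so it can be scraped rather than guessed. A minimal sketch (the cookie line below is a made-up sample, reusing the JwkK token from the question):

```python
import re

# Made-up sample of the download_warning cookie line Google sets on the
# virus-scan warning page; "JwkK" is the 4-character token from the question.
cookie_line = "download_warning_13058876669334088843_0Bz-w5tutuZIYY3h5YlMzTjhnbGM\tJwkK"

# The confirm token is the value of the download_warning_* cookie.
confirm = re.search(r"download_warning\S*\s+(\w+)", cookie_line).group(1)
print(confirm)  # -> JwkK; append as &confirm=JwkK to the download URL
```

Several of the answers below automate exactly this step with awk or sed against a curl/wget cookie jar.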

WARNING: This functionality is deprecated. See the warning below in the comments.


Take a look at this question:

Basically, you have to create a public directory and access your files by relative reference, with something like

wget https://googledrive.com/host/LARGEPUBLICFOLDERID/index4phlat.tar.gz

Alternatively, you can use this script:

I wasn't able to get Nanoix's perl script working, or the other curl examples I had seen, so I started looking into the API myself in Python. This worked fine for small files, but large files choked on available RAM, so I found some other nice chunking code that uses the API's ability to do partial downloads. Gist here:

Note the bit about downloading the client_secrets.json file from the API interface to your local directory.

Source
$ cat gdrive_dl.py
from pydrive.auth import GoogleAuth  
from pydrive.drive import GoogleDrive    

"""API calls to download a very large google drive file.  The drive API only allows downloading to ram 
   (unlike, say, the Requests library's streaming option) so the file has to be partially downloaded
   and chunked.  Authentication requires a google api key, and a local download of client_secrets.json
   Thanks to Radek for the key functions: http://stackoverflow.com/questions/27617258/memoryerror-how-to-download-large-file-via-google-drive-sdk-using-python
"""

def partial(total_byte_len, part_size_limit):
    s = []
    for p in range(0, total_byte_len, part_size_limit):
        last = min(total_byte_len - 1, p + part_size_limit - 1)
        s.append([p, last])
    return s

def GD_download_file(service, file_id):
  drive_file = service.files().get(fileId=file_id).execute()
  download_url = drive_file.get('downloadUrl')
  total_size = int(drive_file.get('fileSize'))
  s = partial(total_size, 100000000) # I'm downloading BIG files, so 100M chunk size is fine for me
  title = drive_file.get('title')
  originalFilename = drive_file.get('originalFilename')
  filename = './' + originalFilename
  if download_url:
      with open(filename, 'wb') as file:
        print("Bytes downloaded: ")
        for byte_range in s:
          headers = {"Range": 'bytes=%s-%s' % (byte_range[0], byte_range[1])}
          resp, content = service._http.request(download_url, headers=headers)
          if resp.status == 206:
                file.write(content)
                file.flush()
          else:
            print('An error occurred: %s' % resp)
            return None
          print(str(byte_range[1]) + "...")
      return title, filename
  else:
    return None


gauth = GoogleAuth()
gauth.CommandLineAuth() #requires cut and paste from a browser 

FILE_ID = 'SOMEID' #FileID is the simple file hash, like 0B1NzlxZ5RpdKS0NOS0x0Ym9kR0U

drive = GoogleDrive(gauth)
service = gauth.service
#file = drive.CreateFile({'id':FILE_ID})    # Use this to get file metadata
GD_download_file(service, FILE_ID) 
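For illustration, here is how the partial() helper from the script chunks a total byte length into the inclusive [start, end] ranges used for the Range headers (the function is copied verbatim from above, with a small example size):

```python
# partial() as defined in the script above: split a total byte length into
# inclusive [start, end] ranges no larger than part_size_limit bytes each.
def partial(total_byte_len, part_size_limit):
    s = []
    for p in range(0, total_byte_len, part_size_limit):
        last = min(total_byte_len - 1, p + part_size_limit - 1)
        s.append([p, last])
    return s

print(partial(250, 100))  # [[0, 99], [100, 199], [200, 249]]
```

Each pair maps directly onto an HTTP Range header, e.g. bytes=100-199 for the second chunk.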

Google Drive's default behavior is to scan files for viruses. If the file is too big, it will prompt the user and notify them that the file could not be scanned.

At the moment, the only workaround I have found is to share the file with the web and create a web resource.

Quote from the Google Drive help page:

With Drive, you can make web resources, like HTML, CSS and Javascript files, viewable as a website.

To host a web page with Drive:

  • Open Drive at drive.google.com and select a file.
  • Click the Share button at the top of the page.
  • Click Advanced in the bottom right corner of the sharing box.
  • Click Change....
  • Choose On - Public on the web and click Save.
  • Before closing the sharing box, copy the document ID from the URL in the field below "Link to share". The document ID is a string of upper and lower case letters and numbers between slashes in the URL.
  • Share a URL that looks like "www.googledrive.com/host/[doc id]", where [doc id] is replaced by the document ID you copied in step 6.
    Anyone can now view your web page.
  • More details can be found here:

    For example, when you share a file publicly on Google Drive, the share link looks like this:

    https://drive.google.com/file/d/0B5IRsLTwEO6CVXFURmpQZ1Jxc0U/view?usp=sharing

    Then copy the file ID out of it and create a googledrive.com link, like this:

    https://www.googledrive.com/host/0B5IRsLTwEO6CVXFURmpQZ1Jxc0U
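The manual conversion above can also be scripted. A minimal sketch, using the example share link from above (keep in mind the /host endpoint is the deprecated feature warned about earlier):

```python
import re

# Extract the file ID from the example share link above and build the
# corresponding googledrive.com/host URL (a deprecated endpoint, see warning).
share_link = "https://drive.google.com/file/d/0B5IRsLTwEO6CVXFURmpQZ1Jxc0U/view?usp=sharing"
file_id = re.search(r"/file/d/([^/]+)", share_link).group(1)
print("https://www.googledrive.com/host/" + file_id)
```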
    

    Here's a quick way to do this.

    Make sure the link is shared, and it will look something like this:

    https://drive.google.com/file/d/0B5IRsLTwEO6CVXFURmpQZ1Jxc0U/view?usp=sharing

    Then, copy that file ID and use it like so:

    wget --no-check-certificate 'https://docs.google.com/uc?export=download&id=FILEID' -O FILENAME
    
    If the file is large and trips the virus-check page, you can use this (but it will download two files, an html file and the actual file):


    There is an open source multi-platform client, written in Go: drive. It's quite nice and full-featured, and it is in active development.

    $ drive help pull
    Name
            pull - pulls remote changes from Google Drive
    Description
            Downloads content from the remote drive or modifies
             local content to match that on your Google Drive
    
    Note: You can skip checksum verification by passing in flag `-ignore-checksum`
    
    * For usage flags: `drive pull -h`
    

    You can use the open source Linux/Unix command line tool gdrive.

    To install it:

  • Download the binary. Choose a file that fits your architecture, for example
    gdrive-linux-x64.

  • Copy it to your path:

    sudo cp gdrive-linux-x64 /usr/local/bin/gdrive;
    sudo chmod a+x /usr/local/bin/gdrive;
    
  • Use it:

  • Determine the Google Drive file ID. For that, right-click the desired file in the Google Drive website and choose "Get Link…". It will return something like
    https://drive.google.com/open?id=0B7_OwkDsUIgFWXA1B2FPQfV5S8H
    . Grab the string behind the ?id= and copy it to your clipboard. That's the file's ID.

  • Download the file. Of course, use your file's ID instead in the following command:

    gdrive download 0B7_OwkDsUIgFWXA1B2FPQfV5S8H
    
  • On first use, the tool needs to obtain access permissions for the Google Drive API. For that, it will show you a link you have to visit in a browser, and there you will get a verification code to copy & paste back into the tool. The download then starts automatically. There is no progress indicator, but you can watch the progress in a file manager or a second terminal.
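    The ID-grabbing step above is easy to script as well; a minimal sketch, using the example "Get Link" URL from the steps:

```python
# Pull the file ID out of a "Get Link" URL: it is everything after "?id=".
# The link below is the example URL from the steps above; substitute your own.
link = "https://drive.google.com/open?id=0B7_OwkDsUIgFWXA1B2FPQfV5S8H"
file_id = link.split("?id=")[1]
print(file_id)  # 0B7_OwkDsUIgFWXA1B2FPQfV5S8H
```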

    Source: a note on another answer here.

    Additional trick: rate limiting. To download with gdrive at a limited maximum rate (so as not to swamp the network…), you can pipe the download through pv (Pipe Viewer), e.g. something like gdrive download --stdout FILEID | pv -brL 90k > FILENAME (check with gdrive download -h whether your gdrive build supports --stdout).

    This will show the amount of data downloaded (-b) and the rate of downloading (-r), and limit that rate to 90 kiB/s (-L 90k).

    Here is the workaround I used to download files from Google Drive to my Google Cloud Linux shell:

  • Share the file to public, with edit permissions, using advanced sharing.
  • You will get a sharing link which contains an ID; it looks like: drive.google.com/file/d/[ID]/view?usp=sharing
  • Copy that ID and paste it into the following link: googledrive.com/host/[ID]
  • The link above is our download link.
  • Download the file with wget: wget googledrive.com/host/[ID]
  • This command will download the file named [ID], with no extension but with the same file size, in the same location where you ran the command.
    ggURL='https://drive.google.com/uc?export=download'
    ggID='PUT_THE_FILE_ID_HERE'   # e.g. 0Bz-w5tutuZIYY3h5YlMzTjhnbGM
    curl -sc /tmp/gcokie "${ggURL}&id=${ggID}" >/dev/null
    getcode="$(awk '/_warning_/ {print $NF}' /tmp/gcokie)"
    curl -LOJb /tmp/gcokie "${ggURL}&confirm=${getcode}&id=${ggID}"
    
    gURL='PUT_THE_SHARE_LINK_HERE'   # the Drive share link to extract the ID from
    echo "$gURL" | egrep -o '(\w|-){26,}'
    # match more than 26 word characters
    
    echo "$gURL" | sed 's/[^A-Za-z0-9_-]/\n/g' | sed -rn '/.{26}/p'
    # replace non-word characters with new lines,
    # print only lines with more than 26 word characters
    
    curl -L https://drive.google.com/uc?id={FileID}
    
    #!/usr/bin/env bash
    fileid="$1"
    destination="$2"
    
    # try to download the file
    curl -c /tmp/cookie -L -o /tmp/probe.bin "https://drive.google.com/uc?export=download&id=${fileid}"
    probeSize=`du -b /tmp/probe.bin | cut -f1`
    
    # did we get a virus message?
    # this will be the first line we get when trying to retrieve a large file
    bigFileSig='<!DOCTYPE html><html><head><title>Google Drive - Virus scan warning</title><meta http-equiv="content-type" content="text/html; charset=utf-8"/>'
    sigSize=${#bigFileSig}
    
    if (( probeSize <= sigSize )); then
      virusMessage=false
    else
      firstBytes=$(head -c $sigSize /tmp/probe.bin)
      if [ "$firstBytes" = "$bigFileSig" ]; then
        virusMessage=true
      else
        virusMessage=false
      fi
    fi
    
    if [ "$virusMessage" = true ] ; then
      confirm=$(tr ';' '\n' </tmp/probe.bin | grep confirm)
      confirm=${confirm:8:4}
      curl -C - -b /tmp/cookie -L -o "$destination" "https://drive.google.com/uc?export=download&id=${fileid}&confirm=${confirm}"
    else
      mv /tmp/probe.bin "$destination"
    fi
    
    curl 'https://doc-0s-80-docs.googleusercontent.com/docs/securesc/aa51s66fhf9273i....................blah blah blah...............gEIqZ3KAQ==' --compressed
    
    #!/bin/bash
    
    SOURCE="$1"
    if [ "${SOURCE}" == "" ]; then
        echo "Must specify a source url"
        exit 1
    fi
    
    DEST="$2"
    if [ "${DEST}" == "" ]; then
        echo "Must specify a destination filename"
        exit 1
    fi
    
    FILEID=$(echo $SOURCE | rev | cut -d= -f1 | rev)
    COOKIES=$(mktemp)
    
    CODE=$(wget --save-cookies $COOKIES --keep-session-cookies --no-check-certificate "https://docs.google.com/uc?export=download&id=${FILEID}" -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/Code: \1\n/p')
    
    # cleanup the code, format is 'Code: XXXX'
    CODE=$(echo $CODE | rev | cut -d: -f1 | rev | xargs)
    
    wget --load-cookies $COOKIES "https://docs.google.com/uc?export=download&confirm=${CODE}&id=${FILEID}" -O $DEST
    
    rm -f $COOKIES
    
    wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=1HlzTR1-YVoBPlXo0gMFJ_xY4ogMnfzDi' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=1HlzTR1-YVoBPlXo0gMFJ_xY4ogMnfzDi" -O besteyewear.zip && rm -rf /tmp/cookies.txt
    
    #!/bin/bash
    fileid="FILEIDENTIFIER"
    filename="FILENAME"
    curl -c ./cookie -s -L "https://drive.google.com/uc?export=download&id=${fileid}" > /dev/null
    curl -Lb ./cookie "https://drive.google.com/uc?export=download&confirm=`awk '/download/ {print $NF}' ./cookie`&id=${fileid}" -o ${filename}
    
    # Batch-resolve direct download URLs for a list of Google Drive links read
    # from URLS.txt ("Enlace" is Spanish for "link"; user paths are redacted).
    rm -rf /home/**********user***********/URLS_DECODED.txt
    COUNTER=0
    while read p; do 
        string=$p
        hash="${string#*id=}"
        hash="${hash%&*}"
        hash="${hash#*file/d/}"
        hash="${hash%/*}"
        let COUNTER=COUNTER+1
        echo "Enlace "$COUNTER" id="$hash
        URL_TO_DOWNLOAD=$(wget --spider --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id='$hash -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id="$hash 2>&1 | grep *******Localización***********: | head -c-13 | cut -c16-)
        rm -rf /tmp/cookies.txt
        echo -e "$URL_TO_DOWNLOAD\r" >> /home/**********user***********/URLS_DECODED.txt
        echo "Enlace "$COUNTER" URL="$URL_TO_DOWNLOAD
    done < /home/**********user***********/URLS.txt
    
    #!/bin/bash
    
    # Get files from Google Drive
    
    # $1 = file ID
    # $2 = file name
    
    URL="https://docs.google.com/uc?export=download&id=$1"
    
    wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate $URL -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=$1" -O $2 && rm -rf /tmp/cookies.txt
    
    ./wgetgdrive.sh <file ID> <filename>
    
    ./wgetgdrive.sh 1lsDPURlTNzS62xEOAIG98gsaW6x2PYd2 images.zip
    
    gdown https://drive.google.com/uc?id=0B7EVK8r0v71pOXBhSUdJWU1MYUk
    
    sudo python2.7 -m pip install --upgrade youtube_dl 
    # or 
    # sudo python3.6 -m pip install --upgrade youtube_dl
    
    https://drive.google.com/file/d/3PIY9dCoWRs-930HHvY-3-FOOPrIVoBAR/view?usp=sharing       
    (This is not a real file address)
    
    3PIY9dCoWRs-930HHvY-3-FOOPrIVoBAR
    
    youtube-dl https://drive.google.com/open?id=
    
    youtube-dl https://drive.google.com/open?id=3PIY9dCoWRs-930HHvY-3-FOOPrIVoBAR
    
    [GoogleDrive] 3PIY9dCoWRs-930HHvY-3-FOOPrIVoBAR: Downloading webpage
    [GoogleDrive] 3PIY9dCoWRs-930HHvY-3-FOOPrIVoBAR: Requesting source file
    [download] Destination: your_requested_filename_here-3PIY9dCoWRs-930HHvY-3-FOOPrIVoBAR
    [download] 240.37MiB at  2321.53MiB/s (00:01)
    
    pip install gdown
    
    import gdown
    
    url = 'https://drive.google.com/uc?id=0B9P1L--7Wd2vU3VUVlFnbTgtS2c'
    output = 'spam.txt'
    gdown.download(url, output, quiet=False)
    
    fileid='0B9P1L7Wd2vU3VUVlFnbTgtS2c'
    
    gdown "https://drive.google.com/uc?id=${fileid}"
    
    curl -L "https://drive.google.com/uc?id=AgOATNfjpovfFrft9QYa-P1IeF9e7GWcH&export=download" > phlat-1.0.tar.gz
    
    function curl_gdrive {
    
        GDRIVE_FILE_ID=$1
        DEST_PATH=$2
    
        curl -c ./cookie -s -L "https://drive.google.com/uc?export=download&id=${GDRIVE_FILE_ID}" > /dev/null
        curl -Lb ./cookie "https://drive.google.com/uc?export=download&confirm=`awk '/download/ {print $NF}' ./cookie`&id=${GDRIVE_FILE_ID}" -o ${DEST_PATH}
        rm -f cookie
    }
    
       $ curl_gdrive 153bpzybhfqDspyO_gdbcG5CMlI19ASba imagenet.tar
    
    # this is used for drive directly downloads
    function download-google(){
      echo "https://drive.google.com/uc?export=download&id=$1"
      mkdir -p .tmp
      curl -c .tmp/$1cookies "https://drive.google.com/uc?export=download&id=$1" > .tmp/$1intermezzo.html;
      curl -L -b .tmp/$1cookies "$(egrep -o "https.+download" .tmp/$1intermezzo.html)" > $2;
    }
    
    # some files are shared using an indirect download
    function download-google-2(){
      echo "https://drive.google.com/uc?export=download&id=$1"
      mkdir -p .tmp
      curl -c .tmp/$1cookies "https://drive.google.com/uc?export=download&id=$1" > .tmp/$1intermezzo.html;
      code=$(egrep -o "confirm=(.+)&amp;id=" .tmp/$1intermezzo.html | cut -d"=" -f2 | cut -d"&" -f1)
      curl -L -b .tmp/$1cookies "https://drive.google.com/uc?export=download&confirm=$code&id=$1" > $2;
    }
    
    # used like this
    download-google <id> <name of item.extension>
    
    rclone copy mygoogledrive:path/to/file /path/to/file/on/local/machine -P