wget/curl large file from Google Drive
I'm trying to download a file from Google Drive in a script, and I'm having a little trouble doing so. I've looked online extensively and finally managed to get one of the files to download. I got the UIDs of the files, and the smaller one (1.6MB) downloads fine, but the larger one (3.7GB) always redirects to a page that asks whether I want to proceed with the download without a virus scan. Could someone help me get past that screen? Here's how I got the first file working -
curl -L "https://docs.google.com/uc?export=download&id=0Bz-w5tutuZIYeDU0VDRFWG9IVUE" > phlat-1.0.tar.gz
When I run the same command on my other file,
curl -L "https://docs.google.com/uc?export=download&id=0Bz-w5tutuZIYY3h5YlMzTjhnbGM" > index4phlat.tar.gz
I get the following output -
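I notice that on the third-to-last line, in the link, there is a &confirm=JwkK, which is a random 4-character string but suggests there is a way to add a confirmation to my URL. One of the links I visited suggested &confirm=no_antivirus, but that doesn't work.
I hope someone here can help with this!

WARNING: This functionality is deprecated. See the warning in the comments below.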
Basically, you have to create a public directory and access your files by relative reference, with something like
wget https://googledrive.com/host/LARGEPUBLICFOLDERID/index4phlat.tar.gz
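Alternatively, you can use a script.

I couldn't get Nanoix's perl script to work, or the other curl examples I had seen, so I started looking into the API myself in python. This worked fine for small files, but large files choked past the available RAM, so I found some other nice chunking code that uses the API's ability to download partially. Note the bit about downloading the client secret json file from the API interface to your local directory.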
$ cat gdrive_dl.py
"""API calls to download a very large google drive file. The drive API only allows downloading to ram
(unlike, say, the Requests library's streaming option) so the file has to be partially downloaded
and chunked. Authentication requires a google api key, and a local download of client_secrets.json
Thanks to Radek for the key functions: http://stackoverflow.com/questions/27617258/memoryerror-how-to-download-large-file-via-google-drive-sdk-using-python
"""
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive


def partial(total_byte_len, part_size_limit):
    # Split the file into inclusive [first, last] byte ranges of at most part_size_limit bytes
    s = []
    for p in range(0, total_byte_len, part_size_limit):
        last = min(total_byte_len - 1, p + part_size_limit - 1)
        s.append([p, last])
    return s


def GD_download_file(service, file_id):
    drive_file = service.files().get(fileId=file_id).execute()
    download_url = drive_file.get('downloadUrl')
    total_size = int(drive_file.get('fileSize'))
    s = partial(total_size, 100000000)  # I'm downloading BIG files, so 100M chunk size is fine for me
    title = drive_file.get('title')
    originalFilename = drive_file.get('originalFilename')
    filename = './' + originalFilename
    if download_url:
        with open(filename, 'wb') as file:
            print("Bytes downloaded: ")
            for byte_range in s:
                # Request one chunk at a time with an HTTP Range header
                headers = {"Range": 'bytes=%s-%s' % (byte_range[0], byte_range[1])}
                resp, content = service._http.request(download_url, headers=headers)
                if resp.status == 206:  # 206 = Partial Content
                    file.write(content)
                    file.flush()
                else:
                    print('An error occurred: %s' % resp)
                    return None
                print(str(byte_range[1]) + "...")
        return title, filename
    else:
        return None


gauth = GoogleAuth()
gauth.CommandLineAuth()  # requires cut and paste from a browser
FILE_ID = 'SOMEID'  # FileID is the simple file hash, like 0B1NzlxZ5RpdKS0NOS0x0Ym9kR0U
drive = GoogleDrive(gauth)
service = gauth.service
# file = drive.CreateFile({'id': FILE_ID})  # Use this to get file metadata
GD_download_file(service, FILE_ID)
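To see what the chunking step in the script above actually asks the server for, here is a small standalone sketch of the same range arithmetic. The names byte_ranges and range_header are made up for this illustration and are not part of PyDrive or the Drive API; the sketch only computes the Range header values and makes no network calls.

```python
def byte_ranges(total_size, chunk_size):
    """Return inclusive (first, last) byte offsets that cover total_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        (start, min(start + chunk_size, total_size) - 1)
        for start in range(0, total_size, chunk_size)
    ]


def range_header(first, last):
    """Format one inclusive byte range as an HTTP Range header value."""
    return "bytes=%d-%d" % (first, last)


if __name__ == "__main__":
    # A 10-byte file fetched in 4-byte chunks needs three requests.
    for first, last in byte_ranges(10, 4):
        print(range_header(first, last))
```

The last range is shorter than the others whenever the file size is not a multiple of the chunk size, which is why the end offset is clamped to the file size.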
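Google Drive's default behavior is to scan files for viruses. If a file is too big, it prompts the user and notifies them that the file could not be scanned. At the moment, the only workaround I have found is to share the file with the web and create a web resource. Quoting from the Google Drive help page: With Drive, you can make web resources (like HTML, CSS, and Javascript files) viewable as a website. To host a webpage with Drive:
Now anybody can view your webpage.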
Then copy the file ID and create a googledrive.com link like this:
https://drive.google.com/file/d/0B5IRsLTwEO6CVXFURmpQZ1Jxc0U/view?usp=sharing
https://www.googledrive.com/host/0B5IRsLTwEO6CVXFURmpQZ1Jxc0U
Here's a quick way to do this. Make sure the link is shared, and it will look something like this:
https://drive.google.com/file/d/0B5IRsLTwEO6CVXFURmpQZ1Jxc0U/view?usp=sharing
Then copy that FILEID and use it like this:
wget --no-check-certificate 'https://docs.google.com/uc?export=download&id=FILEID' -O FILENAME
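If you are generating these links from a script, the URL in the wget command above can be assembled from the file ID. This is a minimal sketch: direct_download_url is a name made up for this example, and the optional confirm parameter simply mirrors the &confirm=... token discussed in the question; it assumes the uc?export=download endpoint behaves as described in this thread.

```python
from urllib.parse import urlencode


def direct_download_url(file_id, confirm=None):
    """Build a docs.google.com/uc download URL for a shared file ID."""
    params = [("export", "download")]
    if confirm is not None:
        # Token taken from the virus-scan warning page, if one was shown.
        params.append(("confirm", confirm))
    params.append(("id", file_id))
    return "https://docs.google.com/uc?" + urlencode(params)


if __name__ == "__main__":
    print(direct_download_url("FILEID"))
    print(direct_download_url("FILEID", confirm="JwkK"))
```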
If the file is large and triggers the virus check page, you can do this (but it will download two files, one html file and the actual file):
There's an open-source multi-platform client, written in Go. It's quite nice and full-featured, and it is in active development.
$ drive help pull
Name
pull - pulls remote changes from Google Drive
Description
Downloads content from the remote drive or modifies
local content to match that on your Google Drive
Note: You can skip checksum verification by passing in flag `-ignore-checksum`
* For usage flags: `drive pull -h`
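You can use the open source Linux/Unix command line tool gdrive. To install it, take the binary that fits your architecture, for example gdrive-linux-x64, and copy it to your path:
sudo cp gdrive-linux-x64 /usr/local/bin/gdrive;
sudo chmod a+x /usr/local/bin/gdrive;
To use it, start from the file's link, which looks something like https://drive.google.com/open?id=0B7_OwkDsUIgFWXA1B2FPQfV5S8H. Take the string after ?id= and copy it to your clipboard. That is the file's ID. Then download the file:
gdrive download 0B7_OwkDsUIgFWXA1B2FPQfV5S8H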
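To download with a limited maximum rate (so as not to swamp the network…), you can use a command like this (with pv):
This will show the amount of data downloaded (-b) and the rate of download (-r), and limit that rate to 90 kiB/s (-L 90k).

Here's a workaround I came up with to download files from Google Drive to my Google Cloud Linux shell.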
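# Placeholders (assumed, not shown in the original snippet): set these first
ggID='FILEID'
ggURL='https://drive.google.com/uc?export=download'
# First request only stores the cookies; the confirm code is read from the cookie file
curl -sc /tmp/gcokie "${ggURL}&id=${ggID}" >/dev/null
getcode="$(awk '/_warning_/ {print $NF}' /tmp/gcokie)"
curl -LOJb /tmp/gcokie "${ggURL}&confirm=${getcode}&id=${ggID}"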
echo "$gURL" | egrep -o '(\w|-){26,}'
# match runs of at least 26 word characters (or hyphens)
echo "$gURL" | sed 's/[^A-Za-z0-9_-]/\n/g' | sed -rn '/.{26}/p'
# replace non-word characters with a new line,
# print only lines with at least 26 characters
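The same "longest run of ID-looking characters" trick can be written in Python if the rest of your script is not shell. This is a rough sketch under the same assumption the one-liners above make, namely that a file ID is the first run of at least 26 letters, digits, underscores, or hyphens in the URL; extract_file_id is a name made up for this example.

```python
import re

# Same character class and minimum length as the egrep/sed one-liners.
_ID_RE = re.compile(r"[A-Za-z0-9_-]{26,}")


def extract_file_id(url):
    """Return the first ID-like run in a Google Drive URL, or None if there is none."""
    match = _ID_RE.search(url)
    return match.group(0) if match else None


if __name__ == "__main__":
    print(extract_file_id("https://drive.google.com/file/d/0B5IRsLTwEO6CVXFURmpQZ1Jxc0U/view?usp=sharing"))
    print(extract_file_id("https://drive.google.com/open?id=0B7_OwkDsUIgFWXA1B2FPQfV5S8H"))
```

Both link styles that appear in this thread (file/d/ID/view and open?id=ID) go through the same function, since neither has any other run that long.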
curl -L https://drive.google.com/uc?id={FileID}
#!/usr/bin/env bash
fileid="$1"
destination="$2"
# try to download the file
curl -c /tmp/cookie -L -o /tmp/probe.bin "https://drive.google.com/uc?export=download&id=${fileid}"
probeSize=`du -b /tmp/probe.bin | cut -f1`
# did we get a virus message?
# this will be the first line we get when trying to retrieve a large file
bigFileSig='<!DOCTYPE html><html><head><title>Google Drive - Virus scan warning</title><meta http-equiv="content-type" content="text/html; charset=utf-8"/>'
sigSize=${#bigFileSig}
if (( probeSize <= sigSize )); then
virusMessage=false
else
firstBytes=$(head -c $sigSize /tmp/probe.bin)
if [ "$firstBytes" = "$bigFileSig" ]; then
virusMessage=true
else
virusMessage=false
fi
fi
if [ "$virusMessage" = true ] ; then
confirm=$(tr ';' '\n' </tmp/probe.bin | grep confirm)
confirm=${confirm:8:4}
curl -C - -b /tmp/cookie -L -o "$destination" "https://drive.google.com/uc?export=download&id=${fileid}&confirm=${confirm}"
else
mv /tmp/probe.bin "$destination"
fi
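For comparison, the probe-then-confirm flow of the bash script above can be sketched in Python with only the standard library. Treat it as an illustration, not a guaranteed recipe: it assumes the warning page starts with <!DOCTYPE html and embeds a confirm=XXXX token, as described in the question, and Google may change that at any time. The names make_opener and download are invented for this sketch.

```python
import re
import shutil
import urllib.request
from http.cookiejar import CookieJar
from urllib.parse import urlencode

BASE_URL = "https://drive.google.com/uc"
CONFIRM_RE = re.compile(rb"confirm=([0-9A-Za-z_-]+)")


def make_opener():
    """Opener that keeps cookies between requests, like curl -c / -b."""
    jar = CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def download(file_id, out, opener, probe_size=64 * 1024):
    """Write the file to the binary stream out; return the confirm token used, if any."""
    query = {"export": "download", "id": file_id}
    response = opener.open(BASE_URL + "?" + urlencode(query))
    probe = response.read(probe_size)

    # Assumption: the virus-scan warning is an HTML page carrying a confirm token.
    looks_like_html = probe.lstrip().lower().startswith(b"<!doctype html")
    match = CONFIRM_RE.search(probe) if looks_like_html else None
    if match is None:
        # No warning recognised: what we received is the file itself.
        out.write(probe)
        shutil.copyfileobj(response, out)
        return None

    # Repeat the request with the token; the opener resends the cookies.
    token = match.group(1).decode("ascii")
    query["confirm"] = token
    confirmed = opener.open(BASE_URL + "?" + urlencode(query))
    shutil.copyfileobj(confirmed, out)
    return token
```

A call would look like download(file_id, fh, make_opener()), with fh being a file opened in "wb" mode. Passing the opener in as a parameter keeps the function easy to try against canned responses before pointing it at a 3.7GB file.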
curl 'https://doc-0s-80-docs.googleusercontent.com/docs/securesc/aa51s66fhf9273i....................blah blah blah...............gEIqZ3KAQ==' --compressed
#!/bin/bash
SOURCE="$1"
if [ "${SOURCE}" == "" ]; then
echo "Must specify a source url"
exit 1
fi
DEST="$2"
if [ "${DEST}" == "" ]; then
echo "Must specify a destination filename"
exit 1
fi
FILEID=$(echo $SOURCE | rev | cut -d= -f1 | rev)
COOKIES=$(mktemp)
CODE=$(wget --save-cookies $COOKIES --keep-session-cookies --no-check-certificate "https://docs.google.com/uc?export=download&id=${FILEID}" -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/Code: \1\n/p')
# cleanup the code, format is 'Code: XXXX'
CODE=$(echo $CODE | rev | cut -d: -f1 | rev | xargs)
wget --load-cookies $COOKIES "https://docs.google.com/uc?export=download&confirm=${CODE}&id=${FILEID}" -O $DEST
rm -f $COOKIES
wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=1HlzTR1-YVoBPlXo0gMFJ_xY4ogMnfzDi' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=1HlzTR1-YVoBPlXo0gMFJ_xY4ogMnfzDi" -O besteyewear.zip && rm -rf /tmp/cookies.txt
#!/bin/bash
fileid="FILEIDENTIFIER"
filename="FILENAME"
curl -c ./cookie -s -L "https://drive.google.com/uc?export=download&id=${fileid}" > /dev/null
curl -Lb ./cookie "https://drive.google.com/uc?export=download&confirm=`awk '/download/ {print $NF}' ./cookie`&id=${fileid}" -o ${filename}
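The awk step in the script above reads the confirm code out of curl's cookie file. As a rough Python equivalent, the sketch below pulls the value of the first cookie whose name contains a marker. It assumes the tab-separated Netscape cookie-file layout that curl -c writes (seven columns, name in the sixth, value in the seventh) and assumes the cookie name contains download_warning; both the function name token_from_cookie_text and that default marker are choices made for this example.

```python
def token_from_cookie_text(text, marker="download_warning"):
    """Return the value of the first cookie whose name contains marker, else None."""
    for line in text.splitlines():
        fields = line.split("\t")
        # Cookie lines have 7 tab-separated columns; comments and blanks do not.
        if len(fields) == 7 and marker in fields[5]:
            return fields[6]
    return None


if __name__ == "__main__":
    sample = (
        "# Netscape HTTP Cookie File\n"
        "\n"
        "#HttpOnly_.drive.google.com\tTRUE\t/uc\tTRUE\t1700000000\t"
        "download_warning_123_FILEID\tJwkK\n"
    )
    print(token_from_cookie_text(sample))
```

Checking the cookie name column instead of the whole line avoids picking up an unrelated cookie whose value or domain happens to contain the word download.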
rm -rf /home/**********user***********/URLS_DECODED.txt
COUNTER=0
while read p; do
string=$p
hash="${string#*id=}"
hash="${hash%&*}"
hash="${hash#*file/d/}"
hash="${hash%/*}"
let COUNTER=COUNTER+1
echo "Enlace "$COUNTER" id="$hash
URL_TO_DOWNLOAD=$(wget --spider --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id='$hash -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id="$hash 2>&1 | grep *******Localización***********: | head -c-13 | cut -c16-)
rm -rf /tmp/cookies.txt
echo -e "$URL_TO_DOWNLOAD\r" >> /home/**********user***********/URLS_DECODED.txt
echo "Enlace "$COUNTER" URL="$URL_TO_DOWNLOAD
done < /home/**********user***********/URLS.txt
#!/bin/bash
# Get files from Google Drive
# $1 = file ID
# $2 = file name
URL="https://docs.google.com/uc?export=download&id=$1"
wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate $URL -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=$1" -O $2 && rm -rf /tmp/cookies.txt
./wgetgdrive.sh <file ID> <filename>
./wgetgdrive.sh 1lsDPURlTNzS62xEOAIG98gsaW6x2PYd2 images.zip
gdown https://drive.google.com/uc?id=0B7EVK8r0v71pOXBhSUdJWU1MYUk
sudo python2.7 -m pip install --upgrade youtube_dl
# or
# sudo python3.6 -m pip install --upgrade youtube_dl
https://drive.google.com/file/d/3PIY9dCoWRs-930HHvY-3-FOOPrIVoBAR/view?usp=sharing
(This is not a real file address)
3PIY9dCoWRs-930HHvY-3-FOOPrIVoBAR
youtube-dl https://drive.google.com/open?id=
youtube-dl https://drive.google.com/open?id=3PIY9dCoWRs-930HHvY-3-FOOPrIVoBAR
[GoogleDrive] 3PIY9dCoWRs-930HHvY-3-FOOPrIVoBAR: Downloading webpage
[GoogleDrive] 3PIY9dCoWRs-930HHvY-3-FOOPrIVoBAR: Requesting source file
[download] Destination: your_requested_filename_here-3PIY9dCoWRs-930HHvY-3-FOOPrIVoBAR
[download] 240.37MiB at 2321.53MiB/s (00:01)
pip install gdown
import gdown
url = 'https://drive.google.com/uc?id=0B9P1L--7Wd2vU3VUVlFnbTgtS2c'
output = 'spam.txt'
gdown.download(url, output, quiet=False)
fileid='0B9P1L7Wd2vU3VUVlFnbTgtS2c'
gdown "https://drive.google.com/uc?id=${fileid}"
curl -L "https://drive.google.com/uc?id=AgOATNfjpovfFrft9QYa-P1IeF9e7GWcH&export=download" > phlat-1.0.tar.gz
function curl_gdrive {
GDRIVE_FILE_ID=$1
DEST_PATH=$2
curl -c ./cookie -s -L "https://drive.google.com/uc?export=download&id=${GDRIVE_FILE_ID}" > /dev/null
curl -Lb ./cookie "https://drive.google.com/uc?export=download&confirm=`awk '/download/ {print $NF}' ./cookie`&id=${GDRIVE_FILE_ID}" -o ${DEST_PATH}
rm -f cookie
}
$ curl_gdrive 153bpzybhfqDspyO_gdbcG5CMlI19ASba imagenet.tar
# this is used for direct Drive downloads
function download-google(){
echo "https://drive.google.com/uc?export=download&id=$1"
mkdir -p .tmp
curl -c .tmp/$1cookies "https://drive.google.com/uc?export=download&id=$1" > .tmp/$1intermezzo.html;
curl -L -b .tmp/$1cookies "$(egrep -o "https.+download" .tmp/$1intermezzo.html)" > $2;
}
# some files are shared using an indirect download
function download-google-2(){
echo "https://drive.google.com/uc?export=download&id=$1"
mkdir -p .tmp
curl -c .tmp/$1cookies "https://drive.google.com/uc?export=download&id=$1" > .tmp/$1intermezzo.html;
code=$(egrep -o "confirm=(.+)&id=" .tmp/$1intermezzo.html | cut -d"=" -f2 | cut -d"&" -f1)
curl -L -b .tmp/$1cookies "https://drive.google.com/uc?export=download&confirm=$code&id=$1" > $2;
}
# used like this
download-google <id> <name of item.extension>
rclone copy mygoogledrive:path/to/file /path/to/file/on/local/machine -P