
Python: Is there a way to open a 10GB file in Colaboratory?

Tags: python, google-drive-api, google-colaboratory

In Colaboratory, running Python 3, I enabled the GPU under Runtime > Change runtime type.

Then I ran this code:

import pandas as pd
import numpy as np

# Code to read csv file into Colaboratory:
!pip install -U -q PyDrive
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
# Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

#Link of a 10GB file in Google Drive
link = ''

fluff, id = link.split('=')
print (id) # Verify that you have everything after '='

# Download the whole file from Drive to the local Colab disk as 'empresa.csv'
downloaded = drive.CreateFile({'id': id})
downloaded.GetContentFile('empresa.csv')
But I cannot open the file because I run out of memory: "Your session crashed after using all available RAM."

I have:

Connected to "Python 3 Google Compute Engine backend (GPU)"
RAM: 0.64 GB / 12.72 GB, Disk: 25.14 GB / 358.27 GB

Is there any way to increase Colaboratory's memory, either free or paid?

-/-

I also tried another approach, mounting Drive as a filesystem:

from google.colab import drive
drive.mount('/content/gdrive')

with open('/content/gdrive/My Drive/foo.txt', 'w') as f:
  f.write('Hello Google Drive!')
!cat /content/gdrive/My\ Drive/foo.txt

# Drive REST API
from google.colab import auth
auth.authenticate_user()

# Construct a Drive API client
from googleapiclient.discovery import build
drive_service = build('drive', 'v3')

# Downloading data from a Drive file into Python
file_id = ''

import io
from googleapiclient.http import MediaIoBaseDownload

request = drive_service.files().get_media(fileId=file_id)
downloaded = io.BytesIO()
downloader = MediaIoBaseDownload(downloaded, request)
done = False
while not done:
  # _ is a placeholder for a progress/status object that we ignore here.
  _, done = downloader.next_chunk()

downloaded.seek(0)
print('Downloaded file contents are: {}'.format(downloaded.read()))

But the problem persists: "Your session crashed after using all available RAM."
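One likely cause: `io.BytesIO` buffers the entire download in RAM, so a 10GB file exhausts memory before `read()` is even called. A minimal sketch of streaming the download to local disk instead (the function name, chunk size, and progress printing are illustrative assumptions; `MediaIoBaseDownload` accepts any writable binary file object):

```python
def download_to_disk(drive_service, file_id, dest_path):
    """Stream a Drive file to local disk chunk by chunk, keeping RAM usage flat."""
    import io
    from googleapiclient.http import MediaIoBaseDownload

    request = drive_service.files().get_media(fileId=file_id)
    # Write each downloaded chunk straight to disk instead of an in-memory buffer.
    with io.FileIO(dest_path, 'wb') as fh:
        downloader = MediaIoBaseDownload(fh, request, chunksize=100 * 1024 * 1024)
        done = False
        while not done:
            status, done = downloader.next_chunk()
            if status:
                print('Download progress: %d%%' % int(status.progress() * 100))
```

After such a download, the file lives on the Colab disk (which has far more free space than RAM here) and can then be read incrementally.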

You can always connect to

My suggestion is not to try to load the file fully into memory.


Then you can read the CSV directly from the filesystem incrementally, one chunk at a time.
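The chunked read described above could look like the following sketch. A small generated CSV stands in for the real 10GB empresa.csv, and the column names are made up for illustration; only one chunk is held in RAM at a time:

```python
import csv
import os
import tempfile
import pandas as pd

# Build a small CSV to stand in for the (much larger) empresa.csv
path = os.path.join(tempfile.mkdtemp(), 'empresa.csv')
with open(path, 'w', newline='') as f:
    w = csv.writer(f)
    w.writerow(['id', 'value'])
    for i in range(10_000):
        w.writerow([i, i % 7])

# Read in chunks of 2,000 rows, aggregating as we go instead of
# materializing the whole DataFrame in memory.
total_rows = 0
value_sum = 0
for chunk in pd.read_csv(path, chunksize=2000):
    total_rows += len(chunk)
    value_sum += chunk['value'].sum()

print(total_rows, value_sum)  # → 10000 29994
```

For the real file on a mounted Drive, the same loop would point at something like '/content/gdrive/My Drive/empresa.csv' with a much larger chunksize.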

Thank you very much, @Bob Smith. Sorry, I'm not sure I understood the tutorial correctly, so I updated the question above and ran the "mount Drive as a filesystem" test, but the problem persists. On a Windows machine with 16 GB of RAM, or a Linux machine with 8 GB, I am able to open this file; it is very time-consuming and I have to close everything else, but it works. I use chunks to read the data into a pandas DataFrame. I would like to know whether this could be faster in Colaboratory, since my own machine is not always available, and whether it could be done without closing all other programs.
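On the speed and memory point: besides chunking, pandas can be told up front which columns to load and with what dtypes, which cuts both RAM usage and parse time. A hedged sketch; the column names and dtypes here are assumptions, not taken from the real file:

```python
import csv
import os
import tempfile
import pandas as pd

# A tiny stand-in CSV; the real file would be empresa.csv on the Colab disk.
path = os.path.join(tempfile.mkdtemp(), 'empresa.csv')
with open(path, 'w', newline='') as f:
    w = csv.writer(f)
    w.writerow(['id', 'value', 'unused'])
    for i in range(1000):
        w.writerow([i, i * 0.5, 'x'])

# Load only the needed columns, with compact dtypes: int32 and float32
# use half the memory of pandas' default int64/float64.
df = pd.read_csv(path, usecols=['id', 'value'],
                 dtype={'id': 'int32', 'value': 'float32'})
print(df.dtypes)
```

Combined with `chunksize`, this keeps each chunk as small as possible, which is what matters most on a 12 GB Colab instance.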