Reading a .gz file from Google Cloud Storage with Python (Jupyter)


python pandas google-cloud-storage


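I am trying to read a .gz file from Google Cloud Storage using Python in a Jupyter notebook.

With the first piece of code I get the error

TypeError: cannot convert str to bytes

With the second piece of code I get a different error

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

Any suggestions?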
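The byte in that error message is a clue: every gzip stream begins with the two magic bytes 0x1f 0x8b (RFC 1952), so position 1 is exactly where UTF-8 decoding of still-compressed data breaks. A minimal local reproduction using only the standard library; the CSV content is made up and stands in for the real blob:

```python
import gzip

# Hypothetical CSV content standing in for the real test.csv.gz blob
csv_text = "id,value\n1,0.5\n2,1.25\n"
compressed = gzip.compress(csv_text.encode("utf-8"))

# Every gzip stream starts with the magic bytes 0x1f 0x8b
magic = compressed[:2]
print(magic.hex())  # prints: 1f8b

# Decoding the compressed bytes as UTF-8 fails on the second magic byte
error_position = None
try:
    compressed.decode("utf-8")
except UnicodeDecodeError as exc:
    error_position = exc.start

print(error_position, hex(compressed[error_position]))  # prints: 1 0x8b
```

0x1f is a valid ASCII control character, but 0x8b is a UTF-8 continuation byte and cannot start a character, which matches the reported error.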

Fortunately, I solved it myself. I hope this helps others:

import gzip
import io

import pandas as pd
from google.cloud import storage

client = storage.Client()

# get the bucket
bucket = client.get_bucket("nttcomware")

# get the blob object
blob_name = "test.csv.gz"
blob = bucket.get_blob(blob_name)

# download the blob as bytes and wrap them in a BytesIO object; still gzip-compressed
# (download_as_bytes() replaces the deprecated download_as_string())
data = io.BytesIO(blob.download_as_bytes())

# decompress the gzip stream
with gzip.open(data) as gz:
    # the decompressed content is still bytes
    file = gz.read()

# decode the bytes into a str with UTF-8 and wrap it in a StringIO object
s = io.StringIO(file.decode("utf-8"))

df = pd.read_csv(s, float_precision="high")
df.head()
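The BytesIO / gzip.open / read sequence above can also be collapsed into a single gzip.decompress call on the downloaded bytes. A small sketch; the helper name gunzip_text is hypothetical, and the input bytes are generated locally instead of downloaded, so it runs without GCS credentials:

```python
import gzip
import io


def gunzip_text(raw: bytes, encoding: str = "utf-8") -> io.StringIO:
    """Decompress gzip bytes and return the decoded text as a StringIO buffer.

    Hypothetical helper: in the notebook, `raw` would be blob.download_as_bytes().
    """
    return io.StringIO(gzip.decompress(raw).decode(encoding))


# Local stand-in for the downloaded blob contents
raw = gzip.compress("id,value\n1,0.5\n".encode("utf-8"))
buffer = gunzip_text(raw)
print(buffer.getvalue())
```

The returned buffer can then be passed to pd.read_csv(buffer, float_precision="high") in the same way as s above.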

The problem is in decoding the gz file. Try replacing

s = str(bt, "utf-8")

with

s = str(unicode(bt, errors='replace'))

and let me know whether it works. Thanks for your answer, but it didn't work.
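unicode() only exists in Python 2; the closest Python 3 spelling is bt.decode("utf-8", errors="replace"). That suppresses the exception but cannot fix the problem, because the bytes are compressed rather than mis-encoded: invalid bytes just turn into U+FFFD replacement characters and the result is still not CSV text. A quick local check with made-up data:

```python
import gzip

# Made-up gzip payload standing in for the downloaded blob
raw = gzip.compress(b"id,value\n1,0.5\n")

# Python 3 equivalent of unicode(bt, errors='replace')
lossy = raw.decode("utf-8", errors="replace")

# No exception, but the gzip magic byte 0x8b became U+FFFD
print(lossy[1] == "\ufffd")  # prints: True
# The text is still compressed binary, not the CSV header
print(lossy.startswith("id,value"))  # prints: False
```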

from google.cloud import storage
import pandas as pd
from io import StringIO

client = storage.Client()
bucket = client.get_bucket("nttcomware")
blob = bucket.get_blob(f"test.csv.gz")
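bt = blob.download_as_string()
# bt holds the raw gzip-compressed bytes, so decoding them as UTF-8 raises
# UnicodeDecodeError at byte 0x8b (the second byte of the gzip header)
s = str(bt, "utf-8")
s = StringIO(s)
df = pd.read_csv(s, compression='gzip', float_precision="high")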
df.head()
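This snippet fails because it decodes the compressed bytes before pandas ever sees them, and compression='gzip' then has nothing compressed left to work on. One way around it, assuming a reasonably recent pandas that accepts compression= on binary file-like objects, is to keep the payload as bytes and let pandas decompress it. The helper name read_gzipped_csv is hypothetical, and the bytes are generated locally:

```python
import gzip
import io

import pandas as pd


def read_gzipped_csv(raw: bytes) -> pd.DataFrame:
    """Parse gzip-compressed CSV bytes without decoding them to str first.

    Hypothetical helper: in the notebook, `raw` would come from
    blob.download_as_bytes() (download_as_string() in older client versions).
    """
    # Keep the data binary; pandas handles the gzip decompression itself
    return pd.read_csv(io.BytesIO(raw), compression="gzip", float_precision="high")


# Local stand-in for the downloaded blob contents
raw = gzip.compress(b"id,value\n1,0.5\n2,1.25\n")
df = read_gzipped_csv(raw)
print(df)
```

In the notebook this would become df = read_gzipped_csv(blob.download_as_bytes()), which avoids both the manual str conversion and the StringIO step.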
