Python: problem loading a gzip-compressed (.gz) CSV file from a URL
I am trying to load a CSV file directly from a URL. The CSV file is compressed as a .gz file:
#Importing libraries
import pandas as pd
import requests
import io
#defining the url
url = "https://data.brasil.io/dataset/covid19/caso_full.csv.gz"
s=requests.get(url).content
df=pd.read_csv(io.StringIO(s.decode('utf-8')), sep=',', compression='gzip', index_col=0, quotechar='"')

The error is as follows:
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-28-58ebbb6aba80> in <module>
      7 url = "https://data.brasil.io/dataset/covid19/caso_full.csv.gz"
      8 s=requests.get(url).content
----> 9 df=pd.read_csv(io.StringIO(s.decode('utf-8')), sep=',', compression='gzip', index_col=0, quotechar='"')
     10
     11 #df=pd.read_csv("caso_full.csv.gz")
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

Any hints as to why this happens?
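Thanks, everyone!

The problem is that you are decoding the content and then wrapping it in io.StringIO. The response body is gzip-compressed binary data, not UTF-8 text, so s.decode('utf-8') fails: 0x8b at position 1 is the second byte of the gzip magic number (0x1f 0x8b).
The solution is to leave the bytes undecoded and use io.BytesIO instead.
See this Stack Overflow answer: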
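The URL returns its content in gzip (GNU zip) format. pd.read_csv expects a file path or a buffer as its first argument. Because the content is bytes, it must be wrapped in an io.BytesIO object. pandas then decompresses the data and parses it as CSV.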
import io
import pandas as pd
import requests
# defining the url
url = "https://data.brasil.io/dataset/covid19/caso_full.csv.gz"
response = requests.get(url)
content = response.content
print(type(content))
df = pd.read_csv(
    io.BytesIO(content), sep=",", compression="gzip", index_col=0, quotechar='"',
)
print(df.head())
Output:
<class 'bytes'>
           city_ibge_code        date  epidemiological_week  estimated_population_2019  ...  place_type  state  new_confirmed  new_deaths
city                                                                                    ...
São Paulo       3550308.0  2020-02-25                     9                 12252023.0  ...        city     SP              1           0
NaN                  35.0  2020-02-25                     9                 45919049.0  ...       state     SP              1           0
São Paulo       3550308.0  2020-02-26                     9                 12252023.0  ...        city     SP              0           0
NaN                  35.0  2020-02-26                     9                 45919049.0  ...       state     SP              0           0
São Paulo       3550308.0  2020-02-27                     9                 12252023.0  ...        city     SP              0           0
[5 rows x 16 columns]

Can you post the exception? Can you print s and the status code of the request?
Thank you very much!
Please don't forget to accept the answer; a green check mark will then appear next to it.
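
The explanation above can also be checked without touching the network. The following is only a minimal sketch: the tiny CSV text and the variable names (csv_text, df_bytes, df_text) are made up for illustration, and content merely stands in for what requests.get(url).content would return. It shows that decoding the compressed bytes fails, and two ways of reading them that should work: passing the raw bytes to pandas through io.BytesIO, or decompressing with the standard gzip module first and only then decoding.

```python
import gzip
import io

import pandas as pd

# Hypothetical stand-in for requests.get(url).content:
# a tiny CSV, gzip-compressed in memory.
csv_text = "city,state,new_confirmed\nSão Paulo,SP,1\nCampinas,SP,0\n"
content = gzip.compress(csv_text.encode("utf-8"))

# gzip streams begin with the magic bytes 0x1f 0x8b.
print(content[:2] == b"\x1f\x8b")

# Decoding the compressed bytes as UTF-8 fails, as in the question.
try:
    content.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError as exc:
    decode_failed = True
    print(exc)

# Option 1: hand the raw bytes to pandas and let it decompress them.
df_bytes = pd.read_csv(io.BytesIO(content), compression="gzip", index_col=0)

# Option 2: decompress first, then decode; only now is StringIO appropriate.
text = gzip.decompress(content).decode("utf-8")
df_text = pd.read_csv(io.StringIO(text), index_col=0)

# Both routes should yield the same DataFrame.
print(df_bytes.equals(df_text))
```

Option 1 is usually the shorter route, since pandas handles the decompression; option 2 can be handy when the decompressed text is needed for something other than pandas.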