How to download a zipped file from the internet using pandas 0.17.1 and Python 3.5




What am I doing wrong? Here is what I am trying to do:

import pandas as pd

url='http://data.octo.dc.gov/feeds/crime_incidents/archive/crime_incidents_2013_CSV.zip'

df = pd.read_csv(url, compression='gzip',
                 header=0, sep=',', quotechar='"',
                 engine = 'python')

IIUC, here is a solution: instead of passing the zip file directly to pandas, extract it first and then pass the CSV file:
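A rough sketch of that idea (not the answerer's original code), assuming Python 3 and an archive that contains at least one CSV file: hold the download in memory, open it with zipfile, and hand the CSV member to pd.read_csv. The helper names read_csv_from_zip_bytes and read_csv_from_zip_url are made up for illustration.

```python
import io
import zipfile
from urllib.request import urlopen

import pandas as pd


def read_csv_from_zip_bytes(data, **read_csv_kwargs):
    """Read the first .csv member of an in-memory zip archive into a DataFrame."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        # Assumption: the file we want is the first member ending in .csv.
        csv_names = [n for n in archive.namelist() if n.lower().endswith('.csv')]
        if not csv_names:
            raise ValueError('no .csv member found in the archive')
        with archive.open(csv_names[0]) as member:
            return pd.read_csv(member, **read_csv_kwargs)


def read_csv_from_zip_url(url, **read_csv_kwargs):
    """Download a zip archive and read its first .csv member."""
    with urlopen(url) as response:
        return read_csv_from_zip_bytes(response.read(), **read_csv_kwargs)


# Usage (performs a network request, so it is left commented out):
# df = read_csv_from_zip_url(
#     'http://data.octo.dc.gov/feeds/crime_incidents/archive/crime_incidents_2013_CSV.zip')
```

Splitting the download from the parsing makes it possible to check the parsing part against a small zip built in memory, without touching the network.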

It produces a dataframe like the following:

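@Abbas, thanks a lot. I did run it step by step, and here is what I came up with. It is certainly not the fastest, but it works fine. I ran it on a Mac with Python 3.5.1 and pandas 0.18.1.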

I hope this helps. Thanks.

On Windows with Python 3.6, Cy Bu's answer didn't quite work for me. I got an invalid argument error when trying to open the file. I modified it slightly:

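import os
from urllib.request import urlopen, Request

import pandas as pd

# URL taken from the question
url = 'http://data.octo.dc.gov/feeds/crime_incidents/archive/crime_incidents_2013_CSV.zip'

# send a browser-like User-Agent with the request
r = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
b2 = [z for z in url.split('/') if '.zip' in z][0]  # gets just the '.zip' part of the url

with open(b2, "wb") as target:
    target.write(urlopen(r).read())  # saves the file to disk

data = pd.read_csv(b2, compression='zip')  # opens the saved zip file
os.remove(b2)  # removes the zip file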
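The list comprehension above picks the URL segment that contains '.zip'. As an alternative sketch, the standard library can split the URL and take the last path segment, which also drops any query string. The helper name zip_name_from_url is hypothetical.

```python
import posixpath
from urllib.parse import urlsplit


def zip_name_from_url(url):
    """Return the last path segment of a URL, ignoring any query string."""
    return posixpath.basename(urlsplit(url).path)
```

For the URL in the question this yields crime_incidents_2013_CSV.zip, the same name the list comprehension produces.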

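What problem are you facing?

This solution doesn't work, because I can't seem to use from urllib import urlopen... The problem I am facing is that my code raises this error message: File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/zipfile.py", line 1093, in _RealGetContents, raise BadZipFile("File is not a zip file"), zipfile.BadZipFile: File is not a zip file.

Can you download the zip file with a browser? Can you extract it with 7zip or a similar application and load the CSV file into pandas?

Yes, I can download any zip file and unzip it on my Mac; that works for the file given in the example above as well.

OK, if that is the case, then ideally it should work; I don't have a Mac to evaluate this solution on. If it doesn't, go step by step and evaluate what you get after each step. For example, write the contents of url.read() to a file, evaluate it, and continue like that for each step.

This answer was the only solution that worked for me! I updated it slightly, because you often get a 403 Forbidden error if you try to access an API programmatically. You then have to specify a "User-Agent" in the request to the API.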
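The step-by-step check suggested in the comments could look roughly like this. It is only a sketch: describe_download is a made-up helper that takes the raw bytes returned by url.read() and reports whether they form a zip archive. If the server returned an error page instead (for example after a 403), the first bytes will show it.

```python
import io
import zipfile


def describe_download(data):
    """Return a short diagnosis of downloaded bytes (hypothetical helper)."""
    if zipfile.is_zipfile(io.BytesIO(data)):
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            return 'zip archive with members: ' + ', '.join(archive.namelist())
    # Zip files usually begin with the bytes b'PK'; show what arrived instead.
    return 'not a zip file, starts with: ' + repr(data[:40])
```

Calling describe_download(urlopen(URL).read()) before handing anything to pandas should make a BadZipFile error easier to interpret.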

from urllib.request import urlopen
import pandas as pd
import os

URL = \
    'http://data.octo.dc.gov/feeds/crime_incidents/archive/crime_incidents_2013_CSV.zip'
zip_name = 'zipFile.zip'

# download the zip file and save it to disk
with urlopen(URL) as response, open(zip_name, 'wb') as output:   # note the mode: "wb"
    output.write(response.read())

# read the zip file as a pandas dataframe
df = pd.read_csv(zip_name)   # pandas version 0.18.1 takes zip files

# if keeping the zip file on disk is not wanted, then:
os.remove(zip_name)   # remove the copy of the zip file on disk
