How do I use Python to open a CSV that is inside a ZIP inside another ZIP?


python pandas csv


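I have been using a user-defined function to open CSV files contained in ZIP files, and it has worked well for me.

Now I am trying to open a CSV file that is contained in a ZIP which is itself contained in another ZIP, and I have run into problems.

Instead of the expected dataframe from the CSV, I get the following error: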

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfd in position 0: invalid start byte

That kind of makes sense, because I am trying to use read_csv().

This is the code I have been using, and I think I have to change the following line:

return {name: pd.read_csv(zip_file.open(name)

because it no longer returns a CSV file, but a ZIP file.
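To see which members of the outer archive are themselves ZIP files, a small inspection step can help before any parsing is attempted. The following is only a sketch: `make_zip` and `describe_members` are hypothetical helper names, and the archive is built in memory from made-up data rather than downloaded.

```python
import io
import zipfile


def make_zip(members):
    """Build a ZIP archive in memory from a {name: bytes} mapping (demo helper)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, payload in members.items():
            archive.writestr(name, payload)
    return buffer.getvalue()


def describe_members(data):
    """Return {member name: 'zip' or 'other'} for a ZIP archive given as bytes."""
    kinds = {}
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for name in archive.namelist():
            payload = archive.read(name)
            # is_zipfile accepts a file-like object and checks for ZIP structure
            is_zip = zipfile.is_zipfile(io.BytesIO(payload))
            kinds[name] = "zip" if is_zip else "other"
    return kinds


# Made-up sample data: one nested ZIP and one plain CSV in the outer archive
inner = make_zip({"data.csv": b"a,b\n" + b"1,2\n" * 10})
outer = make_zip({"inner.zip": inner, "notes.csv": b"x,y\n" + b"3,4\n" * 10})
kinds = describe_members(outer)
print(kinds)
```

Any member reported as 'zip' has to be opened as an archive first; handing it straight to a CSV parser is the situation described in the question.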

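This can be done with a bit of recursion. If a file inside the ZIP turns out to be a ZIP file itself, make a recursive call to extract its CSV files: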

try:
    from urllib.request import urlopen
except ImportError:
    from urllib2 import urlopen

from io import BytesIO
import zipfile

import pandas as pd

# Dictionary holding all the dataframes from all zip/zip/csvs
dfs = {}


def zip_to_dfs(data):
    zip_file = zipfile.ZipFile(BytesIO(data))

    for name in zip_file.namelist():
        if name.lower().endswith('.csv'):
            dfs[name] = pd.read_csv(zip_file.open(name))
        elif name.lower().endswith('.zip'):
            zip_to_dfs(zip_file.open(name).read())


def get_zip_data_from_url(url):
    req = urlopen(url)
    zip_to_dfs(req.read())


final_links_list = [
    'http://www.nemweb.com.au/REPORTS/ARCHIVE/Dispatch_SCADA/PUBLIC_DISPATCHSCADA_20170523.zip', 
    'http://www.nemweb.com.au/REPORTS/ARCHIVE/Dispatch_SCADA/PUBLIC_DISPATCHSCADA_20170524.zip']

for link in final_links_list:
    print(link)
    get_zip_data_from_url(link)

# Display the first couple of dataframes    
for name, df in sorted(dfs.items())[:2]:
    print('\n', name, '\n')
    print(df)
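The same recursion can also be written without the module-level dictionary. The sketch below is a variation, not the code from the answer: `iter_nested_csvs` and `make_zip` are hypothetical names, the test archive is built in memory from made-up data, and the function yields raw bytes so that the caller decides how to parse them.

```python
import io
import zipfile


def iter_nested_csvs(data, prefix=""):
    """Yield (path, raw_bytes) for every CSV in a ZIP, descending into nested ZIPs."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for name in archive.namelist():
            path = prefix + name
            lowered = name.lower()
            if lowered.endswith(".csv"):
                yield path, archive.read(name)
            elif lowered.endswith(".zip"):
                # Recurse into the nested archive, recording where it came from
                yield from iter_nested_csvs(archive.read(name), prefix=path + "/")


def make_zip(members):
    """Build a ZIP archive in memory from a {name: bytes} mapping (demo helper)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, payload in members.items():
            archive.writestr(name, payload)
    return buffer.getvalue()


# Made-up sample data: a CSV inside a ZIP inside a ZIP, plus a top-level CSV
inner = make_zip({"a.csv": b"x,y\n1,2\n"})
outer = make_zip({"inner.zip": inner, "top.csv": b"p,q\n3,4\n"})
found = dict(iter_nested_csvs(outer))
print(sorted(found))
```

Each payload can then be turned into a dataframe with pd.read_csv(io.BytesIO(raw)), since read_csv accepts file-like objects. Prefixing nested names with the path of their parent archive keeps entries apart when two inner ZIPs contain a CSV with the same name.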

This displays the following:

http://www.nemweb.com.au/REPORTS/ARCHIVE/Dispatch_SCADA/PUBLIC_DISPATCHSCADA_20170524.zip

 PUBLIC_DISPATCHSCADA_201705240010_0000000283857084.CSV

     C     NEMP.WORLD DISPATCHSCADA  AEMO               PUBLIC 2017/05/24  \
0    I       DISPATCH    UNIT_SCADA   1.0       SETTLEMENTDATE       DUID
1    D       DISPATCH    UNIT_SCADA   1.0  2017/05/24 00:10:00    BARCSF1
2    D       DISPATCH    UNIT_SCADA   1.0  2017/05/24 00:10:00   BUTLERSG
..  ..            ...           ...   ...                  ...        ...
263  D       DISPATCH    UNIT_SCADA   1.0  2017/05/24 00:10:00      YWPS3
264  D       DISPATCH    UNIT_SCADA   1.0  2017/05/24 00:10:00      YWPS4
265  C  END OF REPORT           267   NaN                  NaN        NaN

       00:05:08  0000000283857084  DISPATCHSCADA.1  0000000283857078
0    SCADAVALUE               NaN              NaN               NaN
1             0               NaN              NaN               NaN
2       8.29998               NaN              NaN               NaN
..          ...               ...              ...               ...
263  388.745570               NaN              NaN               NaN
264  391.568360               NaN              NaN               NaN
265         NaN               NaN              NaN               NaN

[266 rows x 10 columns]

 PUBLIC_DISPATCHSCADA_201705240015_0000000283857169.CSV

     C     NEMP.WORLD DISPATCHSCADA  AEMO               PUBLIC 2017/05/24  \
0    I       DISPATCH    UNIT_SCADA   1.0       SETTLEMENTDATE       DUID
1    D       DISPATCH    UNIT_SCADA   1.0  2017/05/24 00:15:00    BARCSF1
2    D       DISPATCH    UNIT_SCADA   1.0  2017/05/24 00:15:00   BUTLERSG
..  ..            ...           ...   ...                  ...        ...
263  D       DISPATCH    UNIT_SCADA   1.0  2017/05/24 00:15:00      YWPS3
264  D       DISPATCH    UNIT_SCADA   1.0  2017/05/24 00:15:00      YWPS4
265  C  END OF REPORT           267   NaN                  NaN        NaN

       00:10:08  0000000283857169  DISPATCHSCADA.1  0000000283857163
0    SCADAVALUE               NaN              NaN               NaN
1             0               NaN              NaN               NaN
2       8.29998               NaN              NaN               NaN
..          ...               ...              ...               ...
263  386.205080               NaN              NaN               NaN
264  389.592410               NaN              NaN               NaN
265         NaN               NaN              NaN               NaN

[266 rows x 10 columns]

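Comments:

I tried this answer but I am still lost, because the object is a ZIP file rather than a CSV, and I don't know how to go back and open the ZIP file.

Then you need to add another "zipfile.zipfile" on the nested zip object.

This is not an MCVE, because there is no sample data to test with and no expected output. If someone runs the code above, will it show the error? If not, it is not an MCVE: it is missing the V, verifiable. Also, showing the result you expect when it works would be helpful.

@LukaVlaskalic: Sorry, I meant zipfile.ZipFile().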
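The suggestion in the comments, wrapping the nested member in a second zipfile.ZipFile, can be sketched for a fixed depth of two as follows. The names `read_csv_rows_two_levels` and `make_zip` are hypothetical, the sample rows are made up for illustration, and the standard csv module is used here so the sketch runs without pandas.

```python
import csv
import io
import zipfile


def make_zip(members):
    """Build a ZIP archive in memory from a {name: bytes} mapping (demo helper)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, payload in members.items():
            archive.writestr(name, payload)
    return buffer.getvalue()


def read_csv_rows_two_levels(data, inner_zip_name, csv_name):
    """Open outer ZIP -> nested ZIP -> CSV and return its rows as lists of strings."""
    with zipfile.ZipFile(io.BytesIO(data)) as outer:
        nested_bytes = outer.read(inner_zip_name)
    # The second zipfile.ZipFile is the extra step needed for the nested archive
    with zipfile.ZipFile(io.BytesIO(nested_bytes)) as inner:
        with inner.open(csv_name) as raw:
            # Assumes the CSV is UTF-8 encoded
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return list(csv.reader(text))


# Made-up sample data for illustration only
inner = make_zip({"scada.csv": b"DUID,SCADAVALUE\nBARCSF1,0\n"})
outer = make_zip({"inner.zip": inner})
rows = read_csv_rows_two_levels(outer, "inner.zip", "scada.csv")
print(rows)
```

With pandas, the object returned by inner.open(csv_name) could be passed to pd.read_csv instead of csv.reader. The recursive version handles any nesting depth, whereas this form only fits archives that are known to be exactly two levels deep.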
