Cannot create a DataFrame because the whitespace between columns in the CSV is not constant

I am trying to convert this text file (philadelphia.txt) into a pandas DataFrame:

STATION           STATION_NAME                                       DATE     TAVG     TMAX     TMIN     
----------------- -------------------------------------------------- -------- -------- -------- -------- 
GHCND:USW00094732         PHILADELPHIA NE PHILADELPHIA AIRPORT PA US 19970605 -9999    74       47       
GHCND:USW00094732         PHILADELPHIA NE PHILADELPHIA AIRPORT PA US 19970606 -9999    68       50       
GHCND:USW00094732         PHILADELPHIA NE PHILADELPHIA AIRPORT PA US 19970608 -9999    72       50       
GHCND:USW00094732         PHILADELPHIA NE PHILADELPHIA AIRPORT PA US 19970609 -9999    83       47       
GHCND:USW00094732         PHILADELPHIA NE PHILADELPHIA AIRPORT PA US 19970610 -9999    86       55       
GHCND:USW00094732         PHILADELPHIA NE PHILADELPHIA AIRPORT PA US 19970611 -9999    88       61       
GHCND:USW00094732         PHILADELPHIA NE PHILADELPHIA AIRPORT PA US 19970612 -9999    83       70       
GHCND:USW00094732         PHILADELPHIA NE PHILADELPHIA AIRPORT PA US 19970613 -9999    80       66       
GHCND:USW00094732         PHILADELPHIA NE PHILADELPHIA AIRPORT PA US 19970614 -9999    80       64       
GHCND:USW00094732         PHILADELPHIA NE PHILADELPHIA AIRPORT PA US 19970615 -9999    77       55       
GHCND:USW00094732         PHILADELPHIA NE PHILADELPHIA AIRPORT PA US 19970616 -9999    79       49
However, if I use

data = pd.read_csv('philadelphia.txt', sep=r"\s+", header=0)
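it produces the correct header, but then runs into trouble with the station name. I want the whole name to sit under the column "STATION_NAME", but sep="\s+" splits it at every space, and I get an error: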

pandas.errors.ParserError: Error tokenizing data. C error: Expected 6 fields in line 3, saw 11
How can I split the data into 6 columns without breaking the station name into individual words?

I would also like to be able to pass in other text files that have different station names, for example (yellowknife.txt).

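Use the pd.read_fwf() method, which reads fixed-width columns rather than splitting on a separator; .drop(0) removes the row of dashes under the header: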

In [7]: df = pd.read_fwf(r'/path/to/file.csv').drop(0)

In [8]: df
Out[8]:
              STATION                                STATION_NAME      DATE   TAVG TMAX TMIN
1   GHCND:USW00094732  PHILADELPHIA NE PHILADELPHIA AIRPORT PA US  19970605  -9999   74   47
2   GHCND:USW00094732  PHILADELPHIA NE PHILADELPHIA AIRPORT PA US  19970606  -9999   68   50
3   GHCND:USW00094732  PHILADELPHIA NE PHILADELPHIA AIRPORT PA US  19970608  -9999   72   50
4   GHCND:USW00094732  PHILADELPHIA NE PHILADELPHIA AIRPORT PA US  19970609  -9999   83   47
5   GHCND:USW00094732  PHILADELPHIA NE PHILADELPHIA AIRPORT PA US  19970610  -9999   86   55
6   GHCND:USW00094732  PHILADELPHIA NE PHILADELPHIA AIRPORT PA US  19970611  -9999   88   61
7   GHCND:USW00094732  PHILADELPHIA NE PHILADELPHIA AIRPORT PA US  19970612  -9999   83   70
8   GHCND:USW00094732  PHILADELPHIA NE PHILADELPHIA AIRPORT PA US  19970613  -9999   80   66
9   GHCND:USW00094732  PHILADELPHIA NE PHILADELPHIA AIRPORT PA US  19970614  -9999   80   64
10  GHCND:USW00094732  PHILADELPHIA NE PHILADELPHIA AIRPORT PA US  19970615  -9999   77   55
11  GHCND:USW00094732  PHILADELPHIA NE PHILADELPHIA AIRPORT PA US  19970616  -9999   79   49
In [9]: df.columns.tolist()
Out[9]: ['STATION', 'STATION_NAME', 'DATE', 'TAVG', 'TMAX', 'TMIN']
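
If the inferred column boundaries ever turn out wrong for another file (yellowknife.txt, say), one option is to take the boundaries from the dashed line itself. The following is only a sketch under assumptions: read_ruled_fwf is a hypothetical helper name, and it assumes every file has the header on the first line and the row of dashes on the second, as in philadelphia.txt.

```python
import re

import pandas as pd


def read_ruled_fwf(path, **read_kwargs):
    """Hypothetical helper: read a fixed-width report whose second line is a dashed ruler."""
    with open(path) as fh:
        fh.readline()          # line 1: column names, parsed later by read_fwf
        ruler = fh.readline()  # line 2: runs of dashes, one run per column

    # Each run of dashes gives the (start, end) character span of one column.
    colspecs = [m.span() for m in re.finditer(r"-+", ruler)]

    # skiprows=[1] skips the ruler line so it never becomes a data row;
    # any extra keyword arguments are forwarded to pd.read_fwf unchanged.
    return pd.read_fwf(path, colspecs=colspecs, skiprows=[1], **read_kwargs)
```

Called as df = read_ruled_fwf('philadelphia.txt'), this should give the same six columns without needing .drop(0). If -9999 marks a missing reading in these files (an assumption, the question does not say so), read_ruled_fwf('philadelphia.txt', na_values=[-9999]) would turn those values into NaN.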