Cannot create a DataFrame because the whitespace spacing in the file being read is not constant
I am trying to convert this text file (philadelphia.txt) into a pandas DataFrame:
STATION STATION_NAME DATE TAVG TMAX TMIN
----------------- -------------------------------------------------- -------- -------- -------- --------
GHCND:USW00094732 PHILADELPHIA NE PHILADELPHIA AIRPORT PA US 19970605 -9999 74 47
GHCND:USW00094732 PHILADELPHIA NE PHILADELPHIA AIRPORT PA US 19970606 -9999 68 50
GHCND:USW00094732 PHILADELPHIA NE PHILADELPHIA AIRPORT PA US 19970608 -9999 72 50
GHCND:USW00094732 PHILADELPHIA NE PHILADELPHIA AIRPORT PA US 19970609 -9999 83 47
GHCND:USW00094732 PHILADELPHIA NE PHILADELPHIA AIRPORT PA US 19970610 -9999 86 55
GHCND:USW00094732 PHILADELPHIA NE PHILADELPHIA AIRPORT PA US 19970611 -9999 88 61
GHCND:USW00094732 PHILADELPHIA NE PHILADELPHIA AIRPORT PA US 19970612 -9999 83 70
GHCND:USW00094732 PHILADELPHIA NE PHILADELPHIA AIRPORT PA US 19970613 -9999 80 66
GHCND:USW00094732 PHILADELPHIA NE PHILADELPHIA AIRPORT PA US 19970614 -9999 80 64
GHCND:USW00094732 PHILADELPHIA NE PHILADELPHIA AIRPORT PA US 19970615 -9999 77 55
GHCND:USW00094732 PHILADELPHIA NE PHILADELPHIA AIRPORT PA US 19970616 -9999 79 49
However, if I use
data = pd.read_csv('philadelphia.txt', sep="\s+", header=0)
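it produces the correct header, but then runs into trouble with the station name data. I want the whole name to sit under the column "STATION_NAME", but sep="\s+" splits it at every space, and I get an error: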
pandas.errors.ParserError: Error tokenizing data. C error: Expected 6 fields in line 3, saw 11
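How can I split the data into 6 columns without splitting the station name into individual words?
I would also like to be able to pass in other text files with different station names, for example (yellowknife.txt).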
Use the read_fwf() method:
In [7]: df = pd.read_fwf(r'/path/to/file.csv').drop(0)
In [8]: df
Out[8]:
STATION STATION_NAME DATE TAVG TMAX TMIN
1 GHCND:USW00094732 PHILADELPHIA NE PHILADELPHIA AIRPORT PA US 19970605 -9999 74 47
2 GHCND:USW00094732 PHILADELPHIA NE PHILADELPHIA AIRPORT PA US 19970606 -9999 68 50
3 GHCND:USW00094732 PHILADELPHIA NE PHILADELPHIA AIRPORT PA US 19970608 -9999 72 50
4 GHCND:USW00094732 PHILADELPHIA NE PHILADELPHIA AIRPORT PA US 19970609 -9999 83 47
5 GHCND:USW00094732 PHILADELPHIA NE PHILADELPHIA AIRPORT PA US 19970610 -9999 86 55
6 GHCND:USW00094732 PHILADELPHIA NE PHILADELPHIA AIRPORT PA US 19970611 -9999 88 61
7 GHCND:USW00094732 PHILADELPHIA NE PHILADELPHIA AIRPORT PA US 19970612 -9999 83 70
8 GHCND:USW00094732 PHILADELPHIA NE PHILADELPHIA AIRPORT PA US 19970613 -9999 80 66
9 GHCND:USW00094732 PHILADELPHIA NE PHILADELPHIA AIRPORT PA US 19970614 -9999 80 64
10 GHCND:USW00094732 PHILADELPHIA NE PHILADELPHIA AIRPORT PA US 19970615 -9999 77 55
11 GHCND:USW00094732 PHILADELPHIA NE PHILADELPHIA AIRPORT PA US 19970616 -9999 79 49
Columns:
In [9]: df.columns.tolist()
Out[9]: ['STATION', 'STATION_NAME', 'DATE', 'TAVG', 'TMAX', 'TMIN']
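The answer above relies on read_fwf inferring the column edges and then drops the dashed row afterwards. As a rough sketch of a variant that may carry over more easily to other station files such as yellowknife.txt, the column edges can be read directly from the dashed line. The helper name read_station_text is hypothetical, the sample is three rows from the question re-padded on the assumption that fields are left-aligned to the widths of the dashed line, and treating -9999 as a missing reading is also an assumption:

```python
import io
import re

import pandas as pd

# Sample rebuilt from the question. The widths (17, 50, 8, 8, 8, 8) come from
# the dashed line; left-aligned padding is an assumption about the real file.
WIDTHS = [17, 50, 8, 8, 8, 8]
NAME = "PHILADELPHIA NE PHILADELPHIA AIRPORT PA US"
ROWS = [
    ["STATION", "STATION_NAME", "DATE", "TAVG", "TMAX", "TMIN"],
    ["-" * w for w in WIDTHS],
    ["GHCND:USW00094732", NAME, "19970605", "-9999", "74", "47"],
    ["GHCND:USW00094732", NAME, "19970606", "-9999", "68", "50"],
    ["GHCND:USW00094732", NAME, "19970608", "-9999", "72", "50"],
]
SAMPLE = "\n".join(
    " ".join(field.ljust(w) for field, w in zip(row, WIDTHS)) for row in ROWS
) + "\n"


def read_station_text(text):
    """Parse fixed-width station text (hypothetical helper, not a pandas API)."""
    dashed_line = text.splitlines()[1]
    # Each run of dashes covers exactly one column: (start, end) offsets.
    colspecs = [m.span() for m in re.finditer(r"-+", dashed_line)]
    return pd.read_fwf(
        io.StringIO(text),
        colspecs=colspecs,
        skiprows=[1],         # skip the dashed line itself
        na_values=["-9999"],  # assumption: -9999 marks a missing reading
    )


df = read_station_text(SAMPLE)
print(df)
```

For a file on disk, the same helper could be fed the file contents, for example the result of open('philadelphia.txt').read(). Because the edges come from the file's own dashed line, a station name of a different length should not need any code change, provided the file keeps the same header-plus-dashes layout.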