Python 3.x: unable to read a specific .CSV file with pandas
I have a CSV file that contains coordinates in degrees-minutes-seconds format, but I cannot load it with read_csv.
Does anyone have any suggestions?
Error:
Traceback (most recent call last):
File "<ipython-input-7-5e6c73be55c1>", line 1, in <module>
pd.read_csv("test.csv")
File "C:\ProgramData\Anaconda3\envs\obspy\lib\site-packages\pandas\io\parsers.py", line 676, in parser_f
return _read(filepath_or_buffer, kwds)
File "C:\ProgramData\Anaconda3\envs\obspy\lib\site-packages\pandas\io\parsers.py", line 448, in _read
parser = TextFileReader(fp_or_buf, **kwds)
File "C:\ProgramData\Anaconda3\envs\obspy\lib\site-packages\pandas\io\parsers.py", line 880, in __init__
self._make_engine(self.engine)
File "C:\ProgramData\Anaconda3\envs\obspy\lib\site-packages\pandas\io\parsers.py", line 1114, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "C:\ProgramData\Anaconda3\envs\obspy\lib\site-packages\pandas\io\parsers.py", line 1891, in __init__
self._reader = parsers.TextReader(src, **kwds)
File "pandas\_libs\parsers.pyx", line 529, in pandas._libs.parsers.TextReader.__cinit__
File "pandas\_libs\parsers.pyx", line 749, in pandas._libs.parsers.TextReader._get_header
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf8 in position 5: invalid start byte
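If the records are fixed length, you can use pd.read_fwf with explicit column positions; two variants, for differently laid-out rows, are given in the read_fwf snippets further down.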
I can't say this is the best approach, but see whether it works for you:
import pandas as pd
# use a separator that never occurs in the data so each row is read as a single column
df = pd.read_csv("coordinates.csv", sep="!", header=None)
df.columns = ['coord']
# add blank placeholder columns for the two halves of each row
# (named lat and lon as in the answer; note that E/W values are longitudes and N/S values are latitudes)
df['lat'] = ''
df['lon'] = ''
At this stage:
coord lat lon
0 74° 18' 01.8963" E 32° 56' 40.2788" N
1 76° 05' 57.9815" E 31° 24' 25.0336" N
2 75° 02' 45.5176" E 30° 25' 19.6260" N
3 73° 23' 12.3829" E 31° 47' 47.4578" N
4 74° 18' 01.8963" E 32° 56' 40.2788" N
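What error message do you get, and what is the exact command you are trying to run?
It will not always be fixed length.
It still doesn't work. I guess the problem is in the reading part: "UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf8 in position 2: invalid start byte"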
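The error in the comment above says that byte 0xf8 cannot start a UTF-8 sequence, which usually means the file was saved in some other encoding. The following is a minimal sketch, assuming the offending byte is the degree sign in the coordinates; the helper name `encodings_mapping_byte_to` and the candidate list are illustrative, not part of pandas.

```python
def encodings_mapping_byte_to(byte_value, expected_char, candidates):
    """Return the candidate codecs that decode the single byte to expected_char."""
    matches = []
    for name in candidates:
        try:
            decoded = bytes([byte_value]).decode(name)
        except UnicodeDecodeError:
            continue  # e.g. utf-8 rejects a lone 0xf8
        if decoded == expected_char:
            matches.append(name)
    return matches


# Assumption: the byte the parser choked on (0xf8) is meant to be the degree sign.
candidates = ["utf-8", "latin-1", "cp1252", "cp437", "cp850"]
matches = encodings_mapping_byte_to(0xF8, "°", candidates)
print(matches)
```

Of these candidates, only cp437 and cp850 map 0xf8 to the degree sign. If that assumption holds for your file, passing one of them through the encoding parameter of read_csv, for example pd.read_csv("test.csv", encoding="cp437"), may get past the decode error; which of the two is correct depends on how the file was produced.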
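import pandas as pd
# for rows laid out like: '''74° 18' 01.8963" E''' '''32° 56' 40.2788" N'''
# colspecs is passed by keyword, which newer pandas versions require
df = pd.read_fwf('filename.csv', colspecs=[(3, 22), (32, 51)], header=None)
df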
0 1
0 74A° 18' 01.8963" E 32A° 56' 40.2788" N
1 76A° 05' 57.9815" E 31A° 24' 25.0336" N
2 75A° 02' 45.5176" E 30A° 25' 19.6260" N
3 73A° 23' 12.3829" E 31A° 47' 47.4578" N
4 74A° 18' 01.8963" E 32A° 56' 40.2788" N
# for rows laid out like: 74° 18' 01.8963" E 32° 56' 40.2788" N
df = pd.read_fwf('filename.csv', colspecs=[(0, 19), (20, 39)], header=None)
import re
# continues from the read_csv step: df has the raw 'coord' column plus blank 'lat' and 'lon' columns
# assuming each coordinate ends with one of the direction letters E, W, N or S
pattern = '[EWNS]'
for i, val in enumerate(df['coord']):
    idx = re.search(pattern, val).start()
    # .loc avoids the chained-assignment pitfall of df['lat'][i] = ...
    df.loc[i, 'lat'] = val[:idx+1]
    df.loc[i, 'lon'] = val[idx+1:].strip()
print(df)
coord lat lon
0 74° 18' 01.8963" E 32° 56' 40.2788" N 74° 18' 01.8963" E 32° 56' 40.2788" N
1 76° 05' 57.9815" E 31° 24' 25.0336" N 76° 05' 57.9815" E 31° 24' 25.0336" N
2 75° 02' 45.5176" E 30° 25' 19.6260" N 75° 02' 45.5176" E 30° 25' 19.6260" N
3 73° 23' 12.3829" E 31° 47' 47.4578" N 73° 23' 12.3829" E 31° 47' 47.4578" N
4 74° 18' 01.8963" E 32° 56' 40.2788" N 74° 18' 01.8963" E 32° 56' 40.2788" N
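The split can also be tried on a single row without pandas. This is only a sketch, assuming every row holds exactly two coordinates written as degrees, minutes, seconds and a direction letter; the names `COORD` and `split_pair` are made up for illustration.

```python
import re

# One coordinate: degrees, minutes, seconds, then a direction letter (assumed layout).
COORD = re.compile(r"""\d+°\s*\d+'\s*[\d.]+"\s*[EWNS]""")


def split_pair(row):
    """Split one raw row into its two coordinate strings."""
    parts = COORD.findall(row)
    if len(parts) != 2:
        raise ValueError(f"expected two coordinates, got {len(parts)}: {row!r}")
    return parts[0], parts[1]


first, second = split_pair('74° 18\' 01.8963" E 32° 56\' 40.2788" N')
print(first)
print(second)
```

Matching whole coordinates rather than cutting at the first direction letter means a malformed row raises an error instead of silently producing a wrong split.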