Python: How to parse special date formats spread across different columns with pandas
I have xls data with a special date format, for example:
start day(utc) start time(utc)
20160401 100
20160401 200
20160401 300
20160401 400
20160401 500
I want to parse it into the format 2016-04-01 1:00.
I read the table with pandas:
parse = lambda x: datetime.strptime(str(x), '%Y%m%d %H')
content=pd.read_excel(filepath,skiprows=1,
na_values=['nan',-9999.0,9999.0,
'-9999.0 -',-99,'-99.000 -',-999],
parse_cols=[1,2,3,4,5,6,7,8,9,10,11,12,14],
header=None, parse_dates = [0,1],
index_col = 0,
date_parser=parse)
But an error occurs. It shows:
File "D:\Anaconda2\lib\_strptime.py", line 332, in _strptime
(data_string, format))
ValueError: time data '100' does not match format '%Y%m%d'
How can I handle it?
You can use to_timedelta, because the time values need to be divided by 100 to get hours:
content=pd.read_excel(filepath,skiprows=1,
na_values=['nan',-9999.0,9999.0,
'-9999.0 -',-99,'-99.000 -',-999],
parse_cols=[1,2,3,4,5,6,7,8,9,10,11,12,14],
header=None, parse_dates = [0],
index_col = 0)
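# read_excel above assigns to `content`, so operate on that frame.
# With header=None the time column may carry an integer label; use that label if so.
content.index = content.index + pd.to_timedelta(content['start time(utc)'] / 100., unit='h')
content = content.drop('start time(utc)', axis=1)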
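The to_timedelta idea can be tried without an Excel file. The following is a minimal sketch, assuming a small in-memory frame that mimics the two columns from the question; the frame contents, the helper name `combine_day_and_time` and the index name `start` are illustrative and not part of the original data.

```python
import pandas as pd

# Hypothetical in-memory stand-in for the Excel sheet in the question.
df = pd.DataFrame({
    "start day(utc)": [20160401, 20160401, 20160401],
    "start time(utc)": [100, 200, 300],
    "a": [1, 7, 7],
})

def combine_day_and_time(frame):
    """Return a copy indexed by day + hour, assuming times are whole hours coded as H00."""
    # Parse the YYYYMMDD integers as dates.
    day = pd.to_datetime(frame["start day(utc)"].astype(str), format="%Y%m%d")
    # 100 -> 1.0 hour, 200 -> 2.0 hours, ...
    hours = pd.to_timedelta(frame["start time(utc)"] / 100.0, unit="h")
    out = frame.drop(["start day(utc)", "start time(utc)"], axis=1)
    out.index = pd.DatetimeIndex(day + hours, name="start")
    return out

result = combine_day_and_time(df)
print(result)
```

Dividing by 100 only gives the right result when the minutes part is 00. A value such as 130 would become 1.3 hours rather than 1:30, so data with minutes would need the hour and minute parts split first.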
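If dividing is not necessary (the hours are already 0, 1, 2, ..., 23), change parse_dates=[0,1] to parse_dates=[[0,1]] so that both columns are combined before parsing: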
Sample:
import pandas as pd
from datetime import datetime
from io import StringIO  # pandas.compat.StringIO was removed in later pandas versions
temp=u"""start day(utc);start time(utc);a
20160401;1;1
20160401;2;7
20160401;3;7
20160401;4;5
20160401;5;3"""
# after testing, replace 'StringIO(temp)' with 'filename.csv'
parse = lambda x: datetime.strptime(x, '%Y%m%d %H')
df = pd.read_csv(StringIO(temp), sep=";",
parse_dates = [[0,1]],
index_col = 0,
date_parser=parse)
print (df)
a
start day(utc)_start time(utc)
2016-04-01 01:00:00 1
2016-04-01 02:00:00 7
2016-04-01 03:00:00 7
2016-04-01 04:00:00 5
2016-04-01 05:00:00 3
print (df.index)
DatetimeIndex(['2016-04-01 01:00:00', '2016-04-01 02:00:00',
'2016-04-01 03:00:00', '2016-04-01 04:00:00',
'2016-04-01 05:00:00'],
dtype='datetime64[ns]', name='start day(utc)_start time(utc)', freq=None)
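In recent pandas releases `date_parser` is deprecated, so the sample above may emit warnings or stop working depending on the installed version. A possible alternative, sketched here under the assumption that the hours are plain 0..23 values, is to read both columns as strings and build the index with pd.to_datetime and an explicit format. The index name `start` is made up for this example.

```python
from io import StringIO
import pandas as pd

temp = """start day(utc);start time(utc);a
20160401;1;1
20160401;2;7
20160401;3;7"""

# Read both date columns as strings so they can be joined before parsing.
df = pd.read_csv(StringIO(temp), sep=";",
                 dtype={"start day(utc)": str, "start time(utc)": str})

# Zero-pad the hour ("1" -> "01") so it always matches %H unambiguously.
hour = df["start time(utc)"].str.zfill(2)
stamp = pd.to_datetime(df["start day(utc)"] + " " + hour, format="%Y%m%d %H")

df = df.drop(["start day(utc)", "start time(utc)"], axis=1)
df.index = pd.DatetimeIndex(stamp, name="start")
print(df)
```

This keeps the parsing step separate from the file reading step, which makes it easier to check the intermediate strings when a row fails to parse.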