Python: How to parse a special date format spread across different columns with pandas

I have xls data with a special date format, for example:

 start day(utc) start time(utc)
    20160401            100
    20160401            200
    20160401            300
    20160401            400
    20160401            500
I want to parse it into the format 2016-04-01 1:00. I read the table with pandas:

    parse = lambda x: datetime.strptime(str(x), '%Y%m%d %H')
    content=pd.read_excel(filepath,skiprows=1,
                          na_values=['nan',-9999.0,9999.0,
                          '-9999.0 -',-99,'-99.000 -',-999],
                          parse_cols=[1,2,3,4,5,6,7,8,9,10,11,12,14],
                          header=None, parse_dates = [0,1], 
                          index_col = 0, 
                          date_parser=parse)
But an error occurred. It shows:

 File "D:\Anaconda2\lib\_strptime.py", line 332, in _strptime
  (data_string, format))

   ValueError: time data '100' does not match format '%Y%m%d'
How can I handle this?

You can use to_timedelta, because the time values need to be divided by 100:

# assign to df so that the lines below operate on the frame just read
df = pd.read_excel(filepath,skiprows=1,
                   na_values=['nan',-9999.0,9999.0,
                   '-9999.0 -',-99,'-99.000 -',-999],
                   parse_cols=[1,2,3,4,5,6,7,8,9,10,11,12,14],
                   header=None, parse_dates = [0],
                   index_col = 0)

df.index = df.index + pd.to_timedelta(df['start time(utc)'] / 100., unit='h')
df = df.drop('start time(utc)', axis=1)
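
A minimal self-contained sketch of the same idea, for trying it without the Excel file. The inline DataFrame is only a stand-in built from the five sample rows in the question, and the value column a is an assumption for illustration:

```python
import pandas as pd

# Stand-in for the Excel sheet: the sample rows from the question
# plus a hypothetical value column "a".
df = pd.DataFrame({
    "start day(utc)": [20160401] * 5,
    "start time(utc)": [100, 200, 300, 400, 500],
    "a": [1, 7, 7, 5, 3],
})

# Parse the yyyymmdd integers into dates and use them as the index.
df.index = pd.to_datetime(df["start day(utc)"].astype(str), format="%Y%m%d")

# 100 -> 1 hour, 200 -> 2 hours, ...; add that offset to each date.
hours = df["start time(utc)"].to_numpy() / 100.0
df.index = df.index + pd.to_timedelta(hours, unit="h")

# The two raw columns are no longer needed once the index is built.
df = df.drop(columns=["start day(utc)", "start time(utc)"])
print(df)
```

Dividing by 100 only suits whole-hour values: an entry such as 130 would become 1.3 hours (01:18), not 01:30.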
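If dividing by 100 is not necessary (the hours are already 0,1,2..23), change parse_dates=[0,1] to parse_dates=[[0,1]], so that the two columns are joined and passed to the parser as a single string.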

Sample:

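import pandas as pd
from datetime import datetime  # needed by the parse lambda below
from io import StringIO  # pandas.compat.StringIO was removed from newer pandas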

temp=u"""start day(utc);start time(utc);a
20160401;1;1
20160401;2;7
20160401;3;7
20160401;4;5
20160401;5;3"""
#after testing, replace 'StringIO(temp)' with 'filename.csv'
parse = lambda x: datetime.strptime(x, '%Y%m%d %H')
df = pd.read_csv(StringIO(temp), sep=";", 
                          parse_dates = [[0,1]], 
                          index_col = 0,
                          date_parser=parse)

print (df)
                                a
start day(utc)_start time(utc)   
2016-04-01 01:00:00             1
2016-04-01 02:00:00             7
2016-04-01 03:00:00             7
2016-04-01 04:00:00             5
2016-04-01 05:00:00             3

print (df.index)
DatetimeIndex(['2016-04-01 01:00:00', '2016-04-01 02:00:00',
               '2016-04-01 03:00:00', '2016-04-01 04:00:00',
               '2016-04-01 05:00:00'],
              dtype='datetime64[ns]', name='start day(utc)_start time(utc)', freq=None)
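
In newer pandas releases, date_parser and the nested parse_dates=[[0,1]] form are, as far as I know, deprecated. A hedged sketch that gets the same index without either option reads both columns as strings and combines them with pd.to_datetime. The index name start and the zero-padding of the hour are choices made here for illustration, not part of the original answer:

```python
from io import StringIO

import pandas as pd

temp = """start day(utc);start time(utc);a
20160401;1;1
20160401;2;7
20160401;3;7
20160401;4;5
20160401;5;3"""

# Read both date columns as strings so they can be concatenated.
df = pd.read_csv(StringIO(temp), sep=";",
                 dtype={"start day(utc)": str, "start time(utc)": str})

# Build "20160401 01", "20160401 02", ... and parse with an explicit format.
# The hour is zero-padded so it always has two digits.
combined = df["start day(utc)"] + " " + df["start time(utc)"].str.zfill(2)
stamp = pd.to_datetime(combined, format="%Y%m%d %H")

# Replace the two raw columns with a single datetime index.
df = df.drop(columns=["start day(utc)", "start time(utc)"])
df.index = pd.DatetimeIndex(stamp, name="start")
print(df)
```

Passing an explicit format keeps the parsing unambiguous and avoids per-row calls to a Python-level parser.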