Python datetime format

Is it possible to make pd.to_datetime keep trailing zeros in its output? It seems the zeros are being removed.

print(pd.to_datetime("2000-07-26 14:21:00.00000",
                     format="%Y-%m-%d %H:%M:%S.%f"))
The result is

2000-07-26 14:21:00
The expected result is

2000-07-26 14:21:00.00000

I know the two values mean the same thing, but keeping the zeros would be better for consistency.
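
If the goal is only a consistent textual form, one option is to format the parsed value explicitly instead of relying on the default display. The snippet below is a minimal sketch of that idea, not something from the original post. It assumes a reasonably recent pandas, and the variable names are illustrative.

```python
import pandas as pd

# Parse the string from the question; "%f" accepts the five fractional digits.
ts = pd.to_datetime("2000-07-26 14:21:00.00000", format="%Y-%m-%d %H:%M:%S.%f")

# The default string form omits an all-zero fraction.
default_text = str(ts)

# strftime with "%f" always writes six fractional digits (microseconds).
six_digits = ts.strftime("%Y-%m-%d %H:%M:%S.%f")

# Dropping the last character leaves five digits, as in the input string.
five_digits = six_digits[:-1]

print(default_text)
print(six_digits)
print(five_digits)
```

The parsed value itself does not remember how many zeros the input had, so the number of digits has to be chosen at formatting time.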

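Some testing shows that when datetime data is formatted with format="%H:%M:%S.%f", %f can resolve down to nanoseconds, provided the ninth digit after the decimal point is non-zero. When a single string is formatted, a variable number of trailing zeros, from none to five, is added. The number depends on the position of the least significant non-zero digit after the decimal point, assuming that digit is also the last digit. The table below summarizes the tests, where position is the position of the least significant non-zero digit (which is also the final digit) and zeros is the number of trailing zeros added by formatting: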

    position zeros
       9      0
       8      1
       7      2
       6      0
       5      1
       4      2
       3      3
       2      4
       1      5
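
The table appears consistent with the display using either six or nine fractional digits, depending on whether anything finer than a microsecond is present. The sketch below rebuilds the table by counting the digits in the default string form of each Timestamp. It is an illustration written for this answer, assuming a recent pandas, and the loop and variable names are not from the original tests.

```python
import pandas as pd

rows = []
for position in range(9, 0, -1):
    # Put a single "1" at the given position after the decimal point.
    fraction = "0" * (position - 1) + "1"
    ts = pd.Timestamp("1900-01-01 10:00:00." + fraction)
    # Count the fractional digits that the default string form shows.
    shown = len(str(ts).partition(".")[2])
    rows.append((position, shown - position))

print("position zeros")
for position, zeros in rows:
    print(f"{position:>8} {zeros:>5}")
```
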
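When a column as a whole is formatted with "%H:%M:%S.%f", all of its elements get the same number of digits after the decimal point. This is achieved by adding or removing trailing zeros, even if doing so raises or lowers the resolution of the raw data. I suppose the reasons are consistency and aesthetics. It usually does not introduce much error, because in numerical computation trailing zeros generally do not affect the immediate result, although they can affect the estimate of its error and how that error is represented.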

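Below are some observations from applying the "%H:%M:%S.%f" format with pandas.to_datetime to a single string and to a pandas.Series (a DataFrame column), and from applying pandas.DataFrame.convert_objects(convert_dates='coerce') to a DataFrame with a column that can be converted to datetime.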

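On a single string, pandas preserves a non-zero digit in the time conversion up to the ninth place after the decimal point with "%H:%M:%S.%f", and adds a date if none is provided: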

import pandas as pd
pd.to_datetime ("10:00:00.000000001",format="%H:%M:%S.%f")
Out[15]: Timestamp('1900-01-01 10:00:00.000000001')

pd.to_datetime ("2015-09-17 10:00:00.000000001",format="%Y-%m-%d %H:%M:%S.%f")
Out[15]: Timestamp('2015-09-17 10:00:00.000000001')
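For positions before the ninth, in tests where the last non-zero digit is also the final digit, up to five trailing zeros are added after it, which raises the apparent resolution of the raw data. The exception is when the last non-zero digit is in the sixth place to the right of the decimal point: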

pd.to_datetime ("10:00:00.00000001",format="%H:%M:%S.%f")
Out[15]: Timestamp('1900-01-01 10:00:00.000000010')

pd.to_datetime ("2015-09-17 10:00:00.00000001",format="%Y-%m-%d %H:%M:%S.%f")
Out[16]: Timestamp('2015-09-17 10:00:00.000000010')

pd.to_datetime ("10:00:00.0000001",format="%H:%M:%S.%f")
Out[15]: Timestamp('1900-01-01 10:00:00.000000100')

pd.to_datetime ("2015-09-17 10:00:00.0000001",format="%Y-%m-%d %H:%M:%S.%f")
Out[17]: Timestamp('2015-09-17 10:00:00.000000100')

pd.to_datetime ("10:00:00.000001",format="%H:%M:%S.%f")
Out[33]: Timestamp('1900-01-01 10:00:00.000001')

pd.to_datetime ("2015-09-17 10:00:00.000001",format="%Y-%m-%d %H:%M:%S.%f")
Out[18]: Timestamp('2015-09-17 10:00:00.000001')

pd.to_datetime ("10:00:00.00001",format="%H:%M:%S.%f")
Out[6]: Timestamp('1900-01-01 10:00:00.000010')

pd.to_datetime ("2015-09-17 10:00:00.00001",format="%Y-%m-%d %H:%M:%S.%f")
Out[19]: Timestamp('2015-09-17 10:00:00.000010')

pd.to_datetime ("10:00:00.0001",format="%H:%M:%S.%f")
Out[9]: Timestamp('1900-01-01 10:00:00.000100')

pd.to_datetime ("2015-09-17 10:00:00.0001",format="%Y-%m-%d %H:%M:%S.%f")
Out[21]: Timestamp('2015-09-17 10:00:00.000100')

pd.to_datetime ("10:00:00.001",format="%H:%M:%S.%f")
Out[10]: Timestamp('1900-01-01 10:00:00.001000')

pd.to_datetime ("2015-09-17 10:00:00.001",format="%Y-%m-%d %H:%M:%S.%f")
Out[22]: Timestamp('2015-09-17 10:00:00.001000')

pd.to_datetime ("10:00:00.01",format="%H:%M:%S.%f")
Out[12]: Timestamp('1900-01-01 10:00:00.010000')

pd.to_datetime ("2015-09-17 10:00:00.01",format="%Y-%m-%d %H:%M:%S.%f")
Out[24]: Timestamp('2015-09-17 10:00:00.010000')

pd.to_datetime ("10:00:00.1",format="%H:%M:%S.%f")
Out[13]: Timestamp('1900-01-01 10:00:00.100000')

pd.to_datetime ("2015-09-17 10:00:00.1",format="%Y-%m-%d %H:%M:%S.%f")
Out[26]: Timestamp('2015-09-17 10:00:00.100000')
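
One way to check that the padding is only a matter of display is to compare parsed values directly and to look at the numeric fields of the Timestamp. This is a small sketch under the assumption of a recent pandas; the variable names are made up for the example.

```python
import pandas as pd

fmt = "%H:%M:%S.%f"
short = pd.to_datetime("10:00:00.1", format=fmt)
padded = pd.to_datetime("10:00:00.100000", format=fmt)

# Both strings describe the same instant, so the parsed values compare equal.
print(short == padded)

# The fraction is stored as numeric fields, not as text.
print(short.microsecond, short.nanosecond)

# A digit in the ninth place ends up in the nanosecond field.
nano = pd.to_datetime("10:00:00.000000001", format=fmt)
print(nano.microsecond, nano.nanosecond)
```
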
Let's see how this works with a DataFrame:

!type test.csv # here type is Windows substitute for Linux cat command
date,mesg
10:00:00.000000001,one
10:00:00.00000001,two
10:00:00.0000001,three
10:00:00.000001,four
10:00:00.00001,five
10:00:00.0001,six
10:00:00.001,seven
10:00:00.01,eight
10:00:00.1,nine
10:00:00.000000001,ten
10:00:00.000000002,eleven
10:00:00.000000003,twelve

df = pd.read_csv('test.csv')
df
Out[30]: 
                  date    mesg
0   10:00:00.000000001     one
1    10:00:00.00000001     two
2     10:00:00.0000001   three
3      10:00:00.000001    four
4       10:00:00.00001    five
5        10:00:00.0001     six
6         10:00:00.001   seven
7          10:00:00.01   eight
8           10:00:00.1    nine
9   10:00:00.000000001     ten
10  10:00:00.000000002  eleven
11  10:00:00.000000003  twelve

df.dtypes
Out[31]: 
date    object
mesg    object
dtype: object
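Datetime conversion of the DataFrame with convert_objects (which has no format option) gives microsecond resolution, even where some of the raw data has a coarser or finer resolution, and adds today's date: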

df2 = df.convert_objects(convert_dates='coerce')
df2
Out[32]: 
                     date    mesg
0  2015-09-17 10:00:00.000000     one
1  2015-09-17 10:00:00.000000     two
2  2015-09-17 10:00:00.000000   three
3  2015-09-17 10:00:00.000001    four
4  2015-09-17 10:00:00.000010    five
5  2015-09-17 10:00:00.000100     six
6  2015-09-17 10:00:00.001000   seven
7  2015-09-17 10:00:00.010000   eight
8  2015-09-17 10:00:00.100000    nine
9  2015-09-17 10:00:00.000000     ten
10 2015-09-17 10:00:00.000000  eleven
11 2015-09-17 10:00:00.000000  twelve

df2.dtypes
Out[33]: 
date    datetime64[ns]
mesg            object
dtype: object
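Once the datetime conversion has been done without an explicit format specifier (that is, with DataFrame.convert_objects), applying the "%H:%M:%S.%f" format cannot recover the higher resolution of the values in a DataFrame column created from raw data, some of which is finer than microseconds.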

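Formatting the DataFrame column with "%H:%M:%S.%f" during the datetime conversion gives nanosecond resolution if at least one element has a non-zero digit in the ninth place (as advertised). It also raises raw data with a coarser resolution to that level and adds 1900-01-01 as the date: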

df3 = df.copy(deep=True)
df3['date'] = pd.to_datetime(df3['date'],format="%H:%M:%S.%f",coerce=True)
df3
Out[35]:
                            date    mesg
0  1900-01-01 10:00:00.000000001     one
1  1900-01-01 10:00:00.000000010     two
2  1900-01-01 10:00:00.000000100   three
3  1900-01-01 10:00:00.000001000    four
4  1900-01-01 10:00:00.000010000    five
5  1900-01-01 10:00:00.000100000     six
6  1900-01-01 10:00:00.001000000   seven
7  1900-01-01 10:00:00.010000000   eight
8  1900-01-01 10:00:00.100000000    nine
9  1900-01-01 10:00:00.000000001     ten
10 1900-01-01 10:00:00.000000002  eleven
11 1900-01-01 10:00:00.000000003  twelve
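
In more recent pandas releases the coerce=True keyword used above is written errors="coerce". The following is a minimal sketch of the same column conversion in that spelling. The sample values, including the deliberately invalid "not a time" entry, are assumptions added for illustration.

```python
import pandas as pd

raw = pd.Series(["10:00:00.000000001", "10:00:00.0001", "10:00:00.1", "not a time"])

# errors="coerce" turns strings that do not match the format into NaT.
parsed = pd.to_datetime(raw, format="%H:%M:%S.%f", errors="coerce")

# Only the last entry fails to parse.
print(parsed.isna().tolist())

# The stored values keep their own fractions regardless of how the column prints.
print(parsed.dt.nanosecond.iloc[0])
print(parsed.dt.microsecond.iloc[1])
```
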
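Formatting a DataFrame column with "%H:%M:%S.%f" pads the data with trailing zeros after the least significant non-zero digit found anywhere in the column (following the position/zeros table above) and aligns the resolution of all other data with it, even if this raises or lowers the resolution of some of the raw data: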

df4 = pd.read_csv('test2.csv')
df4
Out[36]: 
                  date    mesg
0   10:00:00.000000000     one
1    10:00:00.00000000     two
2     10:00:00.0000000   three
3      10:00:00.000000    four
4       10:00:00.00000    five
5        10:00:00.0001     six
6          10:00:00.00   seven
7           10:00:00.0   eight
8            10:00:00.    nine
9   10:00:00.000000000     ten
10  10:00:00.000000000  eleven
11   10:00:00.00000000  twelve

df4['date'] = pd.to_datetime(df4['date'],format="%H:%M:%S.%f",coerce=True)
df4
Out[37]: 
                         date    mesg
0  1900-01-01 10:00:00.000000     one
1  1900-01-01 10:00:00.000000     two
2  1900-01-01 10:00:00.000000   three
3  1900-01-01 10:00:00.000000    four
4  1900-01-01 10:00:00.000000    five
5  1900-01-01 10:00:00.000100     six
6  1900-01-01 10:00:00.000000   seven
7  1900-01-01 10:00:00.000000   eight
8                         NaT    nine # nothing after decimal point in raw data
9  1900-01-01 10:00:00.000000     ten
10 1900-01-01 10:00:00.000000  eleven
11 1900-01-01 10:00:00.000000  twelve
When this was tried with the same DataFrame but with dates included in the date column, the same thing happened:

df25
Out[38]: 
                             date    mesg
0   2015-09-10 10:00:00.000000000     one
1    2015-09-11 10:00:00.00000000     two
2     2015-09-12 10:00:00.0000000   three
3      2015-09-13 10:00:00.000000    four
4       2015-09-14 10:00:00.00000    five
5        2015-09-15 10:00:00.0001     six
6          2015-09-16 10:00:00.00   seven
7           2015-09-17 10:00:00.0   eight
8            2015-09-18 10:00:00.    nine
9   2015-09-19 10:00:00.000000000     ten
10  2015-09-20 10:00:00.000000000  eleven
11   2015-09-21 10:00:00.00000000  twelve

df25['date'] = pd.to_datetime(df25['date'],format="%Y-%m-%d %H:%M:%S.%f",coerce=True)
df25
Out[39]: 
                         date    mesg
0  2015-09-10 10:00:00.000000     one
1  2015-09-11 10:00:00.000000     two
2  2015-09-12 10:00:00.000000   three
3  2015-09-13 10:00:00.000000    four
4  2015-09-14 10:00:00.000000    five
5  2015-09-15 10:00:00.000100     six
6  2015-09-16 10:00:00.000000   seven
7  2015-09-17 10:00:00.000000   eight
8                         NaT    nine # nothing after decimal point in raw data
9  2015-09-19 10:00:00.000000     ten
10 2015-09-20 10:00:00.000000  eleven
11 2015-09-21 10:00:00.000000  twelve
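
If a fixed number of fractional digits is wanted in the output regardless of what the column contains, the values can be rendered to text explicitly with Series.dt.strftime. This is a sketch under the assumption of a recent pandas, using three of the sample rows above. Note that "%f" writes microseconds only, so anything finer would not appear in this text form.

```python
import pandas as pd

raw = pd.Series([
    "2015-09-10 10:00:00.000000000",
    "2015-09-15 10:00:00.0001",
    "2015-09-17 10:00:00.0",
])
parsed = pd.to_datetime(raw, format="%Y-%m-%d %H:%M:%S.%f")

# Render every element with the same six fractional digits.
as_text = parsed.dt.strftime("%Y-%m-%d %H:%M:%S.%f")
print(as_text.tolist())
```
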
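If none of the raw data has a non-zero digit after the decimal point, formatting the DataFrame column with "%H:%M:%S.%f" gives every element the same representation, here with no fractional digits displayed at all, even if this raises or lowers the resolution of some of the raw data: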

df5 = pd.read_csv('test3.csv')
df5
Out[40]: 
                  date    mesg
0         10:00:00.000     one
1           10:00:00.0     two
2         10:00:00.000   three
3         10:00:00.000    four
4          10:00:00.00    five
5         10:00:00.000     six
6          10:00:00.00   seven
7           10:00:00.0   eight
8           10:00:00.0    nine
9   10:00:00.000000000     ten
10        10:00:00.000  eleven
11        10:00:00.000  twelve

df5['date'] = pd.to_datetime(df5['date'],format="%H:%M:%S.%f",coerce=True)
df5
Out[41]: 
                  date    mesg
0  1900-01-01 10:00:00     one
1  1900-01-01 10:00:00     two
2  1900-01-01 10:00:00   three
3  1900-01-01 10:00:00    four
4  1900-01-01 10:00:00    five
5  1900-01-01 10:00:00     six
6  1900-01-01 10:00:00   seven
7  1900-01-01 10:00:00   eight
8  1900-01-01 10:00:00    nine
9  1900-01-01 10:00:00     ten
10 1900-01-01 10:00:00  eleven
11 1900-01-01 10:00:00  twelve
Running this test with the same DataFrame but with dates included in the date column gives the same result:

df45
Out[42]: 
                             date    mesg
0         2015-09-10 10:00:00.000     one
1           2015-09-11 10:00:00.0     two
2         2015-09-12 10:00:00.000   three
3         2015-09-13 10:00:00.000    four
4          2015-09-14 10:00:00.00    five
5         2015-09-15 10:00:00.000     six
6          2015-09-16 10:00:00.00   seven
7           2015-09-17 10:00:00.0   eight
8           2015-09-18 10:00:00.0    nine
9   2015-09-19 10:00:00.000000000     ten
10        2015-09-20 10:00:00.000  eleven
11        2015-09-21 10:00:00.000  twelve

df45['date'] = pd.to_datetime(df45['date'],format="%Y-%m-%d %H:%M:%S.%f",coerce=True)
df45
Out[43]: 
                  date    mesg
0  2015-09-10 10:00:00     one
1  2015-09-11 10:00:00     two
2  2015-09-12 10:00:00   three
3  2015-09-13 10:00:00    four
4  2015-09-14 10:00:00    five
5  2015-09-15 10:00:00     six
6  2015-09-16 10:00:00   seven
7  2015-09-17 10:00:00   eight
8  2015-09-18 10:00:00    nine
9  2015-09-19 10:00:00     ten
10 2015-09-20 10:00:00  eleven
11 2015-09-21 10:00:00  twelve

Sorry, I don't have enough reputation to comment, so I'll put this in an answer. I fully agree with EdChum that this is a display issue. If you try:

pd.to_datetime ("10:00:00.00001",format="%H:%M:%S.%f")
the reply should be:


Timestamp('1900-01-01 10:00:00.000010')

Could you post your output? This may just be a display issue
pd.to_datetime ("10:00:00.00001",format="%H:%M:%S.%f")