Python: Changing the time component of a datetime64 column

I have a DataFrame that can be simplified to:

                date  id
0   02/04/2015 02:34   1
1   06/04/2015 12:34   2
2   09/04/2015 23:03   3
3   12/04/2015 01:00   4
4   15/04/2015 07:12   5
5   21/04/2015 12:59   6
6   29/04/2015 17:33   7
7   04/05/2015 10:44   8
8   06/05/2015 11:12   9
9   10/05/2015 08:52  10
10  12/05/2015 14:19  11
11  19/05/2015 19:22  12
12  27/05/2015 22:31  13
13  01/06/2015 11:09  14
14  04/06/2015 12:57  15
15  10/06/2015 04:00  16
16  15/06/2015 03:23  17
17  19/06/2015 05:37  18
18  23/06/2015 13:41  19
19  27/06/2015 15:43  20
It can be created with:

import pandas as pd

tempDF = pd.DataFrame({ 'id': [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20],
                        'date': ["02/04/2015 02:34","06/04/2015 12:34","09/04/2015 23:03","12/04/2015 01:00","15/04/2015 07:12","21/04/2015 12:59","29/04/2015 17:33","04/05/2015 10:44","06/05/2015 11:12","10/05/2015 08:52","12/05/2015 14:19","19/05/2015 19:22","27/05/2015 22:31","01/06/2015 11:09","04/06/2015 12:57","10/06/2015 04:00","15/06/2015 03:23","19/06/2015 05:37","23/06/2015 13:41","27/06/2015 15:43"]})
The data has the following types:

tempDF.dtypes
date     object
id        int64
dtype: object
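I have converted the 'date' column to pandas datetime64 format (if that's the right way to describe it) using:

tempDF['date'] = pd.to_datetime(tempDF['date'])

Now the dtypes look like this: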

tempDF.dtypes
date     datetime64[ns]
id                int64
dtype: object
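
A side note on the conversion step: the sample strings look day-first (for example 15/04/2015), so a parser left to guess may read a value such as 02/04/2015 as 4 February rather than 2 April. A minimal sketch, assuming every date is in day/month/year order, passes an explicit format so each row is parsed the same way. The three-row `sample` frame is only for illustration.

```python
import pandas as pd

# Small illustrative frame; the strings are assumed to be day/month/year.
sample = pd.DataFrame({
    "id": [1, 2, 3],
    "date": ["02/04/2015 02:34", "15/04/2015 07:12", "04/05/2015 10:44"],
})

# An explicit format removes the month-first / day-first ambiguity.
sample["date"] = pd.to_datetime(sample["date"], format="%d/%m/%Y %H:%M")

# The month now always comes from the middle field of the string.
print(sample["date"].dt.month.tolist())  # [4, 4, 5]
```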
I want to change the hours of the original date data. I can use .normalize() via the .dt accessor to set the time to midnight:

tempDF['date'] = tempDF['date'].dt.normalize()
I can also access individual datetime components (e.g. the year) with:

tempDF['date'].dt.year

This produces:

0     2015
1     2015
2     2015
3     2015
4     2015
5     2015
6     2015
7     2015
8     2015
9     2015
10    2015
11    2015
12    2015
13    2015
14    2015
15    2015
16    2015
17    2015
18    2015
19    2015
Name: date, dtype: int64

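The question is: how can I change specific date and time components? For example, how could I set the time to midday (12:00) for all of the dates? I've found that datetime.datetime has a .replace() function. However, having converted the dates to pandas format, it would make sense to keep them in that format. Is there a way to do this without changing the format again?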

Edit:

A vectorized way to do this is to normalize the series and then add 12 hours to it using timedelta. Example -

import datetime
tempDF['date'].dt.normalize() + datetime.timedelta(hours=12)
Demo -

In [59]: tempDF
Out[59]:
                  date  id
0  2015-02-04 12:00:00   1
1  2015-06-04 12:00:00   2
2  2015-09-04 12:00:00   3
3  2015-12-04 12:00:00   4
4  2015-04-15 12:00:00   5
5  2015-04-21 12:00:00   6
6  2015-04-29 12:00:00   7
7  2015-04-05 12:00:00   8
8  2015-06-05 12:00:00   9
9  2015-10-05 12:00:00  10
10 2015-12-05 12:00:00  11
11 2015-05-19 12:00:00  12
12 2015-05-27 12:00:00  13
13 2015-01-06 12:00:00  14
14 2015-04-06 12:00:00  15
15 2015-10-06 12:00:00  16
16 2015-06-15 12:00:00  17
17 2015-06-19 12:00:00  18
18 2015-06-23 12:00:00  19
19 2015-06-27 12:00:00  20

In [60]: tempDF['date'].dt.normalize() + datetime.timedelta(hours=12)
Out[60]:
0    2015-02-04 12:00:00
1    2015-06-04 12:00:00
2    2015-09-04 12:00:00
3    2015-12-04 12:00:00
4    2015-04-15 12:00:00
5    2015-04-21 12:00:00
6    2015-04-29 12:00:00
7    2015-04-05 12:00:00
8    2015-06-05 12:00:00
9    2015-10-05 12:00:00
10   2015-12-05 12:00:00
11   2015-05-19 12:00:00
12   2015-05-27 12:00:00
13   2015-01-06 12:00:00
14   2015-04-06 12:00:00
15   2015-10-06 12:00:00
16   2015-06-15 12:00:00
17   2015-06-19 12:00:00
18   2015-06-23 12:00:00
19   2015-06-27 12:00:00
dtype: datetime64[ns]
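Timing information for both methods is at the bottom.


Another way is to use Series.apply together with the .replace() method that the OP mentions in the post. Example -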

tempDF['date'] = tempDF['date'].apply(lambda x:x.replace(hour=12,minute=0))
Demo -

In [12]: tempDF
Out[12]:
                  date  id
0  2015-02-04 02:34:00   1
1  2015-06-04 12:34:00   2
2  2015-09-04 23:03:00   3
3  2015-12-04 01:00:00   4
4  2015-04-15 07:12:00   5
5  2015-04-21 12:59:00   6
6  2015-04-29 17:33:00   7
7  2015-04-05 10:44:00   8
8  2015-06-05 11:12:00   9
9  2015-10-05 08:52:00  10
10 2015-12-05 14:19:00  11
11 2015-05-19 19:22:00  12
12 2015-05-27 22:31:00  13
13 2015-01-06 11:09:00  14
14 2015-04-06 12:57:00  15
15 2015-10-06 04:00:00  16
16 2015-06-15 03:23:00  17
17 2015-06-19 05:37:00  18
18 2015-06-23 13:41:00  19
19 2015-06-27 15:43:00  20

In [13]: tempDF['date'] = tempDF['date'].apply(lambda x:x.replace(hour=12,minute=0))

In [14]: tempDF
Out[14]:
                  date  id
0  2015-02-04 12:00:00   1
1  2015-06-04 12:00:00   2
2  2015-09-04 12:00:00   3
3  2015-12-04 12:00:00   4
4  2015-04-15 12:00:00   5
5  2015-04-21 12:00:00   6
6  2015-04-29 12:00:00   7
7  2015-04-05 12:00:00   8
8  2015-06-05 12:00:00   9
9  2015-10-05 12:00:00  10
10 2015-12-05 12:00:00  11
11 2015-05-19 12:00:00  12
12 2015-05-27 12:00:00  13
13 2015-01-06 12:00:00  14
14 2015-04-06 12:00:00  15
15 2015-10-06 12:00:00  16
16 2015-06-15 12:00:00  17
17 2015-06-19 12:00:00  18
18 2015-06-23 12:00:00  19
19 2015-06-27 12:00:00  20

Timing information

In [52]: df = pd.DataFrame([[datetime.datetime.now()] for _ in range(100000)],columns=['date'])

In [54]: %%timeit
   ....: df['date'].dt.normalize() + datetime.timedelta(hours=12)
   ....:
The slowest run took 12.53 times longer than the fastest. This could mean that an intermediate result is being cached
1 loops, best of 3: 32.3 ms per loop

In [57]: %%timeit
   ....: df['date'].apply(lambda x:x.replace(hour=12,minute=0))
   ....:
1 loops, best of 3: 1.09 s per loop
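
Both approaches above reset the whole time of day. If only the hour should change while the minutes and seconds are kept, one vectorized possibility is to subtract the current hour and add the target hour. This is a sketch rather than part of the original answer: `set_hour` is a hypothetical helper name, and it assumes a timezone-naive datetime64 Series.

```python
import pandas as pd


def set_hour(series: pd.Series, hour: int) -> pd.Series:
    """Return a naive datetime64 Series with only the hour replaced."""
    # Turn each row's current hour into a timedelta so it can be subtracted.
    current = pd.to_timedelta(series.dt.hour, unit="h")
    return series - current + pd.Timedelta(hours=hour)


dates = pd.Series(pd.to_datetime(["2015-04-02 02:34", "2015-04-09 23:03"]))
at_noon = set_hour(dates, 12)
print(at_noon.dt.strftime("%Y-%m-%d %H:%M").tolist())
# ['2015-04-02 12:34', '2015-04-09 12:03']
```

Because it stays in column-wise arithmetic, this should scale like the normalize-plus-timedelta version rather than like the row-by-row apply.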

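Great answer. Thanks very much. I've always shied away from lambda functions because I typically have DataFrames with over a million rows, and I assumed lambda functions would be slow. But perhaps I need to revisit them. Is there a way to do this with a column-based approach rather than stepping through each row?

I found a vectorized method, checked it, and updated the answer with it.

Adding a timedelta to timestamps that carry a time zone with daylight saving time can give unexpected results: (pd.Timestamp('2022-03-27 00:00', tz='CET') + pd.Timedelta(12, unit='h')).hour == 13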
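
The daylight saving caveat from the last comment can be reproduced on a whole Series. A minimal sketch, assuming the column is timezone-aware in `CET`: adding a timedelta shifts by elapsed time, so on the day the clocks go forward, local midnight plus 12 hours lands on 13:00. One possible workaround (an assumption, not something from the thread) is to drop the time zone, do the arithmetic on wall-clock time, and localize again.

```python
import pandas as pd

# Three sample timestamps around the 2022-03-27 spring-forward change in CET.
naive = pd.Series(pd.to_datetime(
    ["2022-03-26 08:15", "2022-03-27 00:00", "2022-03-28 17:40"]
))
aware = naive.dt.tz_localize("CET")

# Elapsed-time arithmetic: 12 real hours after local midnight.
shifted = aware.dt.normalize() + pd.Timedelta(hours=12)
print(shifted.dt.hour.tolist())  # [12, 13, 12]

# Wall-clock arithmetic: strip the zone, add 12 hours, then re-attach the zone.
wall_clock = aware.dt.tz_localize(None).dt.normalize() + pd.Timedelta(hours=12)
noon = wall_clock.dt.tz_localize("CET")
print(noon.dt.hour.tolist())  # [12, 12, 12]
```

Re-localizing can raise for target times that do not exist or are ambiguous around a clock change; noon is not affected in this zone, but other target hours may need the `nonexistent` or `ambiguous` arguments of `tz_localize`.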