Python 3.x: Is there a way to append NaT to a timezone-aware datetime column without changing the dtype to object?
I have a DataFrame column with dtype datetime64[ns, UTC]. When I append a row containing None or NaT to that column, the column's dtype changes to object. This does not happen with columns of dtype datetime64[ns]. Here is a demonstration:
# Test pandas with datetime columns
import pandas as pd
from datetime import datetime, timezone
df = pd.DataFrame([{'D': datetime.utcnow()}])
df_wtz = pd.DataFrame([{'D': datetime.now().astimezone(timezone.utc)}])
df_None = pd.DataFrame([{'D': None}])
# Note that the tz below is ignored even though specified
df_Nat = pd.DataFrame([{'D': pd.Timestamp(None,tz=timezone.utc)}])
print('df:\n', df['D'])
print('df_wtz:\n', df_wtz['D'])
print('df_None:\n', df_None['D'])
print('df_Nat:\n', df_Nat['D'])
print('df append df_None:\n', df.append(df_None, ignore_index=True, sort=False)['D'])
print('df append df_Nat:\n', df.append(df_Nat, ignore_index=True, sort=False)['D'])
print('df_wtz append df_None:\n', df_wtz.append(df_None, ignore_index=True, sort=False)['D'])
print('df_wtz append df_Nat:\n', df_wtz.append(df_Nat, ignore_index=True, sort=False)['D'])
Here is the output:
df:
0 2019-08-13 19:58:18.811492
Name: D, dtype: datetime64[ns]
df_wtz:
0 2019-08-13 19:58:18.811968+00:00
Name: D, dtype: datetime64[ns, UTC]
df_None:
0 None
Name: D, dtype: object
df_Nat:
0 NaT
Name: D, dtype: datetime64[ns]
df append df_None:
0 2019-08-13 19:58:18.811492
1 NaT
Name: D, dtype: datetime64[ns]
df append df_Nat:
0 2019-08-13 19:58:18.811492
1 NaT
Name: D, dtype: datetime64[ns]
df_wtz append df_None:
0 2019-08-13 19:58:18.811968+00:00
1 None
Name: D, dtype: object
df_wtz append df_Nat:
0 2019-08-13 19:58:18.811968+00:00
1 NaT
Name: D, dtype: object
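I expected the column dtype to be preserved when appending None or NaT to a datetime64[ns, UTC] column, but that is not the case. Is this the expected behavior, or would it be considered a bug?

You can place a NaT in a column with dtype datetime64[ns, UTC] like this: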
In [380]: df_Nat = pd.DataFrame({'D': pd.to_datetime([None], utc=True)}); df_Nat
Out[380]:
D
0 NaT
In [381]: df_Nat.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1 entries, 0 to 0
Data columns (total 1 columns):
D 0 non-null datetime64[ns, UTC]
dtypes: datetime64[ns, UTC](1)
memory usage: 88.0 bytes
Appending this df_Nat to df_wtz then yields
df_wtz append df_Nat:
0 2019-08-13 20:28:15.928023+00:00
1 NaT
Name: D, dtype: datetime64[ns, UTC]
NaT itself is not timezone-aware:
In [383]: pd.Timestamp(None) is pd.Timestamp(None, tz=utc)
Out[383]: True
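Therefore, pd.DataFrame([{'D': pd.Timestamp(None, tz=utc)}]) does not produce a column with a timezone-aware dtype.
Since a DataFrame cannot infer a timezone-aware dtype from NaT itself, we need to build a container (such as a Series or DatetimeIndex) that already has the correct timezone-aware dtype. That is what pd.to_datetime([None], utc=True) does: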
In [385]: pd.to_datetime([None], utc=True)
Out[385]: DatetimeIndex(['NaT'], dtype='datetime64[ns, UTC]', freq=None)
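For readers on newer pandas, where DataFrame.append has been removed (pandas 2.0), the same idea can be sketched with pd.concat. This is a minimal illustration rather than the original answer's code: the fixed timestamp string and the lowercase name df_nat are made up for the example, and the exact resolution shown in the dtype (ns or another unit) may depend on the pandas version.

```python
import pandas as pd

# A timezone-aware column holding one real timestamp (illustrative value).
df_wtz = pd.DataFrame({'D': pd.to_datetime(['2019-08-13 20:28:15'], utc=True)})

# A one-row frame whose column is already timezone-aware and holds only NaT.
df_nat = pd.DataFrame({'D': pd.to_datetime([None], utc=True)})

# pd.concat stands in for DataFrame.append, which no longer exists in pandas 2.x.
result = pd.concat([df_wtz, df_nat], ignore_index=True, sort=False)

print(result['D'])
print(result['D'].dtype)
```

Because both inputs already carry a timezone-aware dtype, the concatenated column should stay timezone-aware instead of falling back to object.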
Thanks, this works for me. It is still a pity that appending None changes the column dtype, though...
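If a column has already fallen back to object because a None was appended, one possible workaround (not part of the answer above, only a sketch) is to convert it back with pd.to_datetime(..., utc=True). The Series named mixed below is a hypothetical stand-in for such a column.

```python
import pandas as pd

# Hypothetical object-dtype column: one timezone-aware timestamp and one None.
mixed = pd.Series(
    [pd.Timestamp('2019-08-13 20:28:15', tz='UTC'), None],
    dtype=object,
)

# Re-parse the column as UTC; None becomes NaT and the dtype is timezone-aware again.
restored = pd.to_datetime(mixed, utc=True)

print(mixed.dtype)
print(restored.dtype)
```

This repairs the dtype after the fact; building the appended frame with a timezone-aware dtype, as in the answer, avoids the problem in the first place.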