Python: error creating a Timedelta from datetime operations


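I have looked at several related questions, but none of them describes exactly the problem I am running into.

I am using pandas version 0.16.2. I have several columns in a DataFrame with dtype datetime64[ns]: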

In [6]: date_list = ["SubmittedDate","PolicyStartDate", "PaidUpDate", "MaturityDate", "DraftDate", "CurrentValuationDate", "DOB", "InForceDate"]

In [11]: data[date_list].head()

Out[11]:
      SubmittedDate PolicyStartDate PaidUpDate MaturityDate DraftDate  \
    0           NaT      2002-11-18        NaT   2041-03-04       NaT
    1           NaT      2015-01-13        NaT          NaT       NaT
    2           NaT      2014-10-15        NaT          NaT       NaT
    3           NaT      2009-08-27        NaT          NaT       NaT
    4           NaT      2007-04-19        NaT   2013-10-01       NaT

      CurrentValuationDate        DOB InForceDate
    0           2015-04-30 1976-03-04  2002-11-18
    1                  NaT 1949-09-27  2015-01-13
    2                  NaT 1947-06-15  2014-10-15
    3           2015-07-30 1960-06-07  2009-08-27
    4           2010-04-21 1950-10-01  2007-04-19
They were originally strings (e.g. "1976-03-04"), which I converted to datetime objects with:

In [7]: for datecol in date_list:
   ...:         data[datecol] = pd.to_datetime(data[datecol], coerce=True, errors = 'raise')
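A note for readers on newer pandas releases: the `coerce=True` keyword used above belongs to older versions such as 0.16.2, and later releases express the same intent with `errors="coerce"`. The sketch below shows that spelling on a made-up sample Series (`raw` is a hypothetical stand-in for one of the date columns, not data from the question).

```python
import pandas as pd

# Hypothetical sample standing in for one of the string date columns.
raw = pd.Series(["1976-03-04", "not a date", None])

# errors="coerce" turns anything unparseable into NaT instead of raising.
parsed = pd.to_datetime(raw, errors="coerce")

print(parsed.dtype)         # a datetime64 dtype
print(parsed.isna().sum())  # how many values became NaT
```

Counting the NaT values right after parsing is a cheap way to see how much of a column failed to convert.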
Here is the dtype of each column:

In [8]: for datecol in date_list:
              print data[datecol].dtypes
which returns:

datetime64[ns]
datetime64[ns]
datetime64[ns]
datetime64[ns]
datetime64[ns]
datetime64[ns]
datetime64[ns]
datetime64[ns]
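So far, so good. What I want to do, though, is create a new column for each of these columns that gives the number of days (as an integer) between that date and a fixed reference date: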

In [13]: current_date = pd.to_datetime("2015-07-31")
The first thing I ran was:

In [14]: for i in date_list:
   ....:         data[i+"InDays"] = data[i].apply(lambda x: current_date - x)
However, when I check the dtypes of the resulting columns:

In [15]: for datecol in date_list:
   ....:         print data[datecol + "InDays"].dtypes
I get these:

object
timedelta64[ns]
object
timedelta64[ns]
object
timedelta64[ns]
timedelta64[ns]
timedelta64[ns]
I don't understand why three of them are object when they should be timedelta. The next thing I want to do is:

In [16]: for i in date_list:
   ....:         data[i+"InDays"] = data[i+"InDays"].dt.days
This works for the timedelta columns. But because three of the columns are not timedeltas, I get the following error:

AttributeError: Can only use .dt accessor with datetimelike values

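I suspect that some values in those three columns are preventing pandas from converting them to timedeltas. I don't know how to work out which values those might be.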

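The problem occurs because three of your columns contain only NaT values, so when you apply the function to them the resulting columns are treated as object dtype.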
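One way to check whether a column really is all NaT is to count its non-null entries. This is only a sketch: the small frame below is a hypothetical miniature of the one in the question, and `non_null_counts` and `all_nat_columns` are illustrative names.

```python
import pandas as pd

# Hypothetical miniature of the frame in the question.
data = pd.DataFrame({
    "SubmittedDate": pd.to_datetime([None, None, None]),
    "PolicyStartDate": pd.to_datetime(["2002-11-18", "2015-01-13", None]),
})

# Number of real (non-NaT) dates per column; 0 means the column is all NaT.
non_null_counts = data.notna().sum()
all_nat_columns = non_null_counts[non_null_counts == 0].index.tolist()

print(non_null_counts)
print(all_nat_columns)
```

Running the same count over the full frame, rather than over `head()`, shows whether a column has real dates further down.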

You should add some kind of condition inside the apply call so that NaT falls back to a default timedelta. For example:

for i in date_list:
    data[i+"InDays"] = data[i].apply(lambda x: current_date - x if x is not pd.NaT else pd.Timedelta(0))

Alternatively, if you can't do that, guard the operation you want to perform, data[i+"InDays"] = data[i+"InDays"].dt.days, so that it runs only when the Series' dtype allows it.
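A dtype guard of that kind might look like the sketch below. `to_days` is a hypothetical helper, not something from pandas or the question. As an added assumption, instead of skipping an object column it first tries to convert it with `pd.to_timedelta`, which accepts a Series of Timedelta and NaT objects.

```python
import pandas as pd
from pandas.api.types import is_timedelta64_dtype

def to_days(col):
    """Return day counts for a column of time differences (hypothetical helper)."""
    if not is_timedelta64_dtype(col):
        # Object column holding Timedelta/NaT values: convert it first.
        col = pd.to_timedelta(col)
    return col.dt.days

good = pd.Series(pd.to_timedelta(["1 days", "92 days"]))       # timedelta64 dtype
bad = pd.Series([pd.Timedelta(days=3), pd.NaT], dtype=object)  # object dtype

print(to_days(good).tolist())
print(to_days(bad).tolist())
```

Missing values come back as NaN, so a column that contains any NaT ends up as floating point rather than integer.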

Or, more simply, change the apply call so that it produces what you want directly:

for i in date_list:
    data[i+"InDays"] = data[i].apply(lambda x: (current_date - x).days if x is not pd.NaT else x)
This produces:

In [110]: data
Out[110]:
  SubmittedDate PolicyStartDate PaidUpDate MaturityDate DraftDate  \
0           NaT      2002-11-18        NaT   2041-03-04       NaT
1           NaT      2015-01-13        NaT          NaT       NaT
2           NaT      2014-10-15        NaT          NaT       NaT
3           NaT      2009-08-27        NaT          NaT       NaT
4           NaT      2007-04-19        NaT   2013-10-01       NaT

  CurrentValuationDate        DOB InForceDate SubmittedDateInDays  \
0           2015-04-30 1976-03-04  2002-11-18                 NaT
1                  NaT 1949-09-27  2015-01-13                 NaT
2                  NaT 1947-06-15  2014-10-15                 NaT
3           2015-07-30 1960-06-07  2009-08-27                 NaT
4           2010-04-21 1950-10-01  2007-04-19                 NaT

   PolicyStartDateInDays PaidUpDateInDays MaturityDateInDays DraftDateInDays  \
0                   4638              NaT              -9348             NaT
1                    199              NaT                NaN             NaT
2                    289              NaT                NaN             NaT
3                   2164              NaT                NaN             NaT
4                   3025              NaT                668             NaT

  CurrentValuationDateInDays  DOBInDays  InForceDateInDays
0                         92      14393               4638
1                        NaN      24048                199
2                        NaN      24883                289
3                          1      20142               2164
4                       1927      23679               3025
If you want NaN instead of NaT, you can use:

for i in date_list:
    data[i+"InDays"] = data[i].apply(lambda x: (current_date - x).days if x is not pd.NaT else np.nan)
Example/demo:

In [114]: for i in date_list:
   .....:     data[i+"InDays"] = data[i].apply(lambda x: (current_date - x).days if x is not pd.NaT else np.nan)
   .....:

In [115]: data
Out[115]:
  SubmittedDate PolicyStartDate PaidUpDate MaturityDate DraftDate  \
0           NaT      2002-11-18        NaT   2041-03-04       NaT
1           NaT      2015-01-13        NaT          NaT       NaT
2           NaT      2014-10-15        NaT          NaT       NaT
3           NaT      2009-08-27        NaT          NaT       NaT
4           NaT      2007-04-19        NaT   2013-10-01       NaT

  CurrentValuationDate        DOB InForceDate  SubmittedDateInDays  \
0           2015-04-30 1976-03-04  2002-11-18                  NaN
1                  NaT 1949-09-27  2015-01-13                  NaN
2                  NaT 1947-06-15  2014-10-15                  NaN
3           2015-07-30 1960-06-07  2009-08-27                  NaN
4           2010-04-21 1950-10-01  2007-04-19                  NaN

   PolicyStartDateInDays  PaidUpDateInDays  MaturityDateInDays  \
0                   4638               NaN               -9348
1                    199               NaN                 NaN
2                    289               NaN                 NaN
3                   2164               NaN                 NaN
4                   3025               NaN                 668

   DraftDateInDays  CurrentValuationDateInDays  DOBInDays  InForceDateInDays
0              NaN                          92      14393               4638
1              NaN                         NaN      24048                199
2              NaN                         NaN      24883                289
3              NaN                           1      20142               2164
4              NaN                        1927      23679               3025
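As an alternative that is not part of the answer above, the subtraction can be done on the whole column at once, without `apply`. Subtracting a datetime Series from a Timestamp yields a timedelta Series even when every value is NaT, so `.dt.days` works on every column. The frame below is a hypothetical subset built from a few of the dates shown in the question.

```python
import pandas as pd

current_date = pd.Timestamp("2015-07-31")

# Hypothetical subset of the question's data.
data = pd.DataFrame({
    "SubmittedDate": pd.to_datetime([None, None]),
    "CurrentValuationDate": pd.to_datetime(["2015-04-30", None]),
    "InForceDate": pd.to_datetime(["2002-11-18", "2015-01-13"]),
})

for col in list(data.columns):
    # Timestamp - datetime Series gives a timedelta Series, even if all NaT.
    data[col + "InDays"] = (current_date - data[col]).dt.days

print(data.filter(like="InDays"))
```

Columns that contain NaT come out as floats with NaN, and columns without missing dates stay integer.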

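Thanks, but it's not the case that those three columns contain only NaT values; their non-NaT values simply don't appear in the data[date_list].head() I showed. Also, I don't want to convert the NaTs to Timedelta(0), I just want to ignore them. I tried running the code with this change: for i in date_list: data[i+"InDays"] = data[i].apply(lambda x: current_date - x if x is not pd.NaT else pd.NaT), but the output columns are still objects.

No, it still doesn't work. The output for those three columns has NaN instead of NaT. It's as if pandas does something to those columns afterwards that converts them back to object.

Please check the latest update and let me know if it still doesn't work, or if you really need the data as Timedelta() for other calculations.

Glad I could help.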