Python: using a datetime64 feature when building a model?


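I have a DataFrame with about 50 features. I'm working on a classification problem, so I want to train a model with GradientBoostingClassifier. The DataFrame (mydata) serves as the training set. One of the 50 features (feature20) is a date, and I need to include it in the training set, so I converted the dates to datetime64 like this: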

  mydata['feature20']=pd.to_datetime(mydata['feature20'])
Now, when I try to train the model with the classifier, I get the following error:

  float() argument must be a string or a number, not 'Timestamp'

Is there a way to fix this?
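A minimal sketch of why the error appears, assuming the model's input validation ends up converting the whole feature matrix to float (roughly what np.asarray(X, dtype=float) does). A DataFrame that mixes numeric and datetime64 columns turns into an object array of Python objects, and float() cannot convert a Timestamp. The column values below are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one numeric feature and one datetime64 feature
mydata = pd.DataFrame({
    "feature1": [0.5, 1.5],
    "feature20": pd.to_datetime(["2010-01-01", "2010-01-02"]),
})

# Mixed numeric/datetime columns collapse to an object array holding Timestamp objects
values = mydata.to_numpy()
print(values.dtype)

error_message = None
try:
    # Roughly what an estimator's input validation does before fitting (assumption)
    np.asarray(values, dtype=float)
except TypeError as exc:
    error_message = str(exc)
    print("TypeError:", exc)
```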

You can easily convert the dates to integers:

  df["feature20"].astype("int64") // 10**9
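A small sketch of that conversion, assuming the column has nanosecond resolution (datetime64[ns], the usual result of pd.to_datetime), so astype("int64") yields nanoseconds since the Unix epoch and // 10**9 truncates that to whole seconds. The subtraction-based variant is an alternative added here, not part of the original answer; it counts seconds without relying on the stored resolution. The sample dates are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "feature20": pd.to_datetime(["2010-01-01 00:00:00", "2010-01-02 12:30:00"])
})

# datetime64[ns] -> int64 nanoseconds since 1970-01-01, then floor-divide to seconds
df["feature20_s"] = df["feature20"].astype("int64") // 10**9

# Alternative: subtract the epoch and count whole seconds in the resulting timedelta
epoch = pd.Timestamp("1970-01-01")
df["feature20_s_alt"] = (df["feature20"] - epoch) // pd.Timedelta(seconds=1)

print(df)
```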

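Note: keeping the datetime feature as-is is usually not a good idea unless you are working with time series. Typically you want to extract other information from the datetime instead: day of week, day of month, week of year, month, etc.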
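One way to do that extraction with the pandas .dt accessor, sketched on made-up dates. The f20_* column names are hypothetical, and dropping the raw datetime column afterwards is an assumption about what the model should see. Every resulting column is an integer, so the frame no longer contains Timestamp objects when it is handed to an estimator such as GradientBoostingClassifier.

```python
import pandas as pd

mydata = pd.DataFrame({"feature20": pd.to_datetime(["2010-01-04", "2010-03-15"])})

dt = mydata["feature20"].dt
mydata["f20_dayofweek"] = dt.dayofweek  # Monday=0 ... Sunday=6
mydata["f20_day"] = dt.day              # day of month
# ISO week of year; isocalendar() returns a nullable UInt32 column, cast to plain int64
mydata["f20_week"] = dt.isocalendar().week.astype("int64")
mydata["f20_month"] = dt.month

# Drop the raw datetime column so only numeric features remain
mydata = mydata.drop(columns="feature20")
print(mydata)
```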


Demo:

If you need microsecond precision:

In [28]: df = pd.DataFrame({'feature20':pd.date_range('2010-01-01 01:01:01.123456', freq="123s", periods=10)})

In [29]: df
Out[29]:
                   feature20
0 2010-01-01 01:01:01.123456
1 2010-01-01 01:03:04.123456
2 2010-01-01 01:05:07.123456
3 2010-01-01 01:07:10.123456
4 2010-01-01 01:09:13.123456
5 2010-01-01 01:11:16.123456
6 2010-01-01 01:13:19.123456
7 2010-01-01 01:15:22.123456
8 2010-01-01 01:17:25.123456
9 2010-01-01 01:19:28.123456

In [30]: df["new"] = df["feature20"].astype("int64") // 10**3

In [31]: df
Out[31]:
                   feature20               new
0 2010-01-01 01:01:01.123456  1262307661123456
1 2010-01-01 01:03:04.123456  1262307784123456
2 2010-01-01 01:05:07.123456  1262307907123456
3 2010-01-01 01:07:10.123456  1262308030123456
4 2010-01-01 01:09:13.123456  1262308153123456
5 2010-01-01 01:11:16.123456  1262308276123456
6 2010-01-01 01:13:19.123456  1262308399123456
7 2010-01-01 01:15:22.123456  1262308522123456
8 2010-01-01 01:17:25.123456  1262308645123456
9 2010-01-01 01:19:28.123456  1262308768123456

In [32]: df["date"] = pd.to_datetime(df["new"], unit="us")

In [33]: df
Out[33]:
                   feature20               new                       date
0 2010-01-01 01:01:01.123456  1262307661123456 2010-01-01 01:01:01.123456
1 2010-01-01 01:03:04.123456  1262307784123456 2010-01-01 01:03:04.123456
2 2010-01-01 01:05:07.123456  1262307907123456 2010-01-01 01:05:07.123456
3 2010-01-01 01:07:10.123456  1262308030123456 2010-01-01 01:07:10.123456
4 2010-01-01 01:09:13.123456  1262308153123456 2010-01-01 01:09:13.123456
5 2010-01-01 01:11:16.123456  1262308276123456 2010-01-01 01:11:16.123456
6 2010-01-01 01:13:19.123456  1262308399123456 2010-01-01 01:13:19.123456
7 2010-01-01 01:15:22.123456  1262308522123456 2010-01-01 01:15:22.123456
8 2010-01-01 01:17:25.123456  1262308645123456 2010-01-01 01:17:25.123456
9 2010-01-01 01:19:28.123456  1262308768123456 2010-01-01 01:19:28.123456
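The same round trip at two precisions, sketched on the first timestamp from the demo and again assuming a datetime64[ns] column: dividing by 10**9 keeps whole seconds (decoded with unit="s") and silently drops the fractional part, while dividing by 10**3 keeps microseconds (decoded with unit="us"), which is why the demo uses the latter.

```python
import pandas as pd

s = pd.Series(pd.to_datetime(["2010-01-01 01:01:01.123456"]))

ns = s.astype("int64")   # nanoseconds since the epoch
secs = ns // 10**9       # whole seconds: fractional part is lost
micros = ns // 10**3     # microseconds: full precision of this value is kept

# Convert the integers back to datetimes using the matching unit
back_from_secs = pd.to_datetime(secs, unit="s")
back_from_micros = pd.to_datetime(micros, unit="us")

print(secs[0], back_from_secs[0])
print(micros[0], back_from_micros[0])
```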

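This has less to do with pandas than with the ML library you are using, which you haven't tagged; that would be more useful information than the "datetime" tag. You also haven't shown how you pass the data to the model.

You can easily convert the dates to integers: df["feature20"].astype("int64") // 10**9. However, keeping the datetime feature as-is is not a good idea unless you are working with time series. Usually you'll want to extract other information from the datetime: day of week, day of month, week of year, month, etc.

Yes, that could be an option, but how do I convert them back to datetime?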